To be clear, teaching statistics is about leadership: leading students through the material. Initially you take some of the responsibility for their learning – pointing them in the right direction from the right starting point. But the most important thing is to hand that responsibility over to them when they are ready.
The students’ trajectory and yours are mirror images: they will grow in confidence and accomplishment and, critically, need you less and less. As they move forwards, you must pull back. This is hard: you start with a high investment in their success and as that success comes through you have to reduce your investment. It happens every year, and I can almost tell you which week you will first notice that they don’t need you so much. Not being needed is difficult for you to manage, but it is the surest sign that you have led well.
Learners
There are two things to be said about learners of statistics.
- The main obstacle to overcome for most students is their anxiety/fear.
- This arises because they believe that they are not equipped for the type of material involved.
- It gets reinforced when they hit anything that sounds alien or confusing.
So we avoid anything that sounds alien or confusing.
- This obstacle is not helped by a lack of motivation.
- All too often it is said of statistics that it focuses on the average person, and no-one is interested in the average person.
- It is correct that no-one is interested, but it is wrong to say that this is the focus of statistics.
So we talk about statistics as the language and exploration of human variability.
Fallacies in University Learning and Teaching
Maybe something else needs to be said. There are two enormous fallacies in university teaching, and because of them I have two requests:
Please, when you work with students don’t judge them.
Please, when you work with students don’t judge yourself either.
Fallacy 1 concerns student attendance. We are all embedded in a system that has chosen to place a high value on student attendance. Attendance is seen as good and therefore is expected and occasionally required. Non-attendance is seen as bad. This is a simple mistake. Engagement is what counts, and although attendance often equates to engagement, non-attendance does not necessarily equate to non-engagement.
Don’t judge students who don’t attend. If they don’t attend, then one thing we can say for sure is that there will be a reason. Since we rarely know what that reason is, we can’t say whether we think it a good reason. Even if the reason is that they have a chaotic social life, they have a reason and we don’t know why their social life is chaotic.
And don’t judge yourself either if they don’t attend your class. I can promise you one thing. When that happens the reason is usually that they have overcome their anxiety about the topic and feel they can go it alone. What better learning outcome is there than that? That is a sign that you have succeeded in your job: something you should pat yourself on the back for.
Fallacy 2 is that good teaching matters. An inspiring teacher can change everything for a student, but they are extremely unlikely to do that by teaching – by giving tuition. Quite possibly, the best thing a teacher can say is “I don’t know but let’s see if we can find out.”
So, don’t judge yourself when you get asked a question you cannot answer. You are the expert in being a student, not the expert in statistics.
It’s a question of who is in control. If the teacher is in control, then I think they will be ineffective – they are doing the learning process for the student. If the student is in control, then I think support from a teacher will go much better.
Incidentally, this is why I think lectures are not great. They were necessary in times past when that was the only way of passing material on. But now we have much better ways – and mainly better because they put the student in control. Consider something as simple as the control a student has when watching or listening to a podcast or vlog: they can pause, rewind or skip ahead. In a lecture, they have to go at the speed of the lecturer. A lecture demands almost complete submission by students – and a lecture theatre is structured to make that happen.
Fwiw, as a student (in the middle of the last century) I rarely went to lectures and spent most of my time reading in the library. The exceptions, now I think of it, were all lectures where the lecturer made no attempt to teach but instead told the students why they themselves were excited or intrigued by the topic. I have no recollection of what Oliver Zangwill said about hypnosis, but I can vividly remember his excitement.
Basic Statistics
The sequence in which we teach the topic is designed to show students both the principles of statistics and the frailties of those principles. In the next few paragraphs, I will deal with the core story up to the point where we reach p-values. Later, we will return to two more important topics: research design and the use of multiple variables.
We start with variables. A variable is just a way that people or situations vary. It is a hard concept to get until you have it, at which point it suddenly becomes second nature. There is a load of guff that follows – means, medians etc. None of that matters, and we only teach it because not doing so would be one step too radical. If this interests you, then watch out for how often we talk about medians as the course proceeds. Probably never.
There are two things about variables that do matter – and matter a lot.
- Variance does matter, because it is directly related to how different people are. If something doesn’t vary, then it isn’t interesting.
- The second is to observe that any variable is an entirely arbitrary thing. For example, nobody knows what we are measuring with an IQ test – and it is probably lots of different things including how comfortable someone is with doing tests. Psychology treats variables as real – they aren’t.
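The first point can be made concrete in a few lines of code. This is a sketch with made-up scores, not BrawStats output:

```python
# Tiny illustration with made-up scores: variance measures how different
# people are; a variable that doesn't vary tells us nothing about anyone.
scores_varied = [4, 9, 2, 7, 5, 8, 1, 6]   # people differ: interesting
scores_flat = [5, 5, 5, 5, 5, 5, 5, 5]     # everyone identical: uninteresting

def variance(xs):
    m = sum(xs) / len(xs)
    # sample variance: average squared distance from the mean
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

print(variance(scores_varied))  # about 7.93: plenty of variability to study
print(variance(scores_flat))    # 0.0: nothing to study at all
```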
Then we go on to relationships between variables. This is the heart of Psychology (for now anyway). Everything we know about Psychology comes down to a relationship between variables. The variables might not be real, but the relationships are. The key idea here is that these relationships vary in strength (aka effect size). A strong relationship says that two things are closely connected.
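As a sketch of what an effect size is – with made-up variable names and an assumed population effect, not part of the course materials – here is a sample correlation computed from two simulated related variables:

```python
import math
import random

random.seed(1)

# Sketch only: two made-up variables with an assumed population effect
# size of r = -0.3 (e.g. exam anxiety and exam mark).
n = 200
rho = -0.3
anxiety = [random.gauss(0, 1) for _ in range(n)]
mark = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in anxiety]

def pearson_r(x, y):
    # sample correlation: the usual effect-size measure for two variables
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(anxiety, mark)
print(round(r, 2))  # a sample estimate of the strength of the relationship
```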
Diagrams are nearly always a good way of looking at and talking about relationships.
BrawStats uses simple block diagrams to show relationships:
This type of diagram is central to theories and hypotheses in Psychology.
We collect a sample of data about the variables we are interested in. If that data shows a relationship, then we can begin to suspect that the relationship might be real. The relationship in the data is best seen with graphs like this:
There are lots of other things going on in the data which have nothing to do with our two variables. The solid line is the relationship on its own without all that extra stuff. The shaded area is our best guess as to what the real relationship in the population might be.
Next we look at uncertainty: how a sample (from a population) only gives us partial information. This is the last really big concept. In BrawStats we can easily demonstrate this simply by making multiple samples and seeing how different they are from each other. Here are 10 samples from BrawStats:
There’s some technical stuff now that is very useful, but it is just technical. The headline is that the sample itself tells us how much uncertainty there is. And that is fairly magical.
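The multiple-samples demonstration, and the headline about a sample estimating its own uncertainty, can be sketched outside BrawStats like this (assuming a population effect of r = 0.3 and samples of n = 42):

```python
import math
import random

random.seed(2)

# Sketch: ten samples from one population (assumed effect r = 0.3, n = 42),
# each giving a different answer to the same question.
def sample_r(rho, n):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rs = [sample_r(0.3, 42) for _ in range(10)]
print([round(r, 2) for r in rs])  # ten different sample effect sizes

# The "fairly magical" part: a single sample estimates its own uncertainty.
# For a correlation, the Fisher-z standard error depends only on n.
se_z = 1 / math.sqrt(42 - 3)
print(round(se_z, 2))  # 0.16
```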
Finally we reach statistical inferences. This is the end point of the basic journey. For reasons that maybe tell us a lot about psychology researchers, the idea of an uncertain result is not appealing. So, instead we hide that uncertainty behind an apparently secure inference: “the result is statistically significant” or “not”. All done courtesy of the null hypothesis testing procedure.
There are some important rules about how this is done. Logically, there are two hypotheses – the null hypothesis and the alternative hypothesis. For each, in theory we ought to be able to accept it or reject it – giving four possible inferences. But actually, we can only ever reject the null hypothesis and nothing else. If we can’t reject the null hypothesis, then we can’t reach any safe inference. And even more, if we can reject the null hypothesis, that does not mean we can accept the alternative hypothesis. But hey-ho, most psychology researchers do think those last two things are the same.
So, strictly speaking, we can reach one inference about the null hypothesis (the one we aren’t interested in) but for the other three possible inferences, we can only ever say “Don’t know”. I calculate that in Psychology, the “Don’t know” outcome happens 85% of the time. I hope you will agree with me that this is all too often a very poor reward for a lot of work.
As if that isn’t bad enough, we aren’t even answering the question we wish we were. Given a set of data, the natural question is “How safe is it to suppose that this data did not come from the null hypothesis?”. The question we have actually answered is instead “How safe is it to expect that the null hypothesis would produce data like this?”.
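The question we actually answer can be made explicit by simulation. This is a sketch with an assumed observed result: pretend the null world, and count how often it produces data at least as extreme as ours:

```python
import math
import random

random.seed(3)

# Sketch: in a pretend world where the population effect is zero, how often
# does a sample of the same size show an effect at least as strong as the
# (assumed) one we observed?
def sample_r(rho, n):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

observed_r, n, trials = 0.35, 42, 10_000
extreme = sum(abs(sample_r(0.0, n)) >= abs(observed_r) for _ in range(trials))
p = extreme / trials
print(round(p, 3))  # P(data at least this extreme | null), NOT P(null | data)
```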
The Central Difficulty
Since you’ve done all of this, we can do something a little different amongst ourselves – an overview of the technicalities that will help. Here we go with two major words:
- Probability – how much we can expect something will happen in the future from where we are now. Probability refers to a future that hasn’t happened (or at least where we don’t know what has happened).
- Likelihood – how much we can suppose that where we are now was because something has happened in the past.
This pair of graphs will help you understand this.
The basic structure shows population effect on the front-back axis and sample effect on the left-right axis. Note that population effects are always hypothetical (we never really know what they are) and sample effect sizes are always real.
The left graph shows the probability of different sample effect sizes that will be obtained from a specific population effect size. The right graph shows the likelihood of different population effect sizes as the source of a specific sample effect size.
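For the technically curious, both graphs can come from the same function, read in the two different directions. This sketch uses the Fisher-z approximation to the sampling distribution of a correlation – an assumption of convenience, not necessarily BrawStats’ exact method:

```python
import math

# Fisher-z approximation (an assumption of convenience): atanh(r) is roughly
# Normal with mean atanh(rho) and sd 1/sqrt(n - 3). One density, two readings.
def density(sample_r, pop_r, n):
    z, zeta = math.atanh(sample_r), math.atanh(pop_r)
    sd = 1 / math.sqrt(n - 3)
    return math.exp(-0.5 * ((z - zeta) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

n = 42
effect_sizes = [0.0, 0.15, 0.3, 0.45]

# Probability reading (left graph): fix the population effect, vary the sample.
probs = [density(r, 0.3, n) for r in effect_sizes]

# Likelihood reading (right graph): fix the sample effect, vary the population.
liks = [density(0.3, rho, n) for rho in effect_sizes]

# Both readings peak where sample and population effect sizes coincide.
print(effect_sizes[probs.index(max(probs))], effect_sizes[liks.index(max(liks))])
```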
In psychology, we always have something that has happened – a sample. The whole point of statistics is to tell us what we can suppose about the population it came from (note past tense). The trouble is that we can’t easily do that. And we can easily say something that sounds confusingly similar but is actually completely different.
What we can easily do is to pretend that we know where we are now: in a world where the null hypothesis is correct ie there is no effect. If we pretend that, then we can work out the probability that we will get a sample with an effect at least as strong as ours. We then, keeping our fingers crossed, say that when that probability is small then the world we are pretending to be in can’t be real. In psychology, doing that means that approx 25% of all published significant effects are in fact wrong. And that’s pretty worrying.
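The arithmetic behind a figure of roughly that size is easy to sketch. The inputs here are purely illustrative assumptions, not real data:

```python
# Purely illustrative inputs: how a "wrong significant results" figure of
# roughly a quarter can arise. None of these numbers comes from real data.
alpha = 0.05    # significance criterion: false positive rate under the null
power = 0.50    # assumed chance of detecting a real effect
p_real = 0.25   # assumed proportion of tested hypotheses that are real

false_pos = alpha * (1 - p_real)   # nulls that come out "significant"
true_pos = power * p_real          # real effects that come out "significant"
wrong_fraction = false_pos / (false_pos + true_pos)
print(round(wrong_fraction, 2))    # 0.23: roughly a quarter of significant results
```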
If you want a bit more, then we can very easily go a small step forward with potentially big consequences. This step leads us into an area called Bayesian statistics. It sounds scary – and it is often described in a way that is, frankly, scary. But actually it’s really easy and not much extra from what we have done so far.
The Bayes Approach
We still do a pretend. Here’s an example. Let’s pretend that where we are now is a world where our hypothesis could have an effect size of zero (the null hypothesis) or it could have an effect size r=0.3. And before we choose our hypothesis, the probability that it will be r=0.3 is 20%.
We choose our hypothesis (not knowing its effect size) and take our sample and get its sample effect size. Now we can say how often we would get our sample from each of the two hypotheses in our pretend world. If we were to get it much more frequently from one than the other, that would feel like good evidence for that hypothesis.
This feeling is not a probability: it is really a likelihood. We are saying that, if our pretend world is good, then it is more likely that our sample came from one (or the other) hypothesis – note the past tense again. If an r=0.3 hypothesis in our pretend world would produce our sample four times as frequently as an r=0 hypothesis, then we would say that the r=0.3 hypothesis is 4 times more likely than the r=0 hypothesis as the source of our sample.
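Here is a sketch of that frequency comparison, using the Fisher-z approximation and an assumed sample result (r = 0.25, n = 42):

```python
import math

# Sketch with assumed numbers: how frequently would each hypothesis in our
# pretend world (r = 0 vs r = 0.3) produce the sample effect we observed?
def density(sample_r, pop_r, n):
    z, zeta = math.atanh(sample_r), math.atanh(pop_r)
    sd = 1 / math.sqrt(n - 3)
    return math.exp(-0.5 * ((z - zeta) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

observed_r, n = 0.25, 42
lr = density(observed_r, 0.3, n) / density(observed_r, 0.0, n)
print(round(lr, 1))  # ~3.4: r=0.3 is several times more likely than r=0 as the source
```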
That, dare one say it, is an enormous advance on the “failed to reject the null hypothesis”. The big step is that we have just made “Don’t know” answers much less frequent.
But that likelihood depends on all of the details of the world that we have pretended. Change any of the details of our pretend world, and the result would change also.
The world we have pretended is technically called the prior. And the process is that the prior is what we think we know to start with and then we use the sample to refine the prior so that what we know afterwards is changed. This refined outcome is called the posterior.
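In code, the prior-to-posterior step is just a couple of lines. This sketch uses the numbers from our pretend world: a 20% prior on r=0.3, and a sample that r=0.3 produces four times as frequently:

```python
# The prior-to-posterior step with the numbers from our pretend world:
# prior P(r=0.3) = 20%, and a sample that r=0.3 produces 4x as frequently.
prior = 0.20
likelihood_ratio = 4.0

prior_odds = prior / (1 - prior)                 # 0.25
posterior_odds = prior_odds * likelihood_ratio   # Bayes' rule in odds form
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 2))  # 0.5: the sample moves us from 20% to 50%
```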
Of course, if the prior we are using is pure pretence then there is no value in saying that one hypothesis is more likely than the other. We can easily pretend something else and get a different likelihood.
So, we have to have something better than pretence. This is why the Bayes approach is uncommon: it is very hard to know what to use as a prior. And we need a prior of some sort – you can’t do the process without one; there is no such thing as an invisible prior. Often people use what they call a non-informative prior. This sounds perfect, but it is a bit of an illusion.
Non-informative is usually taken to mean “all possible outcomes are equally likely”. In the example we are using, the non-informative prior would be to assume that each effect size (r=0 and r=0.3) is equally likely. If the (unknown) reality is that r=0 is 4 times as common as r=0.3, then the non-informative prior has the potential to lead to inferences that are very misleading.
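A quick sketch of how misleading that can be, with assumed numbers throughout: a sample twice as likely under r=0.3, analysed once with the 50/50 prior and once with the real base rate:

```python
# Assumed numbers only: the sample is twice as likely under r=0.3, and the
# (unknown) reality is that r=0 is four times as common as r=0.3.
lr = 2.0   # likelihood ratio favouring r=0.3

def posterior(prior_r03, lr):
    odds = (prior_r03 / (1 - prior_r03)) * lr
    return odds / (1 + odds)

naive = posterior(0.5, lr)    # "non-informative" 50/50 prior
honest = posterior(0.2, lr)   # prior matching the real base rate
print(round(naive, 2), round(honest, 2))  # 0.67 vs 0.33: opposite conclusions
```

Same data, different priors, opposite verdicts on which hypothesis is favoured – which is exactly why the choice of prior cannot be waved away.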