| Topic | Chapters |
| --- | --- |
| General Principle | Chapters 5 & 6 |
| 2 variables | Chapter 7 |
| 3 or more variables | Chapters 11 & 12 |
| General Linear Models | Chapters 12 & 13 |
Once you have your data in a suitable form, you are ready to analyse it. On this page, we explain which test to use.
General Principle
The basic purpose of a statistical test is to deal with the uncertainty due to sampling in a specific (and slightly weird) way: a null hypothesis test. We suppose that these two statements are true:
(i) the null hypothesis is true
(ii) the data is correct
and then we ask how probable it is that the null hypothesis (i.e. a population with an effect size of zero) would produce the data we have, or data that is more extreme.
If that probability (p) is less than some pre-specified criterion alpha (usually 0.05), then we conclude that one of our two statements must be rejected. Since the data is a fact, the null hypothesis must be rejected.
There are therefore 2 possible outcomes from a null hypothesis test:
(i) *we reject the null hypothesis*
(ii) *we fail to reject the null hypothesis*.
You should use one of these statements, exactly as written in italics, to state what your result is.
The null hypothesis test does not give any of these outcomes, ever:
(i) we have proved our hypothesis
(ii) we have proved the null hypothesis
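To make this logic concrete, here is a minimal sketch in Python (using scipy). The data, group names and choice of test are made up purely for illustration:

```python
from scipy import stats

# Hypothetical data: scores for two made-up groups (illustrative only)
group_a = [72, 68, 75, 80, 66, 71, 74]
group_b = [64, 61, 70, 66, 59, 63, 68]

alpha = 0.05  # the pre-specified criterion

# p is the probability of data this extreme (or more extreme)
# if the null hypothesis (zero effect size) were true
t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print("we reject the null hypothesis")
else:
    print("we fail to reject the null hypothesis")
```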
For those who are interested: here is a simple critique of null hypothesis testing: [not ready yet].
Two variables: IV –> DV
The easiest case is where there is one IV and one DV. The test you use depends purely on the variable types. There is no mystery.
| DV Type | IV: Interval or Ordinal | IV: Categorical |
| --- | --- | --- |
| Interval | Pearson Correlation | 2 categories: t-test; 3+ categories: 1-way ANOVA |
| Ordinal | Spearman Correlation | 2 categories: Mann-Whitney U-test; 3+ categories: Kruskal-Wallis |
| Categorical | Logistic Regression | Chi-square test of independence |
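To show how the variable types select the test, here is a minimal sketch in Python using scipy; every dataset below is invented purely for illustration:

```python
from scipy import stats

# Interval DV, interval IV -> Pearson correlation
hours = [2, 4, 5, 7, 8, 10]
grade = [55, 60, 62, 70, 74, 81]
r, p = stats.pearsonr(hours, grade)

# Interval DV, categorical IV with 3+ categories -> 1-way ANOVA
f, p = stats.f_oneway([55, 60, 58], [62, 66, 70], [74, 79, 81])

# Ordinal DV, categorical IV with 2 categories -> Mann-Whitney U-test
u, p = stats.mannwhitneyu([3, 4, 4, 5], [2, 2, 3, 3])

# Categorical DV, categorical IV -> chi-square test of independence
table = [[20, 15],   # rows: categories of one variable
         [10, 25]]   # columns: categories of the other
chi2, p, df, expected = stats.chi2_contingency(table)
```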
For any of these tests, the result is given as 3 statements (a worked sketch follows this list):
(i) the effect size (plus confidence interval if possible)
(ii) the test outcome, which is one of:
t(df)=tval, p=pval
F(df1,df2)=fval, p=pval
r(df)=rval, p=pval
etc.
(iii) reject/fail to reject
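Here is a sketch of how those three statements might be assembled for a t-test in Python (scipy plus the standard library). The data are invented, and Cohen's d is used as the effect size, which is one common choice rather than the only one:

```python
import math
import statistics
from scipy import stats

# Hypothetical data, purely for illustration
group_a = [72, 68, 75, 80, 66, 71, 74]
group_b = [64, 61, 70, 66, 59, 63, 68]
alpha = 0.05

t_stat, p_value = stats.ttest_ind(group_a, group_b)
df = len(group_a) + len(group_b) - 2  # degrees of freedom for an independent t-test

# (i) the effect size: Cohen's d from the pooled standard deviation
pooled_var = ((len(group_a) - 1) * statistics.variance(group_a)
              + (len(group_b) - 1) * statistics.variance(group_b)) / df
d = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)
print(f"(i)   Cohen's d = {d:.2f}")

# (ii) the test outcome
print(f"(ii)  t({df}) = {t_stat:.2f}, p = {p_value:.3f}")

# (iii) the decision
print("(iii)", "we reject the null hypothesis" if p_value < alpha
      else "we fail to reject the null hypothesis")
```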
Three or more variables: IV1 | IV2 –> DV
In this case there is a degree of history to reckon with. The table below shows the modern analyses in bold type and the old-fashioned versions in italic type.
| DV Type | Analysis |
| --- | --- |
| Categorical | **Generalised Linear Model** or *Multinomial Logistic Regression* |
| Ordinal | **Generalised Linear Model** or *Ordinal Logistic Regression* |
| Interval | **General Linear Model** or *IVs all Interval: Linear Regression; IVs all Categorical: 2-way ANOVA; IVs a mixture: ANCOVA* |
In the modern analyses, Categorical IVs are sometimes called fixed factors and Interval IVs may be called covariates.
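As a sketch of the modern approach, a General Linear Model with one fixed factor and one covariate might be fitted like this in Python with statsmodels; the dataset and variable names are invented for illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: an interval DV, a categorical IV (fixed factor),
# and an interval IV (covariate)
data = pd.DataFrame({
    "score": [55, 60, 58, 62, 66, 70, 74, 79, 81],
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "age":   [21, 24, 22, 30, 28, 33, 40, 38, 41],
})

# C(group) marks the categorical IV as a fixed factor; age enters as a covariate
model = smf.ols("score ~ C(group) + age", data=data).fit()
print(model.summary())
```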
Linear Models – General & Generalized
This topic has a page of its own here [not ready yet], but here is one more important detail.
A linear model is a formula that describes the DV as a linear combination of the IVs (i.e. a weighted sum of the IVs). An ExamGrade might be the combination of 4 parts of Diligence, 3 parts of RiskTaking and 8 parts of Cleverness:
ExamGrade = 4 × Diligence + 3 × RiskTaking + 8 × Cleverness
(plus other stuff we didn’t measure)
The numbers in this formula (4, 3 and 8) are called coefficients. One outcome of the analysis is, for each coefficient in turn, the probability that it would arise if the null hypothesis were true.
There is another analysis that is often more useful. It is called an ANOVA (analysis of variance), and it calculates how much of the variance in the DV is uniquely related to the variance in each IV. The analysis also does a null hypothesis test for each of these.
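To illustrate both analyses on the ExamGrade example, here is a sketch using statsmodels. The data are simulated so that the true coefficients are approximately 4, 3 and 8; everything else (sample size, noise level) is an assumption made for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "Diligence":  rng.normal(size=n),
    "RiskTaking": rng.normal(size=n),
    "Cleverness": rng.normal(size=n),
})
# ExamGrade = 4*Diligence + 3*RiskTaking + 8*Cleverness
# plus noise standing in for the "other stuff we didn't measure"
data["ExamGrade"] = (4 * data["Diligence"] + 3 * data["RiskTaking"]
                     + 8 * data["Cleverness"] + rng.normal(scale=2, size=n))

model = smf.ols("ExamGrade ~ Diligence + RiskTaking + Cleverness", data=data).fit()

# Coefficient analysis: each estimate with the p-value for its null hypothesis
print(model.params)
print(model.pvalues)

# ANOVA: how much of the DV's variance is uniquely related to each IV
print(sm.stats.anova_lm(model, typ=2))
```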