Analyse Data

General Principle: Chapters 5 & 6
2 Variables: Chapter 7
3 or more variables: Chapters 11 & 12
General Linear Models: Chapters 12 & 13

Once you have data in a suitable form, you are ready to analyse it. On this page, we explain which test to use.

General Principle
The basic purpose of a statistical test is to deal with the uncertainty that comes from having only a sample, in a specific (and slightly weird) way: a null hypothesis test. We suppose that these two statements are both true:
(i) the null hypothesis is true
(ii) the data is correct
and then we ask how probable it is that the null hypothesis (i.e. a population with an effect size of zero) would produce the data we have, or data that is more extreme.
If that probability (p) is less than some pre-specified criterion alpha (usually 0.05), we conclude that one of our two statements must be rejected. Since the data are a fact, the null hypothesis must be rejected.
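
To make the decision rule concrete, here is a minimal sketch in Python (using scipy.stats, which the text does not assume); the two groups of scores are invented purely for illustration.

    # Compare the p-value from a two-group t-test against alpha.
    from scipy import stats

    group_a = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9]   # invented scores
    group_b = [6.2, 7.1, 6.8, 7.4, 6.5, 7.0]   # invented scores

    alpha = 0.05                               # pre-specified criterion
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    if p_value < alpha:
        print("we reject the null hypothesis")
    else:
        print("we fail to reject the null hypothesis")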

There are therefore 2 possible outcomes from a null hypothesis test:
(i) we reject the null hypothesis
(ii) we fail to reject the null hypothesis.
You should use one of these two statements, exactly as written above, to state what your result is.

The null hypothesis test never gives either of these outcomes:
(i) we have proved our hypothesis
(ii) we have proved the null hypothesis

For those who are interested: here is a simple critique of null hypothesis testing: [not ready yet].

Two variables: IV → DV
The easiest case is where there is one IV and one DV. The test you use depends purely on the variable types. There is no mystery.

DV Type       | IV: Interval or Ordinal  | IV: Categorical
Interval      | Pearson Correlation      | 2 cats: t-test; 3+ cats: 1-way ANOVA
Ordinal       | Spearman Correlation     | 2 cats: Mann-Whitney U-test; 3+ cats: Kruskal-Wallis
Categorical   | Logistic Regression      | Chi-square test of independence
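
To show how mechanical the choice is, here is a sketch of a few cells from the table, again in Python with scipy.stats; all of the arrays are invented, and this is one possible tool rather than a prescribed one.

    from scipy import stats

    # Interval DV, Interval IV: Pearson correlation
    r, p = stats.pearsonr([1.0, 2.1, 3.2, 4.0, 5.1], [2.0, 2.9, 4.1, 4.8, 6.2])

    # Interval DV, Categorical IV with 3+ categories: 1-way ANOVA
    f, p = stats.f_oneway([3.1, 2.8, 3.5], [4.0, 4.4, 3.9], [5.2, 5.0, 5.6])

    # Ordinal DV, Categorical IV with 2 categories: Mann-Whitney U-test
    u, p = stats.mannwhitneyu([1, 2, 2, 3, 4], [3, 4, 4, 5, 5])

    # Categorical DV, Categorical IV: chi-square test of independence
    chi2, p, dof, expected = stats.chi2_contingency([[12, 8], [5, 15]])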

For any of these tests, the result is given as 3 statements:
(i) the effect size (plus confidence interval if possible)
(ii) the test outcome, which is one of:
t(df)=tval, p=pval
F(df1,df2)=fval, p=pval
r(df)=rval, p=pval
etc
(iii) reject/fail to reject
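
Continuing the earlier two-group example, here is a small sketch of how statements (ii) and (iii) might be assembled; the formatting is illustrative, not prescribed, and the effect size in statement (i) would be computed separately (e.g. Cohen's d).

    from scipy import stats

    group_a = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9]
    group_b = [6.2, 7.1, 6.8, 7.4, 6.5, 7.0]

    result = stats.ttest_ind(group_a, group_b)   # default: pooled-variance t-test
    df = len(group_a) + len(group_b) - 2         # degrees of freedom for that test

    # (ii) the test outcome in the form t(df)=tval, p=pval
    print(f"t({df}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")

    # (iii) reject / fail to reject
    print("we reject the null hypothesis" if result.pvalue < 0.05
          else "we fail to reject the null hypothesis")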

Three or more variables: IV1 | IV2 → DV
In this case there is a degree of history to reckon with. The table below shows the modern analysis alongside the older, more specific versions it replaces.

DV Type       | Modern analysis           | Older versions
Categorical   | Generalised Linear Model  | Multinomial Logistic Regression
Ordinal       | Generalised Linear Model  | Ordinal Logistic Regression
Interval      | General Linear Model      | IVs all Interval: Linear Regression
              |                           | IVs all Categorical: 2-way ANOVA
              |                           | IVs a mixture: ANCOVA

In the modern analyses, Categorical IVs are sometimes called fixed factors and Interval IVs may be called covariates.
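
As a sketch of what a modern General Linear Model looks like in practice, here is one possible Python version using the statsmodels formula interface; the data frame and column names are invented, and the text does not prescribe any particular software.

    # One categorical IV (fixed factor) plus one interval IV (covariate).
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.DataFrame({
        "score": [3.2, 4.1, 5.0, 4.4, 6.1, 5.8, 6.7, 7.2],
        "group": ["A", "A", "A", "A", "B", "B", "B", "B"],   # categorical IV
        "age":   [21, 25, 30, 28, 22, 27, 33, 35],           # interval IV
    })

    # C(group) tells statsmodels to treat group as a fixed factor
    model = smf.ols("score ~ C(group) + age", data=data).fit()
    print(model.summary())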

Linear Models – General & Generalised
This topic has a page of its own here [not ready yet], but here is one more important detail.

A linear model is a formula that describes the DV as a linear combination of the IVs (i.e. a weighted sum of the IVs). An ExamGrade might be the combination of 4 parts of Diligence, 3 parts of RiskTaking and 8 parts of Cleverness:
ExamGrade = 4 × Diligence + 3 × RiskTaking + 8 × Cleverness
(plus other stuff we didn’t measure)

The numbers in this formula (4, 3 and 8) are called coefficients, and one outcome of an analysis is, for each coefficient in turn, the probability that it would arise if the null hypothesis were true.
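
As an illustration, the sketch below simulates data from the ExamGrade formula and then fits a linear model to recover the coefficients, together with a null hypothesis test for each one (Python with statsmodels; all names and numbers are invented).

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 100
    data = pd.DataFrame({
        "Diligence":  rng.normal(size=n),
        "RiskTaking": rng.normal(size=n),
        "Cleverness": rng.normal(size=n),
    })
    # "plus other stuff we didn't measure" becomes random noise
    data["ExamGrade"] = (4 * data["Diligence"] + 3 * data["RiskTaking"]
                         + 8 * data["Cleverness"] + rng.normal(size=n))

    model = smf.ols("ExamGrade ~ Diligence + RiskTaking + Cleverness",
                    data=data).fit()
    print(model.params)    # estimated coefficients, close to 4, 3 and 8
    print(model.pvalues)   # p-value for each coefficient under the null hypothesis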

There is another analysis that is often more useful. It is called an ANOVA (analysis of variance), and it calculates how much of the variance in the DV is uniquely related to the variance in each IV. The analysis also provides a null hypothesis test for each of these.
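
Continuing the simulated ExamGrade sketch above (and reusing its fitted model), the ANOVA view of the same model can be obtained like this; again an illustration, not a prescription.

    import statsmodels.api as sm

    # Variance in the DV uniquely related to each IV, with a null hypothesis
    # test for each one (Type II sums of squares).
    anova_table = sm.stats.anova_lm(model, typ=2)
    print(anova_table)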