Statistics for sciences: Timeline

This resource provides guidance on the fundamental concepts and elementary ideas that underpin statistics and data analysis processes used in many university sciences.

Most topic panels connect to more detailed content, and many include links to external resources such as video tutorials and exercises to try.

Main text content is adapted from StatSoft:

StatSoft, Inc. (2013). Electronic Statistics Textbook (electronic version). Tulsa, OK: StatSoft. Available at: http://www.statsoft.com/Textbook.

what are variables?
Variables are things that we measure, control, or manipulate in research.
They differ in many respects, most notably in the role they are given and in the type of measures that can be applied to them.
By measuring or counting several (usually many) instances of our variables, we generate data.

samples and populations
Usually we explore the characteristics of a set of data elements that together represent a sample of the broader background population from which they are drawn.
We must be aware that, with this approach, the output of any analysis can suggest a research result but never prove one.

correlational vs experimental research
Most empirical research belongs to one of these two general categories. In correlational research, we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between them, for example between blood pressure and cholesterol level. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables.

For example...

dependent vs independent variables
Independent variables are those that are manipulated whereas dependent variables are only measured or registered.
This distinction appears terminologically confusing to many because, as some students say, "all variables depend on something." However, once you get used to this distinction, it becomes indispensable. The terms dependent and independent variable apply mostly to experimental research ...

measurement scales
Variables differ in how well they can be measured, i.e., in how much measurable information their measurement scale can provide. The most important factor that determines the information that can be provided by a variable is its type of measurement scale. Specifically, variables are classified as (a) nominal, (b) ordinal, (c) interval, or (d) ratio. ...
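As a rough illustration of how the four scales nest, the sketch below lists the kind of comparison each scale adds to the ones below it. This follows the standard textbook summary; the names `Scale`, `ADDS` and `meaningful_operations` are our own hypothetical choices, and the examples in the comments are illustrative.

```python
from enum import IntEnum
from typing import List


class Scale(IntEnum):
    """Measurement scales, ordered from least to most informative."""
    NOMINAL = 1   # named categories only, e.g. blood group
    ORDINAL = 2   # ordered categories, e.g. pain rated mild / moderate / severe
    INTERVAL = 3  # equal steps but an arbitrary zero, e.g. temperature in degrees Celsius
    RATIO = 4     # equal steps and a true zero, e.g. body mass in kg


# The comparison (and typical summary statistic) each scale adds to those below it.
ADDS = {
    Scale.NOMINAL: "equality / counting (mode)",
    Scale.ORDINAL: "ordering (median, percentiles)",
    Scale.INTERVAL: "differences (mean, standard deviation)",
    Scale.RATIO: "ratios, e.g. 'twice as heavy'",
}


def meaningful_operations(scale: Scale) -> List[str]:
    """Return every operation that is meaningful for data measured on `scale`."""
    return [ADDS[s] for s in Scale if s <= scale]


print(meaningful_operations(Scale.ORDINAL))
# ['equality / counting (mode)', 'ordering (median, percentiles)']
```

Each scale inherits everything the less informative scales allow, which is why a mean is meaningful for temperatures in Celsius but not for blood groups.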

relations between variables
Regardless of their type, two or more variables are related if, in a sample of observations, the values of those variables are distributed in a consistent manner. In other words, variables are related if their values systematically correspond to each other for these observations.

For example ...

why relations between variables are important
Generally speaking, the ultimate goal of all research or scientific analysis is to find relations between variables. The philosophy of science teaches us that there is no other way of representing "meaning" except in terms of relations between some quantities or qualities; either way involves relations between variables. Thus, the advancement of science must always involve finding new relations between variables...

magnitude and reliability
The two most elementary properties of every relation between variables are its size or magnitude and its 'truthfulness' or reliability.
To find out about magnitude we count or measure incidences of our variables, and for reliability we apply the idea of 'repeatability'...

what is statistical significance
The statistical significance of a result is, roughly, the probability of obtaining a relationship between variables, or a difference (for example, between means), at least as strong as the one observed in the samples purely by chance, if in the population from which the sample was drawn no such relationship or difference exists.
Using less technical terms, we could say that the statistical significance of a result tells us something about the degree to which the result is "true", that is, "representative" of the population ...
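One way to see what a significance value means is a permutation test, sketched below. This is our choice of technique for illustration, not StatSoft's method. If group membership made no difference in the population, reshuffling the group labels should often produce a difference between means as large as the observed one; the estimated p-value is the share of shuffles that do. The function name `permutation_p_value` and the data values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_shuffles=10_000, seed=1):
    """Two-sided permutation test for a difference between the means of a and b.

    Returns the proportion of random relabellings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership is irrelevant
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical measurements for two groups (illustrative values only).
group_1 = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.7, 5.3]
group_2 = [6.4, 6.9, 6.1, 7.2, 6.8, 6.6, 7.0, 6.3]
print(permutation_p_value(group_1, group_2))  # very small: such a gap rarely arises by shuffling
```

A small p-value says that a difference this large would rarely occur by chance if no difference existed in the population; it does not prove that one exists.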

is the result really significant?
The level of significance that will be treated as really "significant" is largely an arbitrary decision. That is, the choice of a significance level, beyond which results are rejected as invalid, is down to the researcher. In practice, the final decision usually depends on whether the outcome was predicted, ...

significance vs analyses
The more analyses you perform on a data set, the more results will meet the conventional significance level 'by chance'. For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p ≤ .05 level, even if the values of the variables were totally random and those variables do not correlate in the population...
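The arithmetic above can be checked with a small Monte Carlo sketch: ten variables of pure random noise give 45 pairwise correlations, and on average about 45 × .05 = 2.25 of them pass the .05 threshold. The significance test here uses Fisher's z approximation (z = atanh(r) * sqrt(n - 3)) as a stand-in for an exact test; the sample size of 50 and the helper names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test of r at the .05 level (Fisher z-transform)."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def count_chance_hits(rng, n_vars=10, n_obs=50):
    """Correlate every pair of n_vars purely random (unrelated) variables and
    count how many coefficients come out 'significant'."""
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(is_significant(pearson_r(x, y), n_obs)
               for x, y in combinations(data, 2))


n_pairs = math.comb(10, 2)       # 45 correlation coefficients
print(n_pairs, n_pairs * 0.05)   # 45 2.25, i.e. about two expected by chance

rng = random.Random(42)
hits = [count_chance_hits(rng) for _ in range(200)]
print(sum(hits) / len(hits))     # average over 200 simulated data sets, close to 2.25
```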

strength vs reliability
Although strength and reliability are two different features of the relationship between variables, they clearly affect one another.
Generally, in a sample of a particular size, the larger the magnitude of the relation between the variables, the more reliable the relation. Remember, 'reliability' is all about 'repeatability', and we might think of strength in terms of the magnitude of the relation.
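To see how strength and reliability interact, the hypothetical sketch below repeatedly draws samples of the same size (n = 30, an assumed value) from populations with a weak, a moderate and a strong correlation, and records how often the relation is detected. The stronger relation replicates far more often. Detection again uses the Fisher z approximation rather than an exact test, and the helper names are our own.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replication_rate(rho, n=30, reps=1000, seed=7):
    """Share of repeated samples of size n in which a population correlation rho
    is detected at about the .05 level (Fisher z approximation)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Build y so that its population correlation with x equals rho.
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
        if abs(math.atanh(pearson_r(x, y))) * math.sqrt(n - 3) > 1.96:
            hits += 1
    return hits / reps


for rho in (0.1, 0.3, 0.6):
    print(rho, replication_rate(rho))  # the stronger the relation, the more often it replicates
```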

strength and significance
If there is NO relation between variables in a population, we would expect to find no relation between these variables in the research sample either. Conversely, the stronger the relation found in the sample, the less likely it is that there is no corresponding relation in the population...

significance and sample size
This idea is important:

If a data set is small, that is, it contains only a few observations of the variables being assessed, then there are also relatively few possible combinations of their values; hence, the probability of obtaining by chance a combination of values that indicates a strong relation is relatively high...
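The sketch below estimates how often two completely unrelated random variables produce a sample correlation of at least 0.5 purely by chance, for small and larger samples. The threshold of 0.5, the sample sizes and the helper names are arbitrary, hypothetical choices for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_of_strong_r(n, threshold=0.5, reps=2000, seed=3):
    """Estimate how often |r| >= threshold arises when two variables are
    completely unrelated in the population (pure chance)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
        if abs(pearson_r(x, y)) >= threshold:
            count += 1
    return count / reps


for n in (5, 10, 50):
    print(n, chance_of_strong_r(n))  # falls sharply as the sample grows
```

With only five observations, a 'strong' correlation turns up by chance in a large fraction of samples; with fifty it almost never does.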

objective relations vs sample size
If a relation between variables of interest is 'objectively' small, that is, small in the background population from which the sample is drawn, then unless the research sample is correspondingly large there is no way to identify such a relation. Here are a couple of examples...
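A common rule of thumb makes this concrete. Using the Fisher z approximation, the sample size needed to detect a population correlation rho with a two-sided test at the .05 level and 80% power is roughly n = ((z_(1-alpha/2) + z_power) / atanh(rho))^2 + 3. The sketch below evaluates it with Python's `statistics.NormalDist`; the .05 level and 80% power are conventional assumptions rather than values from the source, and `n_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist


def n_needed(rho, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a population correlation rho
    (two-sided test, Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_power = NormalDist().inv_cdf(power)          # about 0.84
    return math.ceil(((z_alpha + z_power) / math.atanh(rho)) ** 2 + 3)


for rho in (0.5, 0.3, 0.1):
    print(rho, n_needed(rho))  # roughly 30, 85 and 783
```

Halving the size of the relation roughly quadruples the sample needed to detect it.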

HYPOTHESES and research questions
The key principle in almost all research projects is to collect some data, examine, measure or count it, and then report what that tells you. Usually this means starting with a research question; alternatively, and often in the social sciences, data are collected first to see whether something interesting emerges that enables a research question to be developed (known as 'grounded theory research') ...

measuring the strength of a relationship
Statisticians have developed very many measures of the magnitude of relationships between variables; the choice of a specific measure in given circumstances depends on the number of variables involved, the measurement scales used, the nature of the relations, etc. Almost all of them, however, follow one general principle: they attempt to evaluate the observed relation by comparing it to the "maximum imaginable relation" between those specific variables...
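Pearson's correlation coefficient r is one familiar instance of this principle: it divides the covariance of two variables by the largest value that covariance could possibly take, sd(x) × sd(y), so r always lies between -1 and +1. A minimal sketch with made-up numbers follows; r² is the share of variance the two variables have in common.

```python
import math


def pearson_r(x, y):
    """Observed covariance divided by its maximum possible value, sd(x) * sd(y).

    The (n - 1) divisors of the covariance and the standard deviations
    cancel, so only sums of products and squares are needed.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: the 'maximum imaginable' positive relation
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect inverse relation
r = pearson_r(x, [2, 1, 4, 3, 5])
print(r, r ** 2)                       # r = 0.8; r**2 (about 0.64) is the shared variance
```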

statistical tests: what are they?
As the 'question' that your research suggests develops into the 'statement' that is your hypothesis, you will use the statistics that you compute from the data you are analysing to test the strength of the relation between your variables. In effect, you are testing the veracity (truthfulness) of the statement that is your hypothesis ...

STATISTICAL TESTS - more detail
The statistical process we use to test our research hypothesis is determined by many factors. For example, to look for a significant difference between two sample means, the t-test may be ideal, whereas to examine the effects of several independent variables (factors) on a dependent variable, a multi-factorial ANOVA may be appropriate.

So how do we decide which test to apply? ...

format of statistical tests
Because the ultimate goal of most statistical tests is to evaluate relations between variables (which effectively is 'testing a hypothesis'), most statistical tests follow the general format explained in the earlier panel about measuring the strength of a relationship. Strictly speaking, most tests represent a ratio of some measure of the differentiation common to the variables of interest to the overall differentiation of those variables...

how 'level of significance' is calculated
Let's assume that we have already calculated a measure of a relation between two variables (as explained in earlier examples). The next question is "how significant is this relation?" For example, is a relation in which 40% of the variance is shared (explained) between the two variables enough to consider the relation significant? The answer is "it depends." Specifically, the significance depends mostly on the sample size. As explained before, in very large samples, even very small relations between variables will be significant ...
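The sketch below puts numbers on "it depends". It takes a correlation that explains 40% of the variance (r = sqrt(0.40), about 0.63) and computes an approximate two-sided p-value for several sample sizes using Fisher's z-transform. This approximation was chosen for brevity (an exact t-test gives slightly different values, especially for very small n), and the function name `approx_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-sided p-value for a sample correlation r from n pairs.

    Uses Fisher's z-transform: z = atanh(r) * sqrt(n - 3) is roughly standard
    normal when the population correlation is zero.
    """
    z = math.atanh(abs(r)) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(z))


r = math.sqrt(0.40)  # a relation explaining 40% of the variance
for n in (5, 10, 30, 100):
    print(n, round(approx_p_value(r, n), 4))  # the same r becomes 'more significant' as n grows
```

With five pairs this relation is far from significant at the .05 level; with thirty it comfortably passes.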

the NORMAL DISTRIBUTION
Why is it so important?
The distribution of many test statistics is normal or follows some form that can be derived from the normal distribution. Philosophically speaking, the normal distribution represents one of the empirically verified elementary "truths about the general nature of reality," and its status can be compared to that of the fundamental laws of the natural sciences...
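The proportions of a normal distribution lying within 1, 1.96, 2 and 3 standard deviations of the mean (the familiar 68-95-99.7 pattern) can be checked with Python's standard-library `statistics.NormalDist`, as in this short sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k standard deviations of the mean.
for k in (1, 1.96, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.3f}")  # 0.683, 0.950, 0.954, 0.997
```

The value 1.96 is where the familiar ".05 level" comes from: only 5% of a normal distribution lies further than 1.96 standard deviations from its mean.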

the Normal Distribution in statistical reasoning
Recall the earlier example in which pairs of samples of males and females were drawn from a population in which the average White Cell Count (WCC) in males and females was exactly the same. Although the most likely outcome of such experiments (one pair of samples per experiment) was that the difference between the average WCC in males and females in each pair is close to zero, from time to time a pair of samples will be drawn where the difference between males and females is quite different from 0. How often does it happen? ...
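The sketch below simulates this thought experiment. The population values (mean 7.0, SD 1.5, samples of 20) are hypothetical and chosen only for illustration; they are not clinical reference values. Both samples in each pair come from the same population, so any difference is due to chance, and about 5% of the differences fall more than 1.96 standard errors from zero.

```python
import math
import random
from statistics import mean

# Hypothetical population: WCC identical in males and females
# (mean 7.0, SD 1.5; illustrative numbers, not clinical reference values).
MU, SIGMA, N = 7.0, 1.5, 20


def simulate_differences(n_experiments=5000, seed=11):
    """Draw a male and a female sample of size N from the SAME population,
    n_experiments times, and return the differences between their means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_experiments):
        males = [rng.gauss(MU, SIGMA) for _ in range(N)]
        females = [rng.gauss(MU, SIGMA) for _ in range(N)]
        diffs.append(sum(males) / N - sum(females) / N)
    return diffs


diffs = simulate_differences()
se = SIGMA * math.sqrt(2 / N)  # theoretical standard error of the difference
far_from_zero = sum(abs(d) > 1.96 * se for d in diffs) / len(diffs)
print(round(mean(diffs), 3), far_from_zero)  # mean near 0; roughly 5% beyond 1.96 SE
```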

are test statistics always normally distributed?
Not all, but most of them are either based on the normal distribution directly or on distributions that are related to and can be derived from it, such as t, F, or Chi-square. Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, that they meet the so-called "normality assumption." Many observed variables actually are normally distributed, which is another reason why the normal distribution represents a "general feature" of empirical reality.

the CENTRAL LIMIT THEOREM
This is a fundamental idea which justifies the use of the Normal Distribution, and of statistical tests based on it, in many research projects.
One way of understanding the Central Limit Theorem is to grasp its core statement that 'the distribution of sample means approaches the Normal as the sample size increases', for any population with a finite variance. But what does this really mean? ...
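A quick Monte Carlo sketch of this idea: draw many samples from a strongly skewed (exponential) population and compute their means. As the sample size grows, the skewness of the distribution of means shrinks towards 0, the value for a normal distribution (theoretically 2/sqrt(n) for an exponential population). The population choice, sample sizes and helper names are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def skewness(values):
    """Simple estimate of skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m, s = mean(values), stdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(sample_size, n_samples=3000, seed=5):
    """Means of repeated samples from a strongly skewed (exponential) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
            for _ in range(n_samples)]


for size in (1, 5, 30):
    print(size, round(skewness(sample_means(size)), 2))  # shrinks towards 0 as size grows
```

With samples of size 1 the "means" are just the skewed raw values; by size 30 their distribution is already close to symmetric.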

Violating the normality assumption
Although many of the statements made in the preceding panels can be proven mathematically, some of them have no theoretical proof and can be demonstrated only empirically, via so-called Monte Carlo experiments. In these experiments, large numbers of samples are generated by a computer following predesigned specifications, and the results from such samples are analyzed using a variety of tests. This way we can empirically evaluate the type and magnitude of errors or biases ...
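As a minimal example of such a Monte Carlo experiment, the sketch below draws both samples from the same skewed (exponential) population, so the null hypothesis is true, applies an ordinary two-sample t-test, and counts how often it wrongly rejects the null. If the test is robust to this violation of normality, the rate should stay close to the nominal .05. The critical value 2.002 (t with 58 degrees of freedom) is taken from standard tables and hard-coded as an assumption; the other settings and names are arbitrary.

```python
import math
import random
from statistics import mean, variance

T_CRIT = 2.002  # approx. two-sided 5% critical value of t with 58 df (assumed, from tables)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def type_i_error_rate(n=30, reps=3000, seed=21):
    """Monte Carlo estimate of how often the t-test rejects a TRUE null hypothesis
    when both samples come from the same skewed (exponential) population."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.expovariate(1.0) for _ in range(n)]
        b = [rng.expovariate(1.0) for _ in range(n)]
        if abs(pooled_t(a, b)) > T_CRIT:
            rejections += 1
    return rejections / reps


print(type_i_error_rate())  # compare with the nominal 0.05
```

Changing the population, the sample sizes or the test in this loop is how such experiments map out where a test can and cannot be trusted.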

Use this resource as a starting point for developing your statistical knowledge and competencies to support your research project.

Visit the StatSoft webpages for comprehensive guides on more advanced statistical processes, such as the use of ANOVA, multi-factorial analysis, regression processes and many other topics.

Follow this link to a statistics timeline for guidance on putting your understanding of the fundamental concepts into practice using 'basic' statistical methods.