statistics for sciences timeline

starting point

This resource provides guidance on the fundamental concepts and elementary ideas that underpin statistics and data analysis processes used in many university sciences.

Use the 'hot' edges of the timeline to navigate along the topic panels.

Most topic panels connect to more detailed content, and many include links to external resources such as video tutorials and exercises to try.

Main text content is adapted from **StatSoft**

[(Electronic Version): StatSoft, Inc. (2013). Electronic Statistics Textbook. Tulsa, OK: StatSoft. Available at: http://www.statsoft.com/Textbook.]

Variables are things that we measure, control, or manipulate in research.

They differ in many respects, most notably in the role they are given and in the type of measures that can be applied to them.

By measuring or counting several (usually many) elements of our variables, we generate data.

samples and populations

Usually we explore the characteristics of a set of data elements that together represent a **sample** of the broader **background population** from which they are drawn.

We must be aware that, with this approach, the output of any analysis can only ever **suggest** a research result; it can never **prove** one.

correlational vs experimental research

Most empirical research belongs to one of these two general categories. In correlational research, we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between variables, for example, between blood pressure and cholesterol level. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables. For example...

dependent vs independent variables

Independent variables are those that are manipulated whereas dependent variables are only measured or registered.

This distinction appears terminologically confusing to many because, as some students say, "all variables depend on something." However, once you get used to this distinction, it becomes indispensable. The terms dependent and independent variable apply mostly to experimental research ...

measurement scales

Variables differ in how well they can be measured, i.e., in how much measurable information their measurement scale can provide. The most important factor that determines the information that can be provided by a variable is its type of measurement scale. Specifically, variables are classified as (a) nominal, (b) ordinal, (c) interval, or (d) ratio. ...

relations between variables

Regardless of their type, two or more variables are related if, in a sample of observations, the values of those variables are distributed in a consistent manner. In other words, variables are related if their values systematically correspond to each other for these observations.

For example ...

why relations between variables are important

Generally speaking, the ultimate goal of every research project or scientific analysis is to find relations between variables. The philosophy of science teaches us that there is no other way of representing "meaning" except in terms of relations between some quantities or qualities; either way involves relations between variables. Thus, the advancement of science must always involve finding new relations between variables...

magnitude and reliability

The two most elementary properties of every relation between variables are its size or **magnitude** and its 'truthfulness' or **reliability**.

To find out about magnitude we count or measure incidences of our variable, and for reliability we apply the idea of 'repeatability'...

what is statistical significance

The statistical significance of a result is **the probability** of obtaining a relationship between variables, or a difference (for example, between means), at least as strong as the one observed in the sample purely **by chance**, if in the population from which the sample was drawn no such relationship or difference exists.

Using less technical terms, we could say that the statistical significance of a result tells us something about the degree to which the result is "true", that is, "representative" of the population ...
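
As an illustration of the definition above, the Python sketch below estimates such a probability with a permutation test, which is one of several ways to obtain a p-value. The two groups of values are invented for illustration, and `permutation_p_value` is a hypothetical helper written for this example, not a library function. The idea: if there were no difference in the population, the group labels would be interchangeable, so shuffling them many times shows how often a difference at least as large as the observed one appears by chance alone.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Estimate a two-sided p-value for the difference between two group means.

    Assumes that, if there is no difference in the population, the group
    labels are interchangeable: reshuffling them shows how often a difference
    at least as large as the observed one turns up by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            as_extreme += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups (invented for illustration).
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
group_b = [4.2, 4.6, 4.4, 4.9, 4.1, 4.5]
print(permutation_p_value(group_a, group_b))
```

A small returned value means that a difference this large rarely arises from reshuffling alone. The seed and the number of shuffles (`n_perm`) are arbitrary choices; more shuffles give a more stable estimate.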

is the result *really* significant?

The level of significance that will be treated as really "significant" is largely an arbitrary decision. That is, the selection of a significance level, up to which the results will be rejected as invalid, is down to the researcher to decide. In practice, the final decision usually depends on whether the outcome was predicted, ...

significance vs analyses

The more analyses you perform on a data set, the more results will meet the conventional significance level 'by chance'. For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p ≤ .05 level, even if the values of the variables were totally random and those variables do not correlate in the population...
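
The arithmetic in this example can be checked with a small Monte Carlo sketch in Python (standard library only). It assumes ten unrelated, normally distributed variables with 50 observations each, and judges each correlation with an approximate test based on Fisher's z transform rather than the exact t-based test; `pearson_r`, `approx_p_value` and `count_chance_hits` are illustrative helpers, not library functions.

```python
import math
import random
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for r using Fisher's z transform (large-sample approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def count_chance_hits(rng, n_vars=10, n_obs=50, alpha=0.05):
    """Correlate every pair of n_vars unrelated random variables; count 'significant' r."""
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if approx_p_value(pearson_r(data[i], data[j]), n_obs) < alpha:
                hits += 1
    return hits

n_pairs = math.comb(10, 2)   # number of distinct correlation coefficients
expected = n_pairs * 0.05    # expected number significant by chance alone
rng = random.Random(42)
runs = [count_chance_hits(rng) for _ in range(200)]
print(n_pairs, expected, sum(runs) / len(runs))
```

Averaged over many simulated data sets, the count of 'significant' correlations should land close to the expected 45 × 0.05 = 2.25, even though no real relations exist.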

strength vs reliability

Although strength and reliability are two different features of the relationship between variables, they clearly affect one another.

Generally, in a sample of a particular size, the larger the *magnitude* of the relation between the variables, the more **reliable** the relation. Remember, 'reliability' is all about 'repeatability' and we might think of *strength* in terms of the magnitude of the relation.
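
A rough way to see this link between magnitude and repeatability is to draw many samples of the same size from populations with a weak and a strong true correlation, and count how often each sample shows a 'significant' relation. The sketch below assumes normally distributed variables, a sample size of 20, and Fisher's z approximation for the p-value; the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    # Fisher z approximation to the two-sided p-value of a correlation.
    return 2 * (1 - NormalDist().cdf(abs(math.atanh(r)) * math.sqrt(n - 3)))

def repeatability(rho, n=20, reps=500, alpha=0.05, seed=3):
    """Share of repeated samples of size n, drawn from a population with true
    correlation rho, in which the relation reaches p < alpha."""
    rng = random.Random(seed)
    found = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y = rho * x + independent noise gives a population correlation of rho.
        y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
        if approx_p_value(pearson_r(x, y), n) < alpha:
            found += 1
    return found / reps

weak = repeatability(0.2)
strong = repeatability(0.7)
print(weak, strong)
```

With the stronger population relation, the result is found again in most repeated samples; with the weaker one, only in a minority of them.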

strength and significance

Assuming that there is NO relation between variables in a population, we would also expect to find no relation between these variables in the research sample. Conversely, the **stronger** the relation found in the sample, the less likely it is that there will be **no** corresponding relation in the population...

significance and sample size

This idea is important:

If a data set is small, that is, it contains only a few incidences of the variables being assessed, then there are also relatively few possible combinations of the values of the variables and hence, the probability of getting **by chance** any particular combination of those values indicative of a **strong relation** is relatively high...
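
The effect of sample size can be sketched by simulation. Assuming two completely unrelated, normally distributed variables, the hypothetical helper `chance_of_strong_r` below estimates how often a sample correlation of at least 0.5 (in absolute size) appears purely by chance, for samples of 5 and of 50 observations. The threshold of 0.5 and the two sample sizes are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def chance_of_strong_r(n, threshold=0.5, reps=2000, seed=11):
    """Share of samples of size n, drawn from two UNRELATED variables,
    whose correlation is at least `threshold` in absolute size."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) >= threshold:
            hits += 1
    return hits / reps

small = chance_of_strong_r(5)
large = chance_of_strong_r(50)
print(small, large)
```

With only 5 observations, an apparently strong correlation turns up by chance in a large share of samples; with 50 observations it almost never does.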

objective relations vs sample size

If a relation between variables of interest is 'objectively' small, that is, small in the background population from which the sample is drawn, then unless the research sample is correspondingly large there is no way to identify such a relation. Here are a couple of examples...
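
One common way to quantify this is the statistical power of a test: the probability that a sample of a given size detects a relation of a given size in the population. The sketch below uses the large-sample Fisher z approximation for a correlation, with an assumed population correlation of 0.1 as the 'objectively small' relation; `approx_power` and `approx_required_n` are illustrative helpers, and the 80% power target is a common convention rather than a rule.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approx_power(rho, n, alpha=0.05):
    """Approximate chance of detecting a population correlation rho
    with a sample of size n (two-sided test, Fisher z approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def approx_required_n(rho, power=0.80, alpha=0.05):
    """Approximate sample size needed to reach the given power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / math.atanh(rho)) ** 2 + 3)

for n in (30, 100, 1000):
    print(n, round(approx_power(0.1, n), 3))
print(approx_required_n(0.1))
```

Under these assumptions, a sample of 30 detects the relation only rarely, while a sample of several hundred is needed before it is detected most of the time.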

HYPOTHESES and research questions

The key principle in almost all research projects is to collect some data, examine, measure or count it, and then report what that tells you. Usually this means starting with a **research question**. Alternatively, and often in the social sciences, you may collect data first and then see whether something interesting emerges that enables a research question to be developed (known as 'grounded theory research') ...

measuring the strength of a relationship

There are very many measures of the magnitude of relationships between variables that have been developed by statisticians; the choice of a specific measure in given circumstances depends on the number of variables involved, measurement scales used, nature of the relations, etc. Almost all of them, however, follow one general principle: they attempt to somehow evaluate the observed relation by comparing it to the "maximum imaginable relation" between those specific variables...
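
The Pearson correlation coefficient is one familiar instance of this principle: it divides the observed co-variation of two variables (their covariance) by the largest co-variation their spreads would allow (the product of their standard deviations), so it always lies between -1 and +1. A minimal Python sketch with made-up numbers:

```python
import math

def pearson_r(x, y):
    """Observed covariance divided by the maximum possible covariance
    given the spread of each variable (product of standard deviations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
perfect_pos = pearson_r(x, [2 * v + 1 for v in x])  # perfect positive relation
perfect_neg = pearson_r(x, [10 - v for v in x])     # perfect negative relation
partial = pearson_r(x, [2, 1, 4, 3, 5])             # weaker positive relation
print(perfect_pos, perfect_neg, partial)
```

Values near +1 or -1 mean the observed relation is close to the "maximum imaginable" one; values near 0 mean it is far from it.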

statistical tests: what are they?

As the 'question' that your research is suggesting develops into the 'statement' that is your Hypothesis, you will use the statistics that you can compute from the data you are analysing to **test** the strength of the relation between your variables. In effect, you are testing the veracity (truthfulness) of the statement that is your Hypothesis ...

STATISTICAL TESTS - more detail

The statistical process we use to **test** our research hypothesis is determined by many factors. For example, to look for a significant difference between two sample means, the **t-test** may be ideal, whereas to examine the effects of several independent variables (factors) on a dependent variable, a multi-factorial ANOVA may be appropriate. So how do we decide which test to apply? ...

format of statistical tests

Because the ultimate goal of most statistical tests is to evaluate relations between variables (which effectively is 'testing an hypothesis'), most statistical tests share a general format, explained in the earlier section about measuring relationship strength. Strictly speaking, tests represent a ratio of some measure of the differentiation common in the variables of interest to the overall differentiation of those variables...
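
One simple measure of this kind is eta squared: the between-group sum of squares (the variation 'explained' by group membership) divided by the total sum of squares. The Python sketch below computes it for two groups; the scores are invented, and `eta_squared` is an illustrative helper rather than a library function. The F-ratio used in ANOVA is built from the same sums of squares.

```python
def eta_squared(*groups):
    """Share of the total variation in the scores that is 'explained' by
    group membership: between-group sum of squares / total sum of squares."""
    scores = [v for g in groups for v in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((v - grand_mean) ** 2 for v in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical scores for two groups (invented for illustration).
males = [6.1, 7.0, 6.5, 7.2]
females = [5.2, 5.9, 5.5, 6.0]
print(round(eta_squared(males, females), 3))
```

A value of 0 means group membership explains none of the variation; a value of 1 means it explains all of it.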

how 'level of significance' is calculated

Let's assume that we have already calculated a measure of a relation between two variables (as explained in earlier examples). The next question is "how significant is this relation?" For example, is explaining 40% of the variance between the two variables enough to consider the relation significant? The answer is "it depends." Specifically, the significance depends mostly on the sample size. As explained before, in very large samples, even very small relations between variables will be significant ...
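
To illustrate the dependence on sample size, the sketch below takes a correlation that explains 40% of the variance (r = √0.40 ≈ 0.63) and computes an approximate two-sided p-value for several sample sizes. It assumes Fisher's z approximation, which is rough for very small samples (exact tests use the t distribution); `approx_p_value` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Two-sided p-value for a sample correlation r (Fisher z approximation;
    rough for very small n, where exact t-based tests are preferred)."""
    z = math.atanh(abs(r)) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(z))

r = math.sqrt(0.40)  # a relation explaining 40% of the variance
for n in (5, 10, 30, 100):
    print(n, round(approx_p_value(r, n), 4))
```

Under this approximation, the same relation is far from significant with 5 observations, borderline at around 10, and highly significant with 30 or more.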

the NORMAL DISTRIBUTION

**Why is it so important?**

The distribution of many test statistics is normal or follows some form that can be derived from the normal distribution. Philosophically speaking, the normal distribution represents one of the empirically verified elementary "truths about the general nature of reality," and its status can be compared to that of the fundamental laws of the natural sciences...
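
Python's standard library includes `statistics.NormalDist`, which is enough to see how the normal curve turns a distance from the mean, measured in standard deviations, into a probability. This is a minimal sketch using the standard normal distribution (mean 0, SD 1).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of the distribution lying within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))

# Distance from the mean (in SDs) that cuts off the outer 5% (2.5% in each tail).
print(round(std_normal.inv_cdf(0.975), 3))
```

The familiar results are that about 68%, 95% and 99.7% of the distribution lies within 1, 2 and 3 standard deviations of the mean, and that about 1.96 standard deviations cut off the outer 5%: the critical value used for a two-sided test at the .05 level when the test statistic is normally distributed.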

the Normal Distribution in statistical reasoning

Recall the earlier example in which pairs of samples of males and females were drawn from a population in which the average white cell count (WCC) of males and females was exactly the same. Although the most likely outcome of such experiments (one pair of samples per experiment) was that the difference between the average WCC of males and females in each pair is close to zero, from time to time a pair of samples will be drawn in which the difference between males and females is quite far from 0. How often does that happen? ...
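
The question 'how often does that happen?' can be explored by simulation. The sketch below assumes a normally distributed population with a made-up mean of 7.0 and standard deviation of 1.5 (illustrative numbers, not real WCC values), draws many pairs of samples of 30 'males' and 30 'females' from that same population, and records the difference between each pair's means.

```python
import math
import random

N = 30     # observations per sample (assumed)
MU = 7.0   # hypothetical population mean, identical for both groups
SD = 1.5   # hypothetical population standard deviation

def simulate_mean_differences(reps=4000, seed=5):
    """Draw `reps` pairs of samples from ONE population and return the
    difference between the two sample means in each pair."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        males = [rng.gauss(MU, SD) for _ in range(N)]
        females = [rng.gauss(MU, SD) for _ in range(N)]
        diffs.append(sum(males) / N - sum(females) / N)
    return diffs

diffs = simulate_mean_differences()
se = SD * math.sqrt(2 / N)  # theoretical standard error of a difference between means
share_extreme = sum(abs(d) > 1.96 * se for d in diffs) / len(diffs)
print(round(sum(diffs) / len(diffs), 3), share_extreme)
```

The differences cluster around zero, and only about 5% of them lie more than 1.96 standard errors away from it, as the normal distribution predicts.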

are test statistics always normally distributed?

Not all, but most of them are either based on the normal distribution directly or on distributions that are related to and can be derived from normal, such as t, F, or Chi-square. Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, they meet the so-called "normality assumption." Many observed variables actually are normally distributed, which is another reason why the normal distribution represents a "general feature" of empirical reality.

the CENTRAL LIMIT THEOREM

This is a fundamental idea which justifies the use of the Normal Distribution, and of the statistical tests based on it, in many research projects.

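One way of understanding the Central Limit Theorem is to grasp its core statement that 'the distribution of sample means approaches the Normal distribution as the sample size increases', whatever the shape of the population distribution (provided it has a finite variance). But what does this really mean? ...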
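
A quick way to see what this means is to simulate it. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1) and compares the skewness of single observations with the skewness of means of 30 observations; `skewness` and `sample_means` are illustrative helpers.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: close to 0 for a symmetric shape such as the Normal."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps=3000, seed=9):
    """Means of `reps` samples of size n from a skewed (exponential, mean 1) population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

single = sample_means(1)    # n = 1: just the skewed population itself
means30 = sample_means(30)  # n = 30: means of 30 observations each
print(round(skewness(single), 2), round(skewness(means30), 2))
print(round(pstdev(means30), 3))  # theory: 1 / sqrt(30), about 0.183
```

The single observations are strongly skewed, while the means of 30 observations are much closer to symmetric, and their spread is close to the theoretical standard error 1/√30.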

Violating the normality assumption

Although many of the statements made in the preceding panels can be proven mathematically, some of them do not have theoretical proof and can be demonstrated only empirically, via so-called Monte-Carlo experiments. In these experiments, large numbers of samples are generated by a computer following predesigned specifications, and the results from such samples are analyzed using a variety of tests. This way we can empirically evaluate the type and magnitude of errors or biases ...
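
A miniature Monte Carlo experiment of this kind is sketched below. It assumes two groups of 10 observations with no true difference between them, applies Student's pooled-variance t-test (hard-coding the tabulated two-sided 5% critical value for 18 degrees of freedom, about 2.101), and estimates how often the test wrongly reports a significant difference, once for normal data and once for skewed exponential data. The helper names are hypothetical.

```python
import math
import random

T_CRIT_DF18 = 2.101  # two-sided 5% critical value of t with 18 df (standard t tables)
N_PER_GROUP = 10     # 10 + 10 - 2 = 18 degrees of freedom, matching T_CRIT_DF18

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b)) / (na + nb - 2)
    return (sum(a) / na - sum(b) / nb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def type_one_error_rate(draw, reps=4000, seed=7):
    """Share of simulated experiments, with NO true difference between groups,
    in which the t-test still reports p < .05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(N_PER_GROUP)]
        b = [draw(rng) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_DF18:
            rejections += 1
    return rejections / reps

normal_rate = type_one_error_rate(lambda rng: rng.gauss(0, 1))
skewed_rate = type_one_error_rate(lambda rng: rng.expovariate(1.0))
print(normal_rate, skewed_rate)
```

If the test behaves as intended, both error rates should sit near the nominal 5%; comparing them shows the size of any bias caused by the non-normal data in this particular setup.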

development ideas

Use this resource as a starting point for developing your statistical knowledge and competencies to support your research project.

Visit the **StatSoft** webpages for comprehensive guides on more advanced statistical processes, such as the use of ANOVA, multi-factorial analysis, regression processes and many other topics.

Follow this link to a statistics timeline for guidance on putting your understanding of the fundamental concepts into practice using 'basic' statistical methods.
