What is a standard deviation? Why is it useful and how does it differ from standard error?

Standard deviation (SD) is used to measure the amount of variation of a set of
data. A small standard deviation means that there is a small spread and that the
majority of the data points are close to the mean. A larger standard deviation
means that there is a larger spread so the data points are spread over a wider
range of values. A bell curve with a small standard deviation would be tall and
thin whereas one with a high standard deviation would be short and wide.
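To make the idea of spread concrete, here is a minimal sketch that compares two samples with the same mean but different spread. The data values and variable names are made up for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
import statistics

# Two hypothetical samples with the same mean (50) but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

# statistics.stdev computes the sample SD (divides by n - 1).
sd_narrow = statistics.stdev(narrow)
sd_wide = statistics.stdev(wide)

print(f"narrow: mean={statistics.mean(narrow)}, sd={sd_narrow:.2f}")
print(f"wide:   mean={statistics.mean(wide)}, sd={sd_wide:.2f}")
```

Both samples are centred on 50, but the second has a standard deviation ten times larger, which is what the short, wide bell curve represents.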

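Standard deviation can be used to flag possible outliers in a set of data. A
common rule of thumb treats any data point that is not within 2 standard
deviations of the mean as a potential outlier, although for normally distributed
data about 5% of points fall outside this range by chance.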
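The 2 standard deviation rule can be sketched as a short function. The name flag_outliers, the threshold parameter k and the data are all hypothetical, and this is only one of several outlier conventions.

```python
import statistics

def flag_outliers(data, k=2.0):
    """Return the points lying more than k sample SDs from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]

# Hypothetical data: nine similar values and one much larger value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]
print(flag_outliers(values))
```

Note that an extreme value inflates both the mean and the SD it is judged against, so this rule can miss outliers in very small samples.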

Standard error (SE) is different from standard deviation. The standard deviation
of a sample is an estimate of the variability of the parent population, whereas
the standard error is a measure of the uncertainty around the estimate of the
mean. The standard deviation will not change much if the sample size increases,
but the standard error depends on both the standard deviation and the sample
size (SE = SD / sqrt(n)), so as the sample size increases the standard error
decreases.

Standard error is used to measure the confidence in statistical conclusions, for
example when calculating a margin of error.
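The relationship between standard error and sample size can be checked with a small sketch using SE = SD / sqrt(n). The SD of 10 and the sample sizes are assumed values chosen only to keep the arithmetic simple.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical sample standard deviation, held fixed
for n in (25, 100, 400):
    print(f"n={n}: SE={standard_error(sd, n)}")
```

With the SD held fixed, quadrupling the sample size halves the standard error.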

Comment on the simple linear regression below:

Y’ = 2.04 + 4.22T
    (2.02) (0.20)

r^2 = 0.98

Y’ is the fitted regression line, the line of best fit that minimises the sum of
squared residuals. The regression coefficient of T is 4.22, which is the amount
by which Y is predicted to increase for each one-unit increase in T. In this case
there is a strong positive correlation between T and Y, and plotted against T the
line has a gradient of 4.22. The intercept is 2.04, so when T=0 the predicted
value of Y is 2.04.
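The reading of the intercept and slope can be illustrated with a short sketch. The function name predict is hypothetical. The last part assumes that the bracketed figures (2.02) and (0.20) are the standard errors of the intercept and slope. That is a common reporting convention, but the question does not state it, so those ratios are only indicative.

```python
INTERCEPT = 2.04  # value of Y' when T = 0
SLOPE = 4.22      # change in Y' per one-unit increase in T

def predict(t):
    """Predicted Y' from the fitted line Y' = 2.04 + 4.22T."""
    return INTERCEPT + SLOPE * t

print(predict(0))                 # the intercept
print(predict(3) - predict(2))    # one-unit step in T, approximately the slope

# ASSUMPTION: the bracketed values are standard errors of the coefficients.
se_intercept, se_slope = 2.02, 0.20
t_intercept = INTERCEPT / se_intercept
t_slope = SLOPE / se_slope
print(f"t-ratio intercept: {t_intercept:.2f}, t-ratio slope: {t_slope:.2f}")
```

Under that assumption the slope is many standard errors away from zero while the intercept is only about one, which would suggest the slope is estimated far more precisely than the intercept.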

R squared is the coefficient of determination. It measures the proportion of the
variance in Y that is explained by the model, and so indicates the strength of
the association between Y and T. Since r squared here is 0.98, which is very
close to 1, the model explains 98% of the variation in Y around its mean, so the
model fits the data well.
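For reference, r squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up data, since the original observations are not given, so it will not reproduce the 0.98 quoted above. The function name r_squared is also hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation around the mean
    return 1 - ss_res / ss_tot

# Hypothetical observations (not from the source) lying close to the fitted line.
t = [1, 2, 3, 4, 5]
y = [6.5, 10.1, 15.0, 18.6, 23.4]
y_hat = [2.04 + 4.22 * ti for ti in t]  # predictions from Y' = 2.04 + 4.22T

r2 = r_squared(y, y_hat)
print(f"r^2 = {r2:.3f}")
```

The closer the residuals are to zero relative to the overall spread of Y, the closer r squared is to 1.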