What is a standard deviation? Why is it useful, and how does it differ from standard error?
Standard deviation (SD) measures the amount of variation in a set of data
around its mean. A small standard deviation means that there is a
small spread and that the majority of the data points are close to the mean. A
larger standard deviation means that there is a larger spread so the data points
are spread over a wider range of values. A bell curve with a small standard
deviation would be tall and thin whereas one with a high standard deviation
would be short and wide.
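Standard deviation can be used to identify possible outliers in a set of data.
A common rule of thumb flags any data point that is not within 2 standard
deviations of the mean as a potential outlier. For roughly normal data about 95%
of points fall inside that range, so a flagged point should be checked rather
than automatically discarded.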
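The 2 standard deviation rule can be sketched in a few lines of Python. This is only an illustration: the data values are made up, the helper name flag_outliers is hypothetical, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics


def flag_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    return [x for x in values if abs(x - mean) > k * sd]


# Made-up measurements: six values near 5 and one far away.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 9.0]
outliers = flag_outliers(data)
print(outliers)  # [9.0]
```

Note that the extreme value itself inflates both the mean and the SD, so in very small samples this rule can fail to flag a genuine outlier.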
Standard error (SE) is different from standard deviation. The standard
deviation of a sample is an estimate of the variability of the parent
population, whereas the standard error is a measure of the uncertainty around
the estimate of the mean. The standard deviation will not change much if the
sample size increases, but the standard error depends on both the standard
deviation and the sample size (SE = SD / sqrt(n)), so as the sample size
increases the standard error decreases.
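The standard error is used to measure the confidence in statistical
conclusions, for example when calculating a margin of error or a confidence
interval around the sample mean.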
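The difference between the two measures can be seen by drawing samples of increasing size from the same population. The sketch below assumes a simulated normal population with mean 50 and SD 10 (made-up values) and uses the formula SE = SD / sqrt(n); the helper name standard_error is hypothetical.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)  # fixed seed so the run is repeatable
population = [random.gauss(50, 10) for _ in range(100_000)]

results = {}
for n in (25, 100, 400, 1600):
    sample = random.sample(population, n)
    results[n] = (statistics.stdev(sample), standard_error(sample))
    # SD stays near 10 for every n, while SE shrinks as n grows.
    print(n, round(results[n][0], 2), round(results[n][1], 2))
```

Because SE is divided by sqrt(n), quadrupling the sample size roughly halves the standard error, while the SD column only fluctuates around the population value.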
Comment on the simple linear regression below:
Y’=2.04 + 4.22T
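Y’ is the regression line, the line of best fit that minimises the sum of the
squared residuals. The regression coefficient of T is 4.22, which represents
the rate at which Y changes as T increases: for each one-unit increase in T,
the predicted Y increases by 4.22. The coefficient is positive, so there is a
positive relationship between T and Y, and graphically the line would have a
gradient of 4.22. The size of the coefficient alone does not show how strong
the relationship is; that is measured by R squared. The intercept is 2.04, so
when T=0 the predicted value is Y’=2.04.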
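To make the interpretation concrete, here is a minimal least-squares sketch. The original data are not given, so the T values are made up and the Y values are generated directly from the fitted equation; the function names fit_line and predict are hypothetical.

```python
def fit_line(t_values, y_values):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(t_values)
    t_mean = sum(t_values) / n
    y_mean = sum(y_values) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_values, y_values))
    s_tt = sum((t - t_mean) ** 2 for t in t_values)
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept, slope


def predict(t, intercept=2.04, slope=4.22):
    """Predicted Y' from the regression equation Y' = 2.04 + 4.22T."""
    return intercept + slope * t


# Made-up T values; Y is placed exactly on the line, so the fit recovers it.
t_values = [0, 1, 2, 3, 4, 5]
y_values = [predict(t) for t in t_values]

intercept, slope = fit_line(t_values, y_values)
print(round(intercept, 2), round(slope, 2))

# A one-unit increase in T changes the prediction by the slope.
print(round(predict(3) - predict(2), 2))
```

With real data the points would scatter around the line, and the fitted intercept and slope would be the values that make the sum of squared residuals as small as possible.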
R squared is the coefficient of determination. It measures the proportion of
the variance in the response variable Y that is explained by the model, and so
indicates how strong the association between Y and T is.
Here R squared is 0.98, which is very close to 1, so this model explains 98% of
the variability of the response data around its mean and fits the data well.
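R squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the observed values are made up for illustration and are not the data behind the 0.98 quoted above, and the function name r_squared is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Made-up observations scattered around the line Y' = 2.04 + 4.22T.
t_values = [0, 1, 2, 3, 4, 5]
observed = [2.5, 5.9, 10.8, 14.2, 19.5, 22.9]
predicted = [2.04 + 4.22 * t for t in t_values]

print(round(r_squared(observed, predicted), 3))
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean of Y.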