chem 454
April 13
Friday the 13th Statistics
Everything could go wrong
Murphy's Laws:
- If anything can possibly go wrong, it will
- Nature always sides with the undiscovered fault
- Jelly bread always falls jelly side down.
Inspired by my eventual grading of problem set including ch 9, problems 21-22
There are some unappreciated features of those problems.
First, an overview
- Regression-- finding the best fit of data to a curve
- for discussion, limit to best straight line
- we can, of course, fit to polynomials or other more complex functions
- best straight line y = a + b x
- we want to go beyond simply knowing the best equation
- we want to know how reliable, useful the results will be.
- Residuals are defined for each point
- $r_i = y_i - (a + b x_i)$
- "best fit" is the one that minimizes the sum of the squared residuals, $\sum r_i^2 = \sum [ y_i - (a + b x_i) ]^2$
- you should be aware of a bias and limitation here
- this assumes that the error is associated with determining y
- it assumes that x is known with much higher certainty
- for many analytical determinations this is true
- x is the concentration of carefully prepared standards of pure materials
- good balances, volumetric glassware, good technique and pure reagents can keep the uncertainty of x small
- y is the instrumental measurement, subject to a variety of S/N problems
- you can, however, encounter cases where x is subject to errors, perhaps errors exceeding those of y.
- you might swap the roles of x and y and proceed
- if x and y have comparable errors you need another set of statistical tools
- we will not consider that complication further... x is assumed known accurately.
- Standard Statistical Tools give us additional information
- Appendix A in the text has formulas you can use
- Excel has a Tool (Data Analysis/ Regression) that you can use
- Other math packages have more statistics
- Assume a package gives us the first four parameters
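- the best slope, m (eq. a1-32)
- the standard deviation of the slope, sm (eq. a1-35)
- the best intercept, b (eq. a1-33)
- (this is the y-intercept, the value of y when x=0)
- the standard deviation of the intercept, sb (eq. a1-36)
- note the change of symbols: the text's m and b are the slope and intercept, written b and a in y = a + b x above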
- usually we use slope and intercept to convert an experimental y value to x: x = (y - intercept) / slope
- standard deviations give us some numbers to use in estimating the error of any point evaluated using the formula
- Another useful value is the standard deviation of the residuals, sr (eq. a1-34)
- really a measure of how much difference there is between the y-values and the best curve.
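The five quantities above (m, sm, b, sb, sr) can be computed directly from the usual least-squares sums. The sketch below is one way to do it in Python, assuming the standard textbook forms of those formulas; the function name and the data are invented for illustration and are not from the problem set.

```python
import math

def linear_fit_stats(x, y):
    """Least-squares line y = m*x + b with standard deviations.

    Returns (m, s_m, b, s_b, s_r). Assumes all error is in y.
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)

    # Sums of squares about the means
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

    m = s_xy / s_xx                      # best slope
    b = sum_y / n - m * sum_x / n        # best intercept

    # Standard deviation of the residuals (n - 2 degrees of freedom);
    # max() guards against a tiny negative value from round-off.
    s_r = math.sqrt(max(s_yy - m * m * s_xx, 0.0) / (n - 2))

    s_m = s_r / math.sqrt(s_xx)                  # std. dev. of slope
    s_b = s_r * math.sqrt(sum_x2 / (n * s_xx))   # std. dev. of intercept
    return m, s_m, b, s_b, s_r

# Made-up calibration data, for illustration only
x_cal = [0.0, 1.0, 2.0, 3.0, 4.0]
y_cal = [0.1, 1.9, 4.2, 5.8, 8.0]
m, s_m, b, s_b, s_r = linear_fit_stats(x_cal, y_cal)
print(f"m = {m:.4f} +/- {s_m:.4f}")
print(f"b = {b:.4f} +/- {s_b:.4f}")
print(f"s_r = {s_r:.4f}")
```

The results should agree with Excel's Regression tool on the same data, which is a convenient cross-check.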
Problem #22 can be approached with the information above.
- This is a standard additions problem--
- There is only one set of data being analyzed-- fixed sample + known standard additions
- the calibration curve is essentially being determined simultaneously
- The set of data is extrapolated to determine the x intercept (where the curve has y=0)
- x-intercept = -(y-intercept ) / (slope)
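- the uncertainties in y-intercept and slope fix the uncertainty in the x-intercept
- when you multiply or divide, the relative variance of the result is the sum of the relative variances of the two values
- variance is the square of the standard deviation (noted here as S); "relative" means divided by the value itself
- $[S(\text{result})/\text{result}]^2 = [S(\text{intercept})/\text{intercept}]^2 + [S(\text{slope})/\text{slope}]^2$
- $S(\text{result}) = |\text{result}| \sqrt{ [S(\text{intercept})/\text{intercept}]^2 + [S(\text{slope})/\text{slope}]^2 }$
- this treats the errors in slope and intercept as independent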
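As a rough illustration of the standard-additions arithmetic, the sketch below extrapolates to the x-intercept and propagates the uncertainties of slope and intercept as relative errors (the rule for a quotient). It assumes the two errors are independent; the function name and the numbers are hypothetical and are not the data from problem 22.

```python
import math

def x_intercept_with_error(slope, s_slope, intercept, s_intercept):
    """x-intercept of y = slope*x + intercept and its standard deviation.

    Relative variances add for a quotient; slope and intercept errors
    are treated as independent (an assumption).
    """
    x0 = -intercept / slope
    rel = math.sqrt((s_intercept / intercept) ** 2 + (s_slope / slope) ** 2)
    return x0, abs(x0) * rel

# Hypothetical fit results, for illustration only
x0, s_x0 = x_intercept_with_error(slope=2.0, s_slope=0.1,
                                  intercept=4.0, s_intercept=0.3)
print(f"x-intercept = {x0:.3f} +/- {s_x0:.3f}")
```

The magnitude of the x-intercept is what corresponds to the analyte in the sample; the sign is negative only because the line is extrapolated back to y = 0.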
Problem 21 has more data and additional errors that must be included
- One set of data is used to determine a calibration curve
- The additional measurements are made on the samples being analyzed
- the results involve two sources of error
- those of the calibration data
- those of the sample measurements
- in general, the sample measurements can't be any more accurate than the calibration data
- in practice, the sample measurements will be less accurate
- we can reduce the sample measurement errors by making replicate measurements
- we need to calculate the error for each sample we analyze
- we will need to know the number of points used to determine the curve (N) as well as the number of replicate measurements (M).
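- Appendix, page A19 (eq. a1-37)
- $s_c = \frac{s_r}{m} \sqrt{ \frac{1}{M} + \frac{1}{N} + \frac{(\bar{y}_c - \bar{y})^2}{m^2 S_{xx}} }$
- sc is the standard deviation of the result read from the calibration curve
- sr is the standard deviation of the residuals
- m is slope
- M is number of analyses (replicates)
- N is number of points used for the calibration curve
- $\bar{y}_c$ (yavc) is the average of the replicate measurements
- $\bar{y}$ (yav) is the average of the y values of the N calibration points
- Sxx is computed from the calibration data
- $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{N}$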
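A short sketch of the eq. a1-37 calculation follows, written with the difference of the two averages in the last term. The function name and all numerical inputs are made up for illustration; in practice sr, m, and Sxx come from the calibration fit.

```python
import math

def conc_std_dev(s_r, m, M, N, y_c_mean, y_cal_mean, s_xx):
    """Standard deviation of a result read from a calibration line.

    s_r        : standard deviation of the residuals
    m          : slope of the calibration line
    M          : number of replicate measurements of the unknown
    N          : number of calibration points
    y_c_mean   : mean signal of the M replicates
    y_cal_mean : mean y of the N calibration points
    s_xx       : sum(x_i^2) - (sum(x_i))^2 / N from the calibration data
    """
    term = 1.0 / M + 1.0 / N + (y_c_mean - y_cal_mean) ** 2 / (m * m * s_xx)
    return (s_r / abs(m)) * math.sqrt(term)

# Made-up values, for illustration only
common = dict(s_r=0.17416, m=1.97, N=5, y_cal_mean=4.0, s_xx=10.0)
one = conc_std_dev(M=1, y_c_mean=4.0, **common)
three = conc_std_dev(M=3, y_c_mean=4.0, **common)
edge = conc_std_dev(M=3, y_c_mean=8.0, **common)
print(f"M = 1 replicate : s_c = {one:.4f}")
print(f"M = 3 replicates: s_c = {three:.4f}")
print(f"M = 3, far from the calibration mean: s_c = {edge:.4f}")
```

Two features of the formula show up in the output: more replicates shrink the 1/M term, and a sample whose signal lies far from the average calibration signal carries a larger error than one near the middle of the curve.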