Linear Regression
Error
Surprisingly, the aid coefficients contain enough
information to compute the actual RMS error between the estimated
output and the actual output, without needing to store the individual
values of the sets and compute the RMS error from them.
It takes a huge amount of math to get
there, but it's a job that needs to be done only once.
1 Error Estimation
I want to estimate the error of the
linear regression. It is possible to do so from the aid coefficients,
without using the original set values. This means jumping from 2N to
5 memory slots.
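The idea can be sketched in a few lines of Python. This is my own illustration, not the post's derivation: I assume the aid coefficients are the running sums N, Sum X, Sum Y, Sum XX, Sum XY, plus Sum YY for the error, and the gain/bias/error formulas are the standard least-squares ones.

```python
# Streaming linear regression: only running sums are stored, never the
# individual samples, yet gain, bias, and RMS error are all recoverable.
from math import sqrt

class RunningFit:
    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.sxy = self.syy = 0.0

    def add(self, x, y):
        # Update the aid coefficients; the sample itself is discarded.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def fit(self):
        # Gain (slope) and bias (intercept) from the normal equations.
        d = self.n * self.sxx - self.sx * self.sx
        gain = (self.n * self.sxy - self.sx * self.sy) / d
        bias = (self.sy - gain * self.sx) / self.n
        # Substituting the normal equations collapses the sum of squared
        # residuals to three terms in the aid coefficients.
        sse = self.syy - gain * self.sxy - bias * self.sy
        rms = sqrt(max(sse, 0.0) / self.n)
        return gain, bias, rms
```

Feeding it a noiseless line returns the exact gain and bias with an RMS error of zero, which is a handy sanity check.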
I can go
further. I see that the error function depends on gain and bias. I
want to express the formula as a function of just the aid
coefficients. This is gruesome.
I use Excel
to make sure the formula is still correct. It takes less time if I
catch errors early on.
Next, I eliminate the dependence on
the intermediate coefficients and expand the individual products.
I inject the expanded products into the
numerator. In retrospect, I could have extracted Sum YY from the
argument, as it's the only place where it appears.
I use Excel to verify that the formula is
still correct and to catch mistakes.
Next step is
to collect and pack the coefficients.
Partial fraction decomposition is too
taxing to do with so many terms, and Wolfram just gives up.
I do it the old-fashioned way:
searching for terms with common elements that, when grouped, show the
same structure as the denominator or a part of it, allowing me to
unpack the fraction cleanly using fewer terms and operations.
I inject the final expanded argument
into the original definition of the error.
I use dimensional analysis to test
whether the function has physical meaning.
I use excel to verify that the formula
is still correct.
Next, I find the error as a function of the
alternate set of aid coefficients, the averages.
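A minimal sketch of that alternate form, assuming (as I read the post) that the averages are simply the running sums divided by N. The function name and signature are mine:

```python
# RMS error of the linear regression from the average-form aid
# coefficients: mean(x), mean(y), mean(x*x), mean(x*y), mean(y*y).
from math import sqrt

def rms_from_averages(mx, my, mxx, mxy, myy):
    # Gain and bias come out in a compact form with averages.
    gain = (mxy - mx * my) / (mxx - mx * mx)
    bias = my - gain * mx
    # Mean squared residual of the fit, in the same compact form.
    mse = myy - gain * mxy - bias * my
    return sqrt(max(mse, 0.0))
```

Dividing the normal equations through by N leaves the gain and bias unchanged, so this gives the same RMS error as the sum-based form.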
2 Meaning of the Error
The error has the same units as the output
and refers to a Gaussian distribution.
The estimated RMS error plays the role of
sigma, so offsets are expressed as multiples of it. The center of the
distribution is the actual trend line computed through linear
regression.
The error lines can be drawn simply by
applying an offset to the trend line. 95% of the values (blue) are
expected to lie between +2 RMS error (gray) and -2 RMS error (gray)
from the trend line (black).
I applied the linear regression
formulae to a noisy line with equal amounts of linear dependence and
noise. Just this once, I counted the number of samples within
0.5, 1.0, 1.5, and 2.0 sigma to see if things work out. They do!
The effective error diameter is four
times the RMS error.
3 Recap
4 Conclusions
From the aid coefficients it's possible to compute the RMS error of the estimated output of the linear regression. 95% of the set values are expected to be found within a diameter of 4 times the RMS error from the trend line. This error allows for a much more robust correlation FOM.