Linear Regression
1. Description

Linear Regression is a tool that computes the best fit line, according to a linear law, between an input set X and an output set Y. The two sets must be of the same size.
2. Definitions

Linear Regression estimates two parameters from the sets: Bias and Gain. Bias has the unit of measure of the output set Y. Gain has the unit of measure of the output set over the input set. The coefficients are computed in order to minimize the RMS error, where the error is the output estimate minus the real output.
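As a reference, a sketch of what this means in formula form (the symbols are mine: $x_i$, $y_i$ are the samples, $\hat{y}_i$ the estimate, $N$ the size of the sets):

$$\hat{y}_i = \text{Gain} \cdot x_i + \text{Bias}, \qquad \epsilon_i = \hat{y}_i - y_i, \qquad (\text{Gain}, \text{Bias}) = \arg\min \sum_{i=1}^{N} \epsilon_i^2$$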
3. Aid Coefficients
Linear Regression can be defined so that it uses only five aid coefficients, making it computationally and memory friendly: only five elements are needed, and they make the calculations invariant to the size of the sets.

Four of the coefficients are the same ones used to compute the statistics of a set (average, RMS average, variance and standard deviation). One additional aid coefficient is the accumulation of the mixed products, and it is used to compute the covariance.

The coefficients are computed by accumulating the two sets in just one pass, making their computation both stream compatible and fast. All very attractive properties.
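A minimal C sketch of the one-pass accumulation (the struct and names are mine, the post does not prescribe an implementation; the sample count is also kept so that the accumulators can later be turned into averages):

```c
#include <stddef.h>

/* The five aid coefficients kept as accumulators, plus the sample count
 * (needed later to turn accumulators into averages). */
typedef struct
{
    size_t n;       /* number of accumulated samples */
    double acc_x;   /* sum of inputs x               */
    double acc_y;   /* sum of outputs y              */
    double acc_xx;  /* sum of squared inputs x*x     */
    double acc_yy;  /* sum of squared outputs y*y    */
    double acc_xy;  /* sum of mixed products x*y     */
} LinRegAcc;

/* Stream compatible: feed one (x, y) pair at a time, in a single pass. */
void linreg_acc_add(LinRegAcc *acc, double x, double y)
{
    acc->n      += 1;
    acc->acc_x  += x;
    acc->acc_y  += y;
    acc->acc_xx += x * x;
    acc->acc_yy += y * y;
    acc->acc_xy += x * y;
}
```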
Rather than using the accumulators, I can compute the averages of the accumulators over the sets and use those as aid coefficients. Depending on the software implementation, one option may be better than the other. Only one set of aid coefficients is needed: either the accumulators or the averages.
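The conversion is just a division by the number of samples $N$ (overbars denote averages; the $S$ symbols are my names for the accumulators):

$$\bar{x} = \frac{S_x}{N}, \quad \bar{y} = \frac{S_y}{N}, \quad \overline{x^2} = \frac{S_{xx}}{N}, \quad \overline{y^2} = \frac{S_{yy}}{N}, \quad \overline{xy} = \frac{S_{xy}}{N}$$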
4. Variance, Covariance and Standard Deviation
Variance is defined as the average of the squares minus the square of the average. The covariance is defined the same way, except that it uses the mixed products of input and output. Covariance measures the sensitivity of the output to the input.
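Written out with the same average notation as above:

$$\sigma_x^2 = \overline{x^2} - \bar{x}^2, \qquad \sigma_y^2 = \overline{y^2} - \bar{y}^2, \qquad \mathrm{Cov}(x,y) = \overline{xy} - \bar{x}\,\bar{y}, \qquad \sigma_x = \sqrt{\sigma_x^2}$$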
5. Linear Regression

Gain is defined as the covariance over the variance of the input. Bias is computed so that the average of the output estimate equals the average of the actual output set.
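In formula form (a direct transcription of the definitions above):

$$\text{Gain} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}, \qquad \text{Bias} = \bar{y} - \text{Gain}\cdot\bar{x}$$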
Linear regression cannot be computed if the variance of the input is zero, i.e. if all the input values are identical, since the Gain would require a division by zero.
6. Recap
Linear Regression as a function of the accumulators:
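A sketch of these formulas, derived from the definitions above ($S$ symbols and $N$ as before; the RMS error $\sigma_e$ is included since it is used in the example below):

$$\text{Gain} = \frac{N\,S_{xy} - S_x S_y}{N\,S_{xx} - S_x^2}, \qquad \text{Bias} = \frac{S_y - \text{Gain}\cdot S_x}{N}, \qquad \sigma_e = \sqrt{\frac{N\,S_{yy} - S_y^2}{N^2} - \text{Gain}\cdot\frac{N\,S_{xy} - S_x S_y}{N^2}}$$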
Linear Regression as a function of the averages:
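The same quantities written with the averaged aid coefficients (again a sketch from the definitions above):

$$\text{Gain} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \text{Bias} = \bar{y} - \text{Gain}\cdot\bar{x}, \qquad \sigma_e = \sqrt{\left(\overline{y^2} - \bar{y}^2\right) - \text{Gain}\cdot\left(\overline{xy} - \bar{x}\,\bar{y}\right)}$$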
7. Example
A sensor is measuring a distance in [m], saving the timestamp of the measure in [s]. Time is the independent variable, position the dependent one. I want to use linear regression to get the slope of the trend and the error of the approximation.
The aid parameters are accumulated from the sets. From there, the sets are no longer needed and everything else can be estimated just from the aid parameters.
The linear regression outputs are:
- Bias of the trend line
- Gain of the trend line
- Error of the estimation
It is truly remarkable that the error can be computed without having the individual data points. The error allows a meaningful estimation of the goodness of the fit.
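A self-contained C sketch of this example follows. The time/position values are made up for illustration (the original data is not listed in the text), and the variable names are mine; the program accumulates the five aid coefficients in one pass and then derives Gain, Bias and the RMS error from them alone.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical time [s] / distance [m] samples, for illustration only. */
    const double t[] = { 0.0, 1.0, 2.0, 3.0, 4.0 };
    const double d[] = { 0.1, 1.9, 4.2, 5.8, 8.1 };
    const int n = (int)(sizeof t / sizeof t[0]);

    /* One-pass accumulation of the five aid coefficients. */
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int i = 0; i < n; i++)
    {
        sx  += t[i];
        sy  += d[i];
        sxx += t[i] * t[i];
        syy += d[i] * d[i];
        sxy += t[i] * d[i];
    }

    /* Averages of the accumulators: from here on the sets are not needed. */
    const double mx  = sx  / n, my  = sy  / n;
    const double mxx = sxx / n, myy = syy / n, mxy = sxy / n;

    /* Variances and covariance. */
    const double var_x = mxx - mx * mx;
    const double var_y = myy - my * my;
    const double cov   = mxy - mx * my;

    if (var_x <= 0.0)
    {
        /* All timestamps identical: the regression is undefined. */
        printf("cannot compute linear regression\n");
        return 1;
    }

    const double gain = cov / var_x;                          /* slope [m/s] */
    const double bias = my - gain * mx;                       /* offset [m]  */
    const double rms  = sqrt(fmax(0.0, var_y - gain * cov));  /* RMS error   */

    printf("Gain: %f m/s  Bias: %f m  RMS error: %f m\n", gain, bias, rms);
    return 0;
}
```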
The data (blue) is plotted in an XY chart.

Around the trend line, the dark gray lines show the error range at +/- one sigma of confidence, which is the RMS error of the estimation. 68% of the measures are expected to fall within plus or minus one standard deviation.
The light gray lines show the confidence at +/- half sigma, which means 38% of the samples are expected to be in range. At +/- two sigma, the confidence grows to 95%.
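For reference, the band lines are drawn directly from the regression outputs (with $k = 0.5, 1, 2$ and $\sigma_e$ the RMS error of the fit):

$$y_{\pm}(x) = \text{Bias} + \text{Gain}\cdot x \pm k\,\sigma_e$$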
8. Conclusions
By using either the accumulations of the two sets, or the averages of those accumulations, it is possible to estimate the Linear Regression parameters, including Bias, Gain and even the RMS error of the estimate.