Italiano - English

Standard deviation numerically stable calculation for SQL

Translation for this document is not available or is not complete,
if you are intrested to receive information please write to

Despite of Standard Deviation is one of most relevant parameter of descriptive statistics, it isn't available as common aggregation function for some SQL server. Here is described how to calculate and use standard deviation in a single select query. Finally a better and numerically stable sql procedure is shown.

First of all recall common formula for sample standard deviation

Common formula is double pass because it requires of average value of dataset. Double pass formula it's hard to implement in SQL and generally it isn’t really efficient.

We can rewrite the formula like here:

Et voilà! Now we can write an SQL statement like this:

 
SELECT SQRT(
 (COUNT(x) * SUM(POWER(x,2)) - POWER(SUM(x), 2) ) / (COUNT(x) * (COUNT(x) - 1))
) FROM data_table

Checking for num of samples and negative sqrt, it becomes:

 
SELECT
  CAST (
    (CASE WHEN (COUNT(x) > 1) 
          AND ( (COUNT(x) * SUM(POWER(x,2)) - POWER(SUM(x), 2) ) > 0)
    THEN
      SQRT((COUNT(x) * SUM(POWER(x,2)) - POWER(SUM(x), 2) ) / (COUNT(x)*(COUNT(x) - 1)))
    ELSE
      CAST(NULL AS DOUBLE PRECISION)
    END) AS double precision) AS stdev
FROM data_table
Above SQL statement is not numerically stable if samples values are big.

If samples are large and their differences are small we will have a round-off error or arithmetic overflow/underflow.

Try it ! Check the above SQL with a dataset where you have X[i]=round(0,1) and Xbig[i]=x[i]+10000000. Both standard deviation must be same but you will be surprised to see 2 different values. More exactly the code returns right result for stdver(X) but stdver(Xbig) is wrong !!

Than the above SQL statement has not much sense due to numerically instability when X are big numbers. Remember that this issue is present also if you use the common (double pass) formula for standard deviation.

The solution: running statistics calculation

There are available many algorithms for standard deviation calculation, most of them are numerically stable. In addition there are algorithms that can calculate standard deviation in incremental or recursive way.

One if this is due to Knuth & Welford. Here is a simple implementation in C:

 
double RunningStDev(double *dataset, int numOfSamples)
{
  if(numOfSamples<2)
    return 0;
 
  int n = 0;
  double mean = 0;
  double M2 = 0;
  for(int i=0;i<numOfSamples;i++)
  {
    n++;
    double delta = x - mean;
    mean = mean + delta/n;
    M2 = M2 + delta*(x – mean);
  }
 
  double variance = M2/(n – 1);
  double stdev = sqrt( variance);
  return variance;
}

Above algorithm can be rewritten as SQL stored procedure. Below is an implementation for Firebird SQL server:

 
SET TERM ^ ;
 
CREATE OR ALTER PROCEDURE STDEV2 (
    tablename varchar(100),
    fieldname varchar(100))
returns (
    wavg double precision,
    wvar double precision,
    wstdev double precision)
AS
declare variable mean double precision;
declare variable m2 double precision;
declare variable delta double precision;
declare variable n integer;
declare variable x double precision;
begin
    n = 0;
    mean = 0.0;
    m2 = 0.0;
    FOR execute statement 'SELECT ' || FieldName || ' FROM ' || TableName INTO :x do
    begin
      n = n + 1;
      delta = x - mean;
      mean = mean + delta/n;
      M2 = M2+delta*(x-mean);
    end
 
    -- Set the average variable
    wavg = 0.0;
    IF(n>0) then
        wavg = mean;
 
    -- Set the variance of sample variable
    wvar = 0.0;
    IF(n>1) then
        wvar = M2 / (n-1);
 
    -- Set the Standard Deviation of sample variable
    wstdev = 0.0;
    IF (wvar>0) then
        wstdev = sqrt(wvar);
 
    suspend;
end
^
 
SET TERM ; ^
 
/* Existing privileges on this procedure */
 
GRANT EXECUTE ON PROCEDURE STDEV2 TO SYSDBA;
 

Now you can type a simply query to and get average, variance and standard deviation you want !

 
SELECT * FROM STDEV2('YOUR TABLE NAME', 'YOUR FIELD NAME');

Useful link

Vote this page:

1 Comments:

#1 Sent by Tag 172 18-06-2014

Very cool!
Thank you

Leave your comment:

Note:
  • Your email email will not be visible or used in any way, and is not required
  • Please keep comments relevant
  • Any content deemed inappropriate or offensive may be edited and/or deleted
  • HTML code is not allowed. Please use BBCode to format your text
    [b]bold[/b], [u]underline[/u], [i]italic[/i], [code]code[/code]
The coding examples presented here are for illustration purposes only. The author takes no responsibility for end-user use
This work is property of Pk Lab. You can use it for free but you must retain author's copyright.