diff libs/commons-math-2.1/docs/userguide/stat.html @ 10:5f2c5fb36e93

commons-math-2.1 added
author dwinter
date Tue, 04 Jan 2011 10:00:53 +0100
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/libs/commons-math-2.1/docs/userguide/stat.html	Tue Jan 04 10:00:53 2011 +0100
@@ -0,0 +1,1227 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <title>Math - The Commons Math User Guide - Statistics</title>
+    <style type="text/css" media="all">
+      @import url("../css/maven-base.css");
+      @import url("../css/maven-theme.css");
+      @import url("../css/site.css");
+    </style>
+    <link rel="stylesheet" href="../css/print.css" type="text/css" media="print" />
+        <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
+      </head>
+  <body class="composite">
+    <div id="banner">
+                    <span id="bannerLeft">
+            Commons Math User Guide
+            </span>
+                    <div class="clear">
+        <hr/>
+      </div>
+    </div>
+    <div id="breadcrumbs">
+              <div class="xright">      
+  </div>
+      <div class="clear">
+        <hr/>
+      </div>
+    </div>
+    <div id="leftColumn">
+      <div id="navcolumn">
+                   <h5>User Guide</h5>
+            <ul>
+    <li class="none">
+                    <a href="../userguide/index.html">Contents</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/overview.html">Overview</a>
+          </li>
+    <li class="none">
+              <strong>Statistics</strong>
+        </li>
+    <li class="none">
+                    <a href="../userguide/random.html">Data Generation</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/linear.html">Linear Algebra</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/analysis.html">Numerical Analysis</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/special.html">Special Functions</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/utilities.html">Utilities</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/complex.html">Complex Numbers</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/distribution.html">Distributions</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/fraction.html">Fractions</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/transform.html">Transform Methods</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/geometry.html">3D Geometry</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/optimization.html">Optimization</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/ode.html">Ordinary Differential Equations</a>
+          </li>
+    <li class="none">
+                    <a href="../userguide/genetics.html">Genetic Algorithms</a>
+          </li>
+          </ul>
+                                           <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+            <img alt="Built by Maven" src="../images/logos/maven-feather.png"></img>
+          </a>
+        </div>
+    </div>
+    <div id="bodyColumn">
+      <div id="contentBox">
+        <div class="section"><h2><a name="a1_Statistics"></a>1 Statistics</h2>
+<div class="section"><h3><a name="a1.1_Overview"></a>1.1 Overview</h3>
+<p>
+          The statistics package provides frameworks and implementations for
+          basic descriptive statistics, frequency distributions, bivariate regression,
+          and t-, chi-square and ANOVA test statistics.
+        </p>
+<p><a href="#a1.2_Descriptive_statistics">Descriptive statistics</a><br />
+<a href="#a1.3_Frequency_distributions">Frequency distributions</a><br />
+<a href="#a1.4_Simple_regression">Simple regression</a><br />
+<a href="#a1.5_Multiple_linear_regression">Multiple linear regression</a><br />
+<a href="#a1.6_Rank_transformations">Rank transformations</a><br />
+<a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a><br />
+<a href="#a1.8_Statistical_tests">Statistical tests</a><br />
+</p>
+<div class="section"><h3><a name="a1.2_Descriptive_statistics"></a>1.2 Descriptive statistics</h3>
+<p>
+          The stat package includes a framework and default implementations for
+           the following descriptive statistics:
+          <ul><li>arithmetic and geometric means</li>
+<li>variance and standard deviation</li>
+<li>sum, product, log sum, sum of squared values</li>
+<li>minimum, maximum, median, and percentiles</li>
+<li>skewness and kurtosis</li>
+<li>first, second, third and fourth moments</li>
+          </ul>
+          With the exception of percentiles and the median, all of these
+          statistics can be computed without maintaining the full list of input
+          data values in memory.  The stat package provides interfaces and
+          implementations that do not require value storage as well as
+          implementations that operate on arrays of stored values.
+        </p>
+          The top level interface is
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/UnivariateStatistic.html">
+          org.apache.commons.math.stat.descriptive.UnivariateStatistic.</a>
+          This interface, implemented by all statistics, consists of
+          <code>evaluate()</code> methods that take double[] arrays as arguments
+          and return the value of the statistic.   This interface is extended by
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/StorelessUnivariateStatistic.html">
+          StorelessUnivariateStatistic</a>, which adds <code>increment()</code>, <code>getResult()</code> and associated methods to support
+          &quot;storageless&quot; implementations that maintain counters, sums or other
+          state information as values are added using the <code>increment()</code>
+          method.
+        </p>
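+<p>
+          The difference between the two interfaces can be illustrated with a minimal,
+          self-contained sketch. The class below is hypothetical and is not part of Commons Math;
+          it only mimics the <code>increment()</code>, <code>getResult()</code> and
+          <code>evaluate()</code> method names described above to show how a running mean can be
+          maintained without storing the input values.
+        </p>

```java
// Illustrative only: a hand-rolled storeless statistic, NOT the Commons Math class.
final class RunningMeanSketch {
    private long n = 0;       // number of values seen so far
    private double mean = 0;  // mean of the values seen so far

    // Fold one value into the running state; the value itself is not kept.
    void increment(double value) {
        n++;
        mean += (value - mean) / n;
    }

    // Current value of the statistic (NaN when nothing has been added).
    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    // Array-based evaluation that leaves any existing instance untouched.
    static double evaluate(double[] values) {
        RunningMeanSketch m = new RunningMeanSketch();
        for (double v : values) {
            m.increment(v);
        }
        return m.getResult();
    }
}
```

+<p>
+          Only a counter and the current mean are kept as state, which is why statistics of this
+          kind can be offered by an aggregate that does not store its input.
+        </p>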
+          Abstract implementations of the top level interfaces are provided in
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractUnivariateStatistic.html">
+          AbstractUnivariateStatistic</a> and
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractStorelessUnivariateStatistic.html">
+          AbstractStorelessUnivariateStatistic</a> respectively.
+        </p>
+          Each statistic is implemented as a separate class, in one of the
+          subpackages (moment, rank, summary) and each extends one of the abstract
+          classes above (depending on whether or not value storage is required to
+          compute the statistic). There are several ways to instantiate and use statistics.
+          Statistics can be instantiated and used directly,  but it is generally more convenient
+          (and efficient) to access them using the provided aggregates,
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
+           DescriptiveStatistics</a> and
+           <a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
+           SummaryStatistics.</a></p>
+<p><code>DescriptiveStatistics</code> maintains the input data in memory
+           and has the capability of producing &quot;rolling&quot; statistics computed from a
+           &quot;window&quot; consisting of the most recently added values.
+        </p>
+<p><code>SummaryStatistics</code> does not store the input data values
+           in memory, so the statistics included in this aggregate are limited to those
+           that can be computed in one pass through the data without access to
+           the full array of values.
+        </p>
+<p><table class="bodyTable"><tr class="a"><th>Aggregate</th>
+<th>Statistics Included</th>
+<th>Values stored?</th>
+<th>&quot;Rolling&quot; capability?</th>
+</tr>
+<tr class="b"><td><a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
+            DescriptiveStatistics</a></td>
+<td>min, max, mean, geometric mean, n,
+            sum, sum of squares, standard deviation, variance, percentiles, skewness,
+            kurtosis, median</td>
+<td>Yes</td>
+<td>Yes</td>
+</tr>
+<tr class="a"><td><a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
+            SummaryStatistics</a></td>
+<td>min, max, mean, geometric mean, n,
+            sum, sum of squares, standard deviation, variance</td>
+<td>No</td>
+<td>No</td>
+</tr>
+</table></p>
+<p><code>SummaryStatistics</code> can be aggregated using 
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html">
+          AggregateSummaryStatistics.</a>  This class can be used to concurrently gather statistics for multiple
+          datasets as well as for a combined sample including all of the data.
+       </p>
+<p><code>MultivariateSummaryStatistics</code> is similar to <code>SummaryStatistics</code>
+           but handles n-tuple values instead of scalar values. It can also compute the
+           full covariance matrix for the input data.
+        </p>
+           Neither <code>DescriptiveStatistics</code> nor <code>SummaryStatistics</code> is
+           thread-safe. <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedDescriptiveStatistics.html">
+           SynchronizedDescriptiveStatistics</a> and
+           <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedSummaryStatistics.html"> 
+           SynchronizedSummaryStatistics</a>, respectively, provide thread-safe versions for applications that
+           require concurrent access to statistical aggregates by multiple threads.
+           <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedMultiVariateSummaryStatistics.html"> 
+           SynchronizedMultivariateSummaryStatistics</a> provides a thread-safe version of <code>MultivariateSummaryStatistics</code>.</p>
+          There is also a utility class,
+          <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
+           StatUtils</a>, that provides static methods for computing statistics
+           directly from double[] arrays.
+        </p>
+          Here are some examples showing how to compute descriptive statistics.
+          <dl><dt>Compute summary statistics for a list of double values</dt>
+<br />
+</br><dd>Using the <code>DescriptiveStatistics</code> aggregate
+          (values are stored in memory):
+        <div class="source"><pre>
+// Get a DescriptiveStatistics instance
+DescriptiveStatistics stats = new DescriptiveStatistics();
+// Add the data from the array
+for (int i = 0; i &lt; inputArray.length; i++) {
+        stats.addValue(inputArray[i]);
+}
+
+// Compute some statistics
+double mean = stats.getMean();
+double std = stats.getStandardDeviation();
+double median = stats.getMedian();
+        </pre>
+<dd>Using the <code>SummaryStatistics</code> aggregate (values are
+        <strong>not</strong> stored in memory):
+       <div class="source"><pre>
+// Get a SummaryStatistics instance
+SummaryStatistics stats = new SummaryStatistics();
+// Read data from an input stream (in is assumed to be a BufferedReader),
+// adding values and updating sums, counters, etc.
+String line;
+while ((line = in.readLine()) != null) {
+        stats.addValue(Double.parseDouble(line.trim()));
+}
+in.close();
+
+// Compute the statistics
+double mean = stats.getMean();
+double std = stats.getStandardDeviation();
+//double median = stats.getMedian(); &lt;-- NOT AVAILABLE
+        </pre>
+<dd>Using the <code>StatUtils</code> utility class:
+       <div class="source"><pre>
+// Compute statistics directly from the array
+// assume values is a double[] array
+double mean = StatUtils.mean(values);
+double std = Math.sqrt(StatUtils.variance(values));
+double median = StatUtils.percentile(values, 50);
+// Compute the mean of the first three values in the array
+mean = StatUtils.mean(values, 0, 3);
+        </pre>
+<dt>Maintain a &quot;rolling mean&quot; of the most recent 100 values from
+        an input stream</dt>
+<br />
+</br><dd>Use a <code>DescriptiveStatistics</code> instance with
+        window size set to 100
+        <div class="source"><pre>
+// Create a DescriptiveStatistics instance and set the window size to 100
+DescriptiveStatistics stats = new DescriptiveStatistics();
+stats.setWindowSize(100);
+
+// Read data from an input stream (in is assumed to be a BufferedReader),
+// displaying the mean of the most recent 100 observations
+// after every 100 observations
+long nLines = 0;
+String line;
+while ((line = in.readLine()) != null) {
+        stats.addValue(Double.parseDouble(line.trim()));
+        nLines++;
+        if (nLines == 100) {
+                nLines = 0;
+                System.out.println(stats.getMean());
+        }
+}
+in.close();
+        </pre>
+<dt>Compute statistics in a thread-safe manner</dt>
+<br />
+<dd>Use a <code>SynchronizedDescriptiveStatistics</code> instance
+        <div class="source"><pre>
+// Create a SynchronizedDescriptiveStatistics instance and
+// use as any other DescriptiveStatistics instance
+DescriptiveStatistics stats = new SynchronizedDescriptiveStatistics();
+        </pre>
+<dt>Compute statistics for multiple samples and overall statistics concurrently</dt>
+<br />
+<dd>There are two ways to do this using <code>AggregateSummaryStatistics.</code> 
+        The first is to use an <code>AggregateSummaryStatistics</code> instance to accumulate
+        overall statistics contributed by <code>SummaryStatistics</code> instances created using
+        <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#createContributingStatistics()">
+          AggregateSummaryStatistics.createContributingStatistics()</a>:
+        <div class="source"><pre>
+// Create an AggregateSummaryStatistics instance to accumulate the overall statistics 
+// and contributing SummaryStatistics instances for the subsamples
+AggregateSummaryStatistics aggregate = new AggregateSummaryStatistics();
+SummaryStatistics setOneStats = aggregate.createContributingStatistics();
+SummaryStatistics setTwoStats = aggregate.createContributingStatistics();
+// Add values to the subsample aggregates (illustrative values)
+setOneStats.addValue(2);
+setOneStats.addValue(3);
+setTwoStats.addValue(2);
+setTwoStats.addValue(4);
+// Full sample data is reported by the aggregate
+double totalSampleSum = aggregate.getSum();
+        </pre>
+        The above approach has the disadvantages that the <code>addValue</code> calls must be synchronized on the
+        <code>SummaryStatistics</code> instance maintained by the aggregate and each value addition updates the
+        aggregate as well as the subsample. For applications that can wait to do the aggregation until all values
+        have been added, a static
+        <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#aggregate(java.util.Collection)">
+          aggregate</a> method is available, as shown in the following example.
+        This method should be used when aggregation needs to be done across threads.
+        <div class="source"><pre>
+// Create SummaryStatistics instances for the subsample data
+SummaryStatistics setOneStats = new SummaryStatistics();
+SummaryStatistics setTwoStats = new SummaryStatistics();
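+// Add values to the subsample SummaryStatistics instances (illustrative values)
+setOneStats.addValue(2);
+setOneStats.addValue(3);
+setTwoStats.addValue(2);
+setTwoStats.addValue(4);
+// Aggregate the subsample statistics
+Collection&lt;SummaryStatistics&gt; aggregate = new ArrayList&lt;SummaryStatistics&gt;();
+aggregate.add(setOneStats);
+aggregate.add(setTwoStats);
+StatisticalSummary aggregatedStats = AggregateSummaryStatistics.aggregate(aggregate);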
+// Full sample data is reported by aggregatedStats
+double totalSampleSum = aggregatedStats.getSum();
+        </pre>
+<div class="section"><h3><a name="a1.3_Frequency_distributions"></a>1.3 Frequency distributions</h3>
+<p><a href="../apidocs/org/apache/commons/math/stat/Frequency.html">
+          org.apache.commons.math.stat.Frequency</a>
+          provides a simple interface for maintaining counts and percentages of discrete
+          values.
+        </p>
+          Strings, integers, longs and chars are all supported as value types,
+          as well as instances of any class that implements <code>Comparable.</code>
+          The ordering of values used in computing cumulative frequencies is by
+          default the <i>natural ordering,</i> but this can be overridden by supplying a
+          <code>Comparator</code> to the constructor. Adding values that are not
+          comparable to those that have already been added results in an
+          <code>IllegalArgumentException.</code></p>
+          Here are some examples.
+          <dl><dt>Compute a frequency distribution based on integer values</dt>
+<br />
+</br><dd>Mixing integers, longs, Integers and Longs:
+          <div class="source"><pre>
+ Frequency f = new Frequency();
+ f.addValue(1);
+ f.addValue(new Integer(1));
+ f.addValue(new Long(1));
+ f.addValue(2);
+ f.addValue(new Integer(-1));
+ System.out.println(f.getCount(1));   // displays 3
+ System.out.println(f.getCumPct(0));  // displays 0.2
+ System.out.println(f.getPct(new Integer(1)));  // displays 0.6
+ System.out.println(f.getCumPct(-2));   // displays 0
+ System.out.println(f.getCumPct(10));  // displays 1
+          </pre>
+<dt>Count string frequencies</dt>
+<br />
+</br><dd>Using case-sensitive comparison, alpha sort order (natural comparator):
+          <div class="source"><pre>
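+Frequency f = new Frequency();
+f.addValue(&quot;one&quot;); f.addValue(&quot;One&quot;);
+f.addValue(&quot;oNe&quot;); f.addValue(&quot;Z&quot;);
+System.out.println(f.getCount(&quot;one&quot;)); // displays 1
+System.out.println(f.getCumPct(&quot;Z&quot;));  // displays 0.5
+System.out.println(f.getCumPct(&quot;Ot&quot;)); // displays 0.25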
+          </pre>
+<dd>Using case-insensitive comparator:
+          <div class="source"><pre>
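+Frequency f = new Frequency(String.CASE_INSENSITIVE_ORDER);
+f.addValue(&quot;one&quot;); f.addValue(&quot;One&quot;);
+f.addValue(&quot;oNe&quot;); f.addValue(&quot;Z&quot;);
+System.out.println(f.getCount(&quot;one&quot;));  // displays 3
+System.out.println(f.getCumPct(&quot;z&quot;));  // displays 1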
+          </pre>
+<div class="section"><h3><a name="a1.4_Simple_regression"></a>1.4 Simple regression</h3>
+<p><a href="../apidocs/org/apache/commons/math/stat/regression/SimpleRegression.html">
+          org.apache.commons.math.stat.regression.SimpleRegression</a>
+          provides ordinary least squares regression with one independent variable,
+          estimating the linear model:
+         </p>
+<p><code> y = intercept + slope * x  </code></p>
+           Standard errors for <code>intercept</code> and <code>slope</code> are
+           available as well as ANOVA, r-square and Pearson's r statistics.
+         </p>
+           Observations (x,y pairs) can be added to the model one at a time or they
+           can be provided in a 2-dimensional array.  The observations are not stored
+           in memory, so there is no limit to the number of observations that can be
+           added to the model.
+         </p>
+<p><strong>Usage Notes</strong>: <ul><li> When there are fewer than two observations in the model, or when
+            there is no variation in the x values (i.e. all x values are the same),
+            all statistics return <code>NaN</code>.  At least two observations with
+            different x coordinates are required to estimate a bivariate regression
+            model.</li>
+<li> Getters for the statistics always compute values based on the current
+           set of observations; that is, you can get statistics, then add more data
+           and get updated statistics without using a new instance.  There is no
+           &quot;compute&quot; method that updates all statistics.  Each of the getters performs
+           the necessary computations to return the requested statistic.</li>
+</ul></p>
+<p><strong>Implementation Notes</strong>: <ul><li> As observations are added to the model, the sum of x values, y values,
+           cross products (x times y), and squared deviations of x and y from their
+           respective means are updated using updating formulas defined in
+           &quot;Algorithms for Computing the Sample Variance: Analysis and
+           Recommendations&quot;, Chan, T.F., Golub, G.H., and LeVeque, R.J.
+           1983, American Statistician, vol. 37, pp. 242-247, referenced in
+           Weisberg, S. &quot;Applied Linear Regression&quot;. 2nd Ed. 1985.  All regression
+           statistics are computed from these sums.</li>
+<li> Inference statistics (confidence intervals, parameter significance levels)
+           are based on the assumption that the observations included in the model are
+           drawn from a <a href="http://mathworld.wolfram.com/BivariateNormalDistribution.html" class="externalLink">
+           Bivariate Normal Distribution</a>.</li>
+</ul></p>
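+<p>
+          The idea of updating sums as observations arrive, rather than storing the observations,
+          can be sketched as follows. This is a simplified, hypothetical class written for
+          illustration; it is not the <code>SimpleRegression</code> source, and it tracks only the
+          quantities needed for the slope and intercept. The update rule used here is one common
+          form of the running-deviation formulas and is an assumption of this sketch.
+        </p>

```java
// Illustrative only: a minimal streaming least-squares fit, NOT the library class.
final class StreamingRegressionSketch {
    private long n = 0;        // number of observations
    private double xbar = 0;   // running mean of x
    private double ybar = 0;   // running mean of y
    private double sumXX = 0;  // sum of squared deviations of x from its mean
    private double sumXY = 0;  // sum of cross products of deviations

    // Add one (x, y) observation and update the running sums in place.
    void addData(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = (double) n / (n + 1);
            sumXX += dx * dx * w;
            sumXY += dx * dy * w;
            xbar += dx / (n + 1);
            ybar += dy / (n + 1);
        }
        n++;
    }

    // NaN with fewer than two observations or no variation in x.
    double getSlope() {
        return (n < 2 || sumXX == 0) ? Double.NaN : sumXY / sumXX;
    }

    double getIntercept() {
        return ybar - getSlope() * xbar;
    }
}
```

+<p>
+          Because the getters compute their results from the current sums, more observations can be
+          added at any time and the next call reflects them.
+        </p>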
+        Here are some examples.
+        <dl><dt>Estimate a model based on observations added one at a time</dt>
+<br />
+</br><dd>Instantiate a regression instance and add data points
+          <div class="source"><pre>
+SimpleRegression regression = new SimpleRegression();
+regression.addData(1d, 2d);
+// At this point, with only one observation,
+// all regression statistics will return NaN
+regression.addData(3d, 3d);
+// With only two observations,
+// slope and intercept can be computed
+// but inference statistics will return NaN
+regression.addData(3d, 3d);
+// Now all statistics are defined.
+         </pre>
+<dd>Compute some statistics based on observations added so far
+         <div class="source"><pre>
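+System.out.println(regression.getIntercept());
+// displays intercept of regression line
+System.out.println(regression.getSlope());
+// displays slope of regression line
+System.out.println(regression.getSlopeStdErr());
+// displays slope standard error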
+         </pre>
+<dd>Use the regression model to predict the y value for a new x value
+         <div class="source"><pre>
+System.out.println(regression.predict(1.5d));
+// displays predicted y value for x = 1.5
+         </pre>
+         More data points can be added and subsequent getXxx calls will incorporate
+         additional data in statistics.
+         </dd>
+<dt>Estimate a model from a double[][] array of data points</dt>
+<br />
+</br><dd>Instantiate a regression object and load dataset
+          <div class="source"><pre>
+double[][] data = { { 1, 3 }, { 2, 5 }, { 3, 7 }, { 4, 14 }, { 5, 11 } };
+SimpleRegression regression = new SimpleRegression();
+regression.addData(data);
+          </pre>
+<dd>Estimate regression model based on data
+         <div class="source"><pre>
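+System.out.println(regression.getIntercept());
+// displays intercept of regression line
+System.out.println(regression.getSlope());
+// displays slope of regression line
+System.out.println(regression.getSlopeStdErr());
+// displays slope standard error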
+         </pre>
+         More data points -- even another double[][] array -- can be added and subsequent
+         getXxx calls will incorporate additional data in statistics.
+         </dd>
+<div class="section"><h3><a name="a1.5_Multiple_linear_regression"></a>1.5 Multiple linear regression</h3>
+<p><a href="../apidocs/org/apache/commons/math/stat/regression/MultipleLinearRegression.html">
+          org.apache.commons.math.stat.regression.MultipleLinearRegression</a>
+          provides ordinary least squares regression with a generic multiple variable linear model, which
+          in matrix notation can be expressed as:
+         </p>
+<p><code> y=X*b+u </code></p>
+<p>
+         where y is an <code>n-vector</code> <b>regressand</b>, X is a <code>[n,k]</code> matrix whose <code>k</code> columns are called
+         <b>regressors</b>, b is a <code>k-vector</code> of <b>regression parameters</b> and <code>u</code> is an <code>n-vector</code> 
+         of <b>error terms</b> or <b>residuals</b>.   The notation is quite standard in the literature; 
+         see, e.g., <a href="http://www.econ.queensu.ca/ETM" class="externalLink">Davidson and MacKinnon, Econometric Theory and Methods, 2004</a>.
+         </p>
+          Two implementations are provided: <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+          org.apache.commons.math.stat.regression.OLSMultipleLinearRegression</a> and 
+          <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
+          org.apache.commons.math.stat.regression.GLSMultipleLinearRegression</a>.</p>
+           Observations (x,y and covariance data matrices) can be added to the model via the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method.
+           The observations are stored in memory until the next time the addData method is invoked.  
+         </p>
+<p><strong>Usage Notes</strong>: <ul><li> Data is validated when invoking the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method and
+           an <code>IllegalArgumentException</code> is thrown when the data is inappropriate. 
+           </li>
+<li> Only the GLS regression requires the covariance matrix, so in the OLS regression it is ignored and can safely
+           be passed as <code>null</code>.</li>
+</ul></p>
+        Here are some examples.
+        <dl><dt>OLS regression</dt>
+<br />
+</br><dd>Instantiate an OLS regression object and load dataset
+          <div class="source"><pre>
+MultipleLinearRegression regression = new OLSMultipleLinearRegression();
+double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
+double[][] x = new double[6][];
+x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
+x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
+x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
+x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
+x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
+x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};          
+regression.addData(y, x, null); // we don't need covariance
+          </pre>
+<dd>Estimate of regression values honours the <code>MultipleLinearRegression</code> interface:
+         <div class="source"><pre>
+double[] beta = regression.estimateRegressionParameters();        
+double[] residuals = regression.estimateResiduals();
+double[][] parametersVariance = regression.estimateRegressionParametersVariance();
+double regressandVariance = regression.estimateRegressandVariance();
+         </pre>
+<dt>GLS regression</dt>
+<br />
+</br><dd>Instantiate a GLS regression object and load dataset
+          <div class="source"><pre>
+MultipleLinearRegression regression = new GLSMultipleLinearRegression();
+double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
+double[][] x = new double[6][];
+x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
+x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
+x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
+x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
+x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
+x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};          
+double[][] omega = new double[6][];
+omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
+omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
+omega[2] = new double[]{0, 0, 3.3, 0, 0, 0};
+omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
+omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
+omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
+regression.addData(y, x, omega); // we do need covariance
+          </pre>
+<dd>Estimate of regression values honours the same <code>MultipleLinearRegression</code> interface as 
+          the OLS regression.
+         </dd>
+<div class="section"><h3><a name="a1.6_Rank_transformations"></a>1.6 Rank transformations</h3>
+         Some statistical algorithms require that input data be replaced by ranks.
+         The <a href="../apidocs/org/apache/commons/math/stat/ranking/package-summary.html">
+         org.apache.commons.math.stat.ranking</a> package provides rank transformation.
+         <a href="../apidocs/org/apache/commons/math/stat/ranking/RankingAlgorithm.html">
+         RankingAlgorithm</a> defines the interface for ranking.  
+         <a href="../apidocs/org/apache/commons/math/stat/ranking/NaturalRanking.html">
+         NaturalRanking</a> provides an implementation that has two configuration options.
+         <ul><li><a href="../apidocs/org/apache/commons/math/stat/ranking/TiesStrategy.html">
+         Ties strategy</a> determines how ties in the source data are handled by the ranking.</li>
+<li><a href="../apidocs/org/apache/commons/math/stat/ranking/NaNStrategy.html">
+         NaN strategy</a> determines how NaN values in the source data are handled.</li>
+         </ul>
+         Examples:
+         <div class="source"><pre>
+NaturalRanking ranking = new NaturalRanking(NaNStrategy.MINIMAL,
+                                            TiesStrategy.MAXIMUM);
+double[] exampleData = { 20, 17, 30, 42.3, 17, 50,
+                         Double.NaN, Double.NEGATIVE_INFINITY, 17 };
+double[] ranks = ranking.rank(exampleData);
+         </pre>
+         results in <code>ranks</code> containing <code>{6, 5, 7, 8, 5, 9, 2, 2, 5}</code>.
+         <div class="source"><pre>
+new NaturalRanking(NaNStrategy.REMOVED,TiesStrategy.SEQUENTIAL).rank(exampleData);   
+         </pre>
+         returns <code>{5, 2, 6, 7, 3, 8, 1, 4}.</code></p>
+        The default <code>NaNStrategy</code> is <code>NaNStrategy.MAXIMAL</code>.  This makes <code>NaN</code>
+        values larger than any other value (including <code>Double.POSITIVE_INFINITY</code>). The
+        default <code>TiesStrategy</code> is <code>TiesStrategy.AVERAGE,</code> which assigns tied
+        values the average of the ranks applicable to the sequence of ties.  See the 
+        <a href="../apidocs/org/apache/commons/math/stat/ranking/NaturalRanking.html">
+        NaturalRanking</a> javadoc for more examples and the <a href="../apidocs/org/apache/commons/math/stat/ranking/TiesStrategy.html">
+        TiesStrategy</a> and <a href="../apidocs/org/apache/commons/math/stat/ranking/NaNStrategy.html">NaNStrategy</a>
+        javadoc for details on these configuration options.
+       </p>
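+<p>
+        To make the "average" ties strategy concrete, here is a minimal, self-contained sketch of
+        ranking with tied values assigned the mean of the ranks they span. The class name is
+        hypothetical and the code is not the <code>NaturalRanking</code> implementation; it assumes
+        the input contains no <code>NaN</code> values.
+       </p>

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: average-rank ties handling, NOT the NaturalRanking source.
final class AverageRankSketch {
    // Returns 1-based ranks; tied values share the average of their positions.
    // Assumes data contains no NaN values.
    static double[] rank(double[] data) {
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort the indices by the value they point to.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> data[i]));

        double[] ranks = new double[data.length];
        int start = 0;
        while (start < order.length) {
            // Extend [start, end] over a run of equal values.
            int end = start;
            while (end + 1 < order.length && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            // Positions are start+1 .. end+1; assign their mean to every tied value.
            double average = ((start + 1) + (end + 1)) / 2.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }
}
```

+<p>
+        For the input <code>{20, 17, 30, 42.3, 17, 50}</code> the two values of 17 occupy positions
+        1 and 2, so each receives rank 1.5 and the sketch returns
+        <code>{3, 1.5, 4, 5, 1.5, 6}</code>.
+       </p>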
+<div class="section"><h3><a name="a1.7_Covariance_and_correlation"></a>1.7 Covariance and correlation</h3>
+          The <a href="../apidocs/org/apache/commons/math/stat/correlation/package-summary.html">
+          org.apache.commons.math.stat.correlation</a> package computes covariances
+          and correlations for pairs of arrays or columns of a matrix.
+          <a href="../apidocs/org/apache/commons/math/stat/correlation/Covariance.html">
+          Covariance</a> computes covariances, 
+          <a href="../apidocs/org/apache/commons/math/stat/correlation/PearsonsCorrelation.html">
+          PearsonsCorrelation</a> provides Pearson's Product-Moment correlation coefficients and
+          <a href="../apidocs/org/apache/commons/math/stat/correlation/SpearmansCorrelation.html">
+          SpearmansCorrelation</a> computes Spearman's rank correlation.
+        </p>
+<p><strong>Implementation Notes</strong><ul><li>
+            Unbiased covariances are given by the formula <br />
+</br><code>cov(X, Y) = sum [(x<sub>i</sub> - E(X))(y<sub>i</sub> - E(Y))] / (n - 1)</code>
+            where <code>E(X)</code> is the mean of <code>X</code> and <code>E(Y)</code>
+           is the mean of the <code>Y</code> values. Non-bias-corrected estimates use 
+           <code>n</code> in place of <code>n - 1.</code>  Whether or not covariances are
+           bias-corrected is determined by the optional parameter, &quot;biasCorrected,&quot; which
+           defaults to <code>true.</code></li>
+<li><a href="../apidocs/org/apache/commons/math/stat/correlation/PearsonsCorrelation.html">
+          PearsonsCorrelation</a> computes correlations defined by the formula <br />
+</br><code>cor(X, Y) = sum[(x<sub>i</sub> - E(X))(y<sub>i</sub> - E(Y))] / [(n - 1)s(X)s(Y)]</code><br />
+          where <code>E(X)</code> and <code>E(Y)</code> are means of <code>X</code> and <code>Y</code>
+          and <code>s(X)</code>, <code>s(Y)</code> are standard deviations.
+          </li>
+<li><a href="../apidocs/org/apache/commons/math/stat/correlation/SpearmansCorrelation.html">
+          SpearmansCorrelation</a> applies a rank transformation to the input data and computes Pearson's
+          correlation on the ranked data.  The ranking algorithm is configurable. By default, 
+          <a href="../apidocs/org/apache/commons/math/stat/ranking/NaturalRanking.html">
+          NaturalRanking</a> with default strategies for handling ties and NaN values is used.
+          </li>
+</ul></p>
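+<p>
+          The two formulas above can be checked with a small, self-contained sketch. The class and
+          method names below are hypothetical and are not the Commons Math API; the code simply
+          evaluates <code>cov(X, Y)</code> with or without bias correction and derives
+          <code>cor(X, Y)</code> from it.
+        </p>

```java
// Illustrative only: direct evaluation of the covariance and correlation formulas.
final class CovarianceSketch {
    static double mean(double[] v) {
        double sum = 0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    // sum[(x_i - E(X))(y_i - E(Y))] / (n - 1), or / n when biasCorrected is false.
    static double covariance(double[] x, double[] y, boolean biasCorrected) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("arrays must have equal length >= 2");
        }
        double mx = mean(x);
        double my = mean(y);
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        int n = x.length;
        return sum / (biasCorrected ? n - 1 : n);
    }

    // cov(X, Y) / (s(X) s(Y)), using bias-corrected standard deviations.
    static double correlation(double[] x, double[] y) {
        double sx = Math.sqrt(covariance(x, x, true));
        double sy = Math.sqrt(covariance(y, y, true));
        return covariance(x, y, true) / (sx * sy);
    }
}
```

+<p>
+          For <code>x = {1, 2, 3}</code> and <code>y = {2, 4, 6}</code> the sum of cross products is 4,
+          so the unbiased covariance is 4 / 2 = 2, the non-bias-corrected value is 4 / 3, and the
+          correlation is 1.
+        </p>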
+<p><strong>Examples:</strong><dl><dt><strong>Covariance of 2 arrays</strong></dt>
+<br />
+</br><dd>To compute the unbiased covariance between 2 double arrays,
+          <code>x</code> and <code>y</code>, use:
+          <div class="source"><pre>
+new Covariance().covariance(x, y)
+          </pre>
+          For non-bias-corrected covariances, use
+          <div class="source"><pre>
+covariance(x, y, false)
+          </pre>
+<br />
+</br><dt><strong>Covariance matrix</strong></dt>
+<br />
+</br><dd> A covariance matrix over the columns of a source matrix <code>data</code>
+          can be computed using
+          <div class="source"><pre>
+new Covariance().computeCovarianceMatrix(data)
+          </pre>
+          The i-jth entry of the returned matrix is the unbiased covariance of the ith and jth
+          columns of <code>data.</code> As above, to get non-bias-corrected covariances,
+          use 
+         <div class="source"><pre>
+computeCovarianceMatrix(data, false)
+         </pre>
+<br />
+</br><dt><strong>Pearson's correlation of 2 arrays</strong></dt>
+<br />
+</br><dd>To compute the Pearson's product-moment correlation between two double arrays
+          <code>x</code> and <code>y</code>, use:
+          <div class="source"><pre>
+new PearsonsCorrelation().correlation(x, y)
+          </pre>
+<br />
+</br><dt><strong>Pearson's correlation matrix</strong></dt>
+<br />
+</br><dd> A (Pearson's) correlation matrix over the columns of a source matrix <code>data</code>
+          can be computed using
+          <div class="source"><pre>
+new PearsonsCorrelation().computeCorrelationMatrix(data)
+          </pre>
+          The i-jth entry of the returned matrix is the Pearson's product-moment correlation between the
+          ith and jth columns of <code>data.</code></dd>
+<br />
+</br><dt><strong>Pearson's correlation significance and standard errors</strong></dt>
+<br />
+</br><dd> To compute the standard errors and/or significances (p-values)
+          associated with Pearson's correlation coefficients, start by creating a
+          <code>PearsonsCorrelation</code> instance
+          <div class="source"><pre>
+PearsonsCorrelation correlation = new PearsonsCorrelation(data);
+          </pre>
+          where <code>data</code> is either a rectangular array or a <code>RealMatrix.</code>
+          Then the matrix of standard errors is
+          <div class="source"><pre>
+correlation.getCorrelationStandardErrors();
+          </pre>
+</div>
+          The formula used to compute the standard error is <br />
+<code>SE<sub>r</sub> = ((1 - r<sup>2</sup>) / (n - 2))<sup>1/2</sup></code><br />
+           where <code>r</code> is the estimated correlation coefficient and 
+          <code>n</code> is the number of observations in the source dataset.<br />
+<br />
+<strong>p-values</strong> for the (2-sided) null hypotheses that elements of
+          a correlation matrix are zero populate the RealMatrix returned by
+          <div class="source"><pre>
+correlation.getCorrelationPValues();
+          </pre>
+</div>
+<code>getCorrelationPValues().getEntry(i,j)</code> is the
+          probability that a random variable distributed as <code>t<sub>n-2</sub></code> takes
+           a value with absolute value greater than or equal to <br />
+</br><code>|r<sub>ij</sub>|((n - 2) / (1 - r<sub>ij</sub><sup>2</sup>))<sup>1/2</sup></code>,
+           where <code>r<sub>ij</sub></code> is the estimated correlation between the ith and jth
+           columns of the source array or RealMatrix. This is sometimes referred to as the 
+           <i>significance</i> of the coefficient.<br />
+<br />
+           For example, if <code>data</code> is a RealMatrix with 2 columns and 10 rows, then 
+           <div class="source"><pre>
+new PearsonsCorrelation(data).getCorrelationPValues().getEntry(0,1)
+           </pre>
+           is the significance of the Pearson's correlation coefficient between the two columns
+           of <code>data</code>.  If this value is less than .01, we can say that the correlation
+           between the two columns of data is significant at the 99% level.
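+           <p>As a worked illustration of the two formulas above, the following plain-Java sketch
+           evaluates the standard error and the statistic that is compared against the
+           <code>t<sub>n-2</sub></code> distribution. The class and method names are hypothetical
+           and the p-value itself is not computed, since that requires a t distribution
+           implementation.</p>

```java
// Plain-Java sketch; hypothetical names, not Commons Math API.
final class CorrelationSignificanceSketch {

    // SE_r = ((1 - r^2) / (n - 2))^(1/2)
    static double standardError(double r, int n) {
        return Math.sqrt((1 - r * r) / (n - 2));
    }

    // |r| ((n - 2) / (1 - r^2))^(1/2), the value referred to the t distribution with n - 2 df
    static double tStatistic(double r, int n) {
        return Math.abs(r) * Math.sqrt((n - 2) / (1 - r * r));
    }
}
```

+           <p>With <code>r = 0.6</code> and <code>n = 18</code>, the standard error is
+           (0.64 / 16)<sup>1/2</sup> = 0.2 and the statistic is 0.6 * 25<sup>1/2</sup> = 3.
+           The statistic is simply <code>|r|</code> divided by its standard error.</p>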
+          </dd>
+<br />
+</br><dt><strong>Spearman's rank correlation coefficient</strong></dt>
+<br />
+</br><dd>To compute Spearman's rank correlation between two double arrays
+          <code>x</code> and <code>y</code>:
+          <div class="source"><pre>
+new SpearmansCorrelation().correlation(x, y)
+          </pre>
+          This is equivalent to 
+          <div class="source"><pre>
+RankingAlgorithm ranking = new NaturalRanking();
+new PearsonsCorrelation().correlation(ranking.rank(x), ranking.rank(y))
+          </pre>
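+          <p>The rank-then-correlate idea can be sketched without the library as shown below.
+          The sketch assumes that tied values receive the average of the ranks they span and that
+          the input contains no NaN values; <code>SpearmanSketch</code> and its methods are
+          hypothetical names, not the <code>NaturalRanking</code> implementation.</p>

```java
import java.util.Arrays;

// Plain-Java sketch; hypothetical names, not Commons Math API. Assumes no NaN values.
final class SpearmanSketch {

    // Ranks start at 1; tied values receive the average of the ranks they occupy.
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            double average = (i + j) / 2.0 + 1.0; // mean of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = average;
            }
            i = j + 1;
        }
        return ranks;
    }

    // Pearson's product-moment correlation
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0;
        double my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    static double spearman(double[] x, double[] y) {
        return pearson(averageRanks(x), averageRanks(y));
    }
}
```

+          <p>For example, <code>{10, 20, 20, 40}</code> is ranked as <code>{1, 2.5, 2.5, 4}</code>,
+          and any strictly increasing relationship between <code>x</code> and <code>y</code>
+          gives a coefficient of 1.</p>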
+<br />
+<div class="section"><h3><a name="a1.8_Statistical_tests"></a>1.8 Statistical tests</h3>
+<p>The interfaces and implementations in the
+          <a href="../apidocs/org/apache/commons/math/stat/inference/">
+          org.apache.commons.math.stat.inference</a> package provide
+          <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm" class="externalLink">
+          Student's t</a>,
+          <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm" class="externalLink">
+          Chi-Square</a> and 
+          <a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc43.htm" class="externalLink">
+          One-Way ANOVA</a> test statistics as well as
+          <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue" class="externalLink">
+          p-values</a> associated with <code>t-</code>,
+          <code>Chi-Square</code> and <code>One-Way ANOVA</code> tests.  The
+          interfaces are
+          <a href="../apidocs/org/apache/commons/math/stat/inference/TTest.html">
+          TTest</a>,
+          <a href="../apidocs/org/apache/commons/math/stat/inference/ChiSquareTest.html">
+          ChiSquareTest</a>, and
+          <a href="../apidocs/org/apache/commons/math/stat/inference/OneWayAnova.html">
+          OneWayAnova</a> with provided implementations
+          <a href="../apidocs/org/apache/commons/math/stat/inference/TTestImpl.html">
+          TTestImpl</a>,
+          <a href="../apidocs/org/apache/commons/math/stat/inference/ChiSquareTestImpl.html">
+          ChiSquareTestImpl</a> and
+          <a href="../apidocs/org/apache/commons/math/stat/inference/OneWayAnovaImpl.html">
+          OneWayAnovaImpl</a>, respectively.
+          The
+          <a href="../apidocs/org/apache/commons/math/stat/inference/TestUtils.html">
+          TestUtils</a> class provides static methods to get test instances or
+          to compute test statistics directly.  The examples below all use the
+          static methods in <code>TestUtils</code> to execute tests.  To get
+          test object instances, either use e.g.,
+          <code>TestUtils.getTTest()</code> or use the implementation constructors
+          directly, e.g.,
+          <code>new TTestImpl()</code>.
+        </p>
+<p><strong>Implementation Notes</strong><ul><li>Both one- and two-sample t-tests are supported.  Two-sample tests
+          can be either paired or unpaired and the unpaired two-sample tests can
+          be conducted under the assumption of equal subpopulation variances or
+          without this assumption.  When equal variances are assumed, a pooled
+          variance estimate is used to compute the t-statistic and the degrees
+          of freedom used in the t-test equals the sum of the sample sizes minus 2.
+          When equal variances are not assumed, the t-statistic uses both sample
+          variances and the
+          <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/gifs/nu3.gif" class="externalLink">
+          Welch-Satterthwaite approximation</a> is used to compute the degrees
+          of freedom.  Methods to return t-statistics and p-values are provided in each
+          case, as well as boolean-valued methods to perform fixed significance
+          level tests.  The names of test or test-statistic methods that assume equal
+          subpopulation variances always start with &quot;homoscedastic.&quot;  Test or
+          test-statistic methods that just start with &quot;t&quot; do not assume equal
+          variances. See the examples below and the API documentation for
+          more details.</li>
+<li>The validity of the p-values returned by the t-test depends on the
+          assumptions of the parametric t-test procedure, as discussed
+          <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html" class="externalLink">
+          here</a>.</li>
+<li>p-values returned by t-, chi-square and Anova tests are exact in the sense that
+           they are computed from the t-, chi-square and F distributions themselves, using
+           the numerical approximations to these distributions provided by the
+           <code>distribution</code> package. </li>
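+<li>p-values returned by t-tests are for two-sided tests and the boolean-valued
+           methods supporting fixed significance level tests assume that the hypotheses
+           are two-sided.  One-sided tests can be performed by dividing returned p-values
+           by 2 (provided the sample statistic lies on the side of the alternative) or,
+           for the fixed significance level methods, by passing twice the desired
+           significance level.</li>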
+<li>Degrees of freedom for chi-square tests are integral values:
+           (number of observed counts - 1) for the goodness-of-fit tests and
+           (number of columns - 1) * (number of rows - 1)
+           for independence tests.</li>
+</ul></p>
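+<p>To make the difference between the two unpaired procedures concrete, the following plain-Java
+          sketch computes both t-statistics and the Welch-Satterthwaite degrees of freedom from
+          sample means, variances and sizes. It is an illustration under the textbook formulas, not
+          the library's code, and <code>TwoSampleTSketch</code> and its method names are
+          hypothetical.</p>

```java
// Plain-Java sketch; hypothetical names, not Commons Math API.
// m = sample mean, v = bias-corrected sample variance, n = sample size.
final class TwoSampleTSketch {

    // Unequal variances: t = (m1 - m2) / sqrt(v1/n1 + v2/n2)
    static double t(double m1, double m2, double v1, double v2, double n1, double n2) {
        return (m1 - m2) / Math.sqrt(v1 / n1 + v2 / n2);
    }

    // Welch-Satterthwaite approximation to the degrees of freedom
    static double welchDegreesOfFreedom(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    // Equal variances: pooled variance estimate, degrees of freedom n1 + n2 - 2
    static double homoscedasticT(double m1, double m2, double v1, double v2,
                                 double n1, double n2) {
        double pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooled * (1 / n1 + 1 / n2));
    }
}
```

+<p>When the two samples have the same size and the same variance, the two approaches agree:
+          with means 10 and 8, variances 4 and 4, and 8 observations each, both t-statistics
+          equal 2 and both degrees of freedom equal 14.</p>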
+<p><strong>Examples:</strong><dl><dt><strong>One-sample <code>t</code> tests</strong></dt>
+<br />
+</br><dd>To compare the mean of a double[] array to a fixed value:
+          <div class="source"><pre>
+double[] observed = {1d, 2d, 3d};
+double mu = 2.5d;
+System.out.println(TestUtils.t(mu, observed));
+          </pre>
+          The code above will display the t-statistic associated with a one-sample
+           t-test comparing the mean of the <code>observed</code> values against
+           <code>mu.</code></dd>
+<dd>To compare the mean of a dataset described by a
+          <a href="../apidocs/org/apache/commons/math/stat/descriptive/StatisticalSummary.html">
+          org.apache.commons.math.stat.descriptive.StatisticalSummary</a>  to a fixed value:
+          <div class="source"><pre>
+double[] observed = {1d, 2d, 3d};
+double mu = 2.5d;
+SummaryStatistics sampleStats = new SummaryStatistics();
+for (int i = 0; i &lt; observed.length; i++) {
+    sampleStats.addValue(observed[i]);
+}
+System.out.println(TestUtils.t(mu, sampleStats));
+          </pre>
+</div></dd>
+<dd>To compute the p-value associated with the null hypothesis that the mean
+            of the population from which a set of values is drawn equals a target value,
+            against the two-sided alternative that the mean is different from the target value:
+            <div class="source"><pre>
+double[] observed = {1d, 2d, 3d};
+double mu = 2.5d;
+System.out.println(TestUtils.tTest(mu, observed));
+           </pre>
+          The snippet above will display the p-value associated with the null
+          hypothesis that the mean of the population from which the
+          <code>observed</code> values are drawn equals <code>mu.</code></dd>
+<dd>To perform the test using a fixed significance level, use:
+          <div class="source"><pre>
+TestUtils.tTest(mu, observed, alpha);
+          </pre>
+          where <code>0 &lt; alpha &lt; 0.5</code> is the significance level of
+          the test.  The boolean value returned will be <code>true</code> iff the
+          null hypothesis can be rejected with confidence <code>1 - alpha</code>.
+          To test, for example, at the 95% level of confidence, use
+          <code>alpha = 0.05</code>.</dd>
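+<dd>The one-sample t-statistic used in the examples above is the difference between the sample
+          mean and <code>mu</code> divided by the estimated standard error of the mean. A plain-Java
+          sketch of that calculation follows; <code>OneSampleTSketch</code> is a hypothetical
+          class written for this illustration and is not part of Commons Math.

```java
// Plain-Java sketch; hypothetical names, not Commons Math API.
final class OneSampleTSketch {

    // t = (mean - mu) / sqrt(variance / n), with the bias-corrected sample variance
    static double t(double mu, double[] observed) {
        int n = observed.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least 2 observations are required");
        }
        double mean = 0.0;
        for (double d : observed) {
            mean += d;
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double d : observed) {
            sumSquares += (d - mean) * (d - mean);
        }
        double variance = sumSquares / (n - 1);
        return (mean - mu) / Math.sqrt(variance / n);
    }
}
```

+          For <code>observed = {1, 2, 3}</code> and <code>mu = 2.5</code>, the sample mean is 2
+          and the sample variance is 1, so the statistic is -0.5 / (1/3)<sup>1/2</sup>, roughly
+          -0.866, with 2 degrees of freedom.</dd>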
+<br />
+</br><dt><strong>Two-Sample t-tests</strong></dt>
+<br />
+</br><dd><strong>Example 1:</strong> Paired test evaluating
+          the null hypothesis that the mean difference between corresponding
+          (paired) elements of the <code>double[]</code> arrays
+          <code>sample1</code> and <code>sample2</code> is zero.
+          To compute the t-statistic:
+          <div class="source"><pre>
+TestUtils.pairedT(sample1, sample2);
+          </pre>
+           To compute the p-value:
+           <div class="source"><pre>
+TestUtils.pairedTTest(sample1, sample2);
+           </pre>
+           To perform a fixed significance level test with alpha = .05:
+           <div class="source"><pre>
+TestUtils.pairedTTest(sample1, sample2, .05);
+           </pre>
+           The last example will return <code>true</code> iff the p-value
+           returned by <code>TestUtils.pairedTTest(sample1, sample2)</code>
+           is less than <code>.05</code>.</dd>
+<dd><strong>Example 2: </strong> unpaired, two-sided, two-sample t-test using
+           <code>StatisticalSummary</code> instances, without assuming that
+           subpopulation variances are equal.
+           First create the <code>StatisticalSummary</code> instances.  Both
+           <code>DescriptiveStatistics</code> and <code>SummaryStatistics</code>
+           implement this interface.  Assume that <code>summary1</code> and
+           <code>summary2</code> are <code>SummaryStatistics</code> instances,
+           each of which has had at least 2 values added to the (virtual) dataset that
+           it describes.  The sample sizes do not have to be the same -- all that is required
+           is that both samples have at least 2 elements.
+           <p><strong>Note:</strong> The <code>SummaryStatistics</code> class does
+           not store the dataset that it describes in memory, but it does compute all
+           statistics necessary to perform t-tests, so this method can be used to
+           conduct t-tests with very large samples.  One-sample tests can also be
+           performed this way.
+           (See <a href="#a1.2_Descriptive_statistics">Descriptive statistics</a> for details
+           on the <code>SummaryStatistics</code> class.)
+           </p>
+          To compute the t-statistic:
+          <div class="source"><pre>
+TestUtils.t(summary1, summary2);
+          </pre>
+           To compute the p-value:
+           <div class="source"><pre>
+TestUtils.tTest(summary1, summary2);
+           </pre>
+           To perform a fixed significance level test with alpha = .05:
+           <div class="source"><pre>
+TestUtils.tTest(summary1, summary2, .05);
+           </pre>
+           In each case above, the test does not assume that the subpopulation
+           variances are equal.  To perform the tests under this assumption,
+           replace &quot;t&quot; at the beginning of the method name with &quot;homoscedasticT&quot;.
+           </dd>
+<br />
+</br><dt><strong>Chi-square tests</strong></dt>
+<br />
+</br><dd>To compute a chi-square statistic measuring the agreement between a
+          <code>long[]</code> array of observed counts and a <code>double[]</code>
+          array of expected counts, use:
+          <div class="source"><pre>
+long[] observed = {10, 9, 11};
+double[] expected = {10.1, 9.8, 10.3};
+System.out.println(TestUtils.chiSquare(expected, observed));
+          </pre>
+          The value displayed will be
+          <code>sum((expected[i] - observed[i])^2 / expected[i])</code></dd>
+<dd> To get the p-value associated with the null hypothesis that
+          <code>observed</code> conforms to <code>expected</code> use:
+          <div class="source"><pre>
+TestUtils.chiSquareTest(expected, observed);
+          </pre>
+<dd> To test the null hypothesis that <code>observed</code> conforms to
+          <code>expected</code> with <code>alpha</code> significance level
+          (equiv. <code>100 * (1-alpha)%</code> confidence) where <code>
+          0 &lt; alpha &lt; 0.5 </code> use:
+          <div class="source"><pre>
+TestUtils.chiSquareTest(expected, observed, alpha);
+          </pre>
+          The boolean value returned will be <code>true</code> iff the null hypothesis
+          can be rejected with confidence <code>1 - alpha</code>.
+          </dd>
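+<dd>The goodness-of-fit statistic above is straightforward to reproduce by hand. The sketch
+          below is plain Java that follows the displayed formula directly; the class name
+          <code>GoodnessOfFitSketch</code> is hypothetical, and any input validation or
+          preprocessing that the library may perform is omitted.

```java
// Plain-Java sketch; hypothetical names, not Commons Math API.
final class GoodnessOfFitSketch {

    // sum((expected[i] - observed[i])^2 / expected[i])
    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("arrays must have equal length >= 2");
        }
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double deviation = observed[i] - expected[i];
            sum += deviation * deviation / expected[i];
        }
        return sum;
    }

    // number of observed counts - 1
    static int degreesOfFreedom(long[] observed) {
        return observed.length - 1;
    }
}
```

+          For the <code>observed</code> and <code>expected</code> arrays used above, the three
+          terms are 0.01/10.1, 0.64/9.8 and 0.49/10.3, which sum to roughly 0.114, with
+          2 degrees of freedom.</dd>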
+<dd>To compute a chi-square statistic associated with a
+          <a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc45.htm" class="externalLink">
+          chi-square test of independence</a> based on a two-dimensional (long[][])
+          <code>counts</code> array viewed as a two-way table, use:
+          <div class="source"><pre>
+TestUtils.chiSquare(counts);
+          </pre>
+</div>
+          The rows of the 2-way table are
+          <code>counts[0], ... , counts[counts.length - 1]. </code><br />
+          The chi-square statistic returned is
+          <code>sum((counts[i][j] - expected[i][j])^2/expected[i][j])</code>
+          where the sum is taken over all table entries and
+          <code>expected[i][j]</code> is the product of the row and column sums at
+          row <code>i</code>, column <code>j</code> divided by the total count.
+          </dd>
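+<dd>The expected counts and the independence statistic described above can be sketched in
+          plain Java as follows. The sketch assumes a rectangular table whose row and column sums
+          are all positive; <code>IndependenceSketch</code> and its methods are hypothetical names
+          rather than library API.

```java
// Plain-Java sketch; hypothetical names, not Commons Math API.
// Assumes a rectangular table with positive row and column sums.
final class IndependenceSketch {

    static double chiSquare(long[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double statistic = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // expected[i][j] = (row sum * column sum) / total count
                double expected = rowSum[i] * colSum[j] / total;
                double deviation = counts[i][j] - expected;
                statistic += deviation * deviation / expected;
            }
        }
        return statistic;
    }

    // (number of columns - 1) * (number of rows - 1)
    static int degreesOfFreedom(long[][] counts) {
        return (counts[0].length - 1) * (counts.length - 1);
    }
}
```

+          For the table <code>{{10, 20}, {30, 40}}</code> the expected counts are
+          12, 18, 28 and 42, so the statistic is 4/12 + 4/18 + 4/28 + 4/42 = 200/252, roughly
+          0.794, with 1 degree of freedom.</dd>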
+<dd>To compute the p-value associated with the null hypothesis that
+          the classifications represented by the counts in the columns of the input 2-way
+          table are independent of the rows, use:
+          <div class="source"><pre>
+ TestUtils.chiSquareTest(counts);
+          </pre>
+<dd>To perform a chi-square test of independence with <code>alpha</code>
+          significance level (equiv. <code>100 * (1-alpha)%</code> confidence)
+          where <code>0 &lt; alpha &lt; 0.5 </code> use:
+          <div class="source"><pre>
+TestUtils.chiSquareTest(counts, alpha);
+          </pre>
+          The boolean value returned will be <code>true</code> iff the null
+          hypothesis can be rejected with confidence <code>1 - alpha</code>.
+          </dd>
+<br />
+</br><dt><strong>One-Way Anova tests</strong></dt>
+<br />
+</br><dd>To conduct a One-Way Analysis of Variance (ANOVA) to evaluate the
+          null hypothesis that the means of a collection of univariate datasets
+          are the same, start by loading the datasets into a collection, e.g.
+          <div class="source"><pre>
+double[] classA =
+   {93.0, 103.0, 95.0, 101.0, 91.0, 105.0, 96.0, 94.0, 101.0 };
+double[] classB =
+   {99.0, 92.0, 102.0, 100.0, 102.0, 89.0 };
+double[] classC =
+   {110.0, 115.0, 111.0, 117.0, 128.0, 117.0 };
+List&lt;double[]&gt; classes = new ArrayList&lt;double[]&gt;();
+classes.add(classA);
+classes.add(classB);
+classes.add(classC);
+          </pre>
+          Then you can compute ANOVA F- or p-values associated with the
+          null hypothesis that the class means are all the same
+          using a <code>OneWayAnova</code> instance or <code>TestUtils</code>
+          methods:
+          <div class="source"><pre>
+double fStatistic = TestUtils.oneWayAnovaFValue(classes); // F-value
+double pValue = TestUtils.oneWayAnovaPValue(classes);     // P-value
+          </pre>
+          To perform a One-Way Anova test with significance level set at 0.01
+          (so the test will, provided its assumptions are met, incorrectly reject
+          a true null hypothesis only about one time in 100), use
+          <div class="source"><pre>
+TestUtils.oneWayAnovaTest(classes, 0.01); // returns a boolean
+                                          // true means reject null hypothesis
+          </pre>
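+          <p>For readers who want to see what the F-value measures, the following plain-Java
+          sketch computes the ratio of the between-group mean square to the within-group mean
+          square. It is a textbook calculation written for this illustration;
+          <code>AnovaSketch</code> is a hypothetical class, not the
+          <code>OneWayAnovaImpl</code> implementation.</p>

```java
import java.util.List;

// Plain-Java sketch; hypothetical names, not Commons Math API.
final class AnovaSketch {

    // F = (SS_between / (k - 1)) / (SS_within / (N - k))
    static double fValue(List<double[]> groups) {
        int k = groups.size();
        int total = 0;
        double grandSum = 0.0;
        for (double[] group : groups) {
            for (double d : group) {
                grandSum += d;
            }
            total += group.length;
        }
        double grandMean = grandSum / total;

        double ssBetween = 0.0;
        double ssWithin = 0.0;
        for (double[] group : groups) {
            double mean = 0.0;
            for (double d : group) {
                mean += d;
            }
            mean /= group.length;
            ssBetween += group.length * (mean - grandMean) * (mean - grandMean);
            for (double d : group) {
                ssWithin += (d - mean) * (d - mean);
            }
        }
        return (ssBetween / (k - 1)) / (ssWithin / (total - k));
    }
}
```

+          <p>For the three groups <code>{1, 2, 3}</code>, <code>{2, 3, 4}</code> and
+          <code>{3, 4, 5}</code>, the between-group sum of squares is 6 on 2 degrees of freedom
+          and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 3.</p>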
+      </div>
+    </div>
+    <div class="clear">
+      <hr/>
+    </div>
+    <div id="footer">
+      <div class="xright">&#169;  
+          2003-2010
+  </div>
+      <div class="clear">
+        <hr/>
+      </div>
+    </div>
+  </body>