view libs/commons-math-2.1/docs/userguide/random.html @ 10:5f2c5fb36e93

commons-math-2.1 added
author dwinter
date Tue, 04 Jan 2011 10:00:53 +0100
parents
children
line wrap: on
line source

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">











<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Math - The Commons Math User Guide - Data Generation</title>
    <style type="text/css" media="all">
      @import url("../css/maven-base.css");
      @import url("../css/maven-theme.css");
      @import url("../css/site.css");
    </style>
    <link rel="stylesheet" href="../css/print.css" type="text/css" media="print" />
        <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
      </head>
  <body class="composite">
    <div id="banner">
                    <span id="bannerLeft">
    
            Commons Math User Guide
    
            </span>
                    <div class="clear">
        <hr/>
      </div>
    </div>
    <div id="breadcrumbs">
          
  

  
    
  
  
    
              <div class="xright">      
  

  
    
  
  
    
  </div>
      <div class="clear">
        <hr/>
      </div>
    </div>
    <div id="leftColumn">
      <div id="navcolumn">
           
  

  
    
  
  
    
                   <h5>User Guide</h5>
            <ul>
              
    <li class="none">
                    <a href="../userguide/index.html">Contents</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/overview.html">Overview</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/stat.html">Statistics</a>
          </li>
              
    <li class="none">
              <strong>Data Generation</strong>
        </li>
              
    <li class="none">
                    <a href="../userguide/linear.html">Linear Algebra</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/analysis.html">Numerical Analysis</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/special.html">Special Functions</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/utilities.html">Utilities</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/complex.html">Complex Numbers</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/distribution.html">Distributions</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/fraction.html">Fractions</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/transform.html">Transform Methods</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/geometry.html">3D Geometry</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/optimization.html">Optimization</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/ode.html">Ordinary Differential Equations</a>
          </li>
              
    <li class="none">
                    <a href="../userguide/genetics.html">Genetic Algorithms</a>
          </li>
          </ul>
                                           <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
            <img alt="Built by Maven" src="../images/logos/maven-feather.png"></img>
          </a>
                       
  

  
    
  
  
    
        </div>
    </div>
    <div id="bodyColumn">
      <div id="contentBox">
        <div class="section"><h2><a name="a2_Data_Generation"></a>2 Data Generation</h2>
<div class="section"><h3><a name="a2.1_Overview"></a>2.1 Overview</h3>
<p>
    The Commons Math random package includes utilities for
    <ul><li>generating random numbers</li>
<li>generating random strings</li>
<li>generating cryptographically secure sequences of random numbers or
         strings</li>
<li>generating random samples and permutations</li>
<li>analyzing distributions of values in an input file and generating
         values &quot;like&quot; the values in the file</li>
<li>generating data for grouped frequency distributions or
         histograms</li>
</ul>
</p>
<p>
     The source of random data used by the data generation utilities is
     pluggable.  By default, the JDK-supplied PseudoRandom Number Generator
     (PRNG) is used, but alternative generators can be &quot;plugged in&quot; using an
     adaptor framework, which provides a generic facility for replacing 
     <code>java.util.Random</code> with an alternative PRNG. Another very
     good PRNG suitable for Monte-Carlo analysis (but <strong>not</strong>
     for cryptography) provided by the library is the Mersenne twister from
     Makoto Matsumoto and Takuji Nishimura
    </p>
<p>
     Sections 2.2-2.6 below show how to use the commons math API to generate
     different kinds of random data.  The examples all use the default
     JDK-supplied PRNG.  PRNG pluggability is covered in 2.7.  The only 
     modification required to the examples to use alternative PRNGs is to
     replace the argumentless constructor calls with invocations including
     a <code>RandomGenerator</code> instance as a parameter.
    </p>
</div>
<div class="section"><h3><a name="a2.2_Random_numbers"></a>2.2 Random numbers</h3>
<p>
    The <a href="../apidocs/org/apache/commons/math/random/RandomData.html">
    org.apache.commons.math.RandomData</a> interface defines methods for
    generating random sequences of numbers. The API contracts of these methods
    use the following concepts:
    <dl><dt>Random sequence of numbers from a probability distribution</dt>
<dd>There is no such thing as a single &quot;random number.&quot;  What can be
    generated  are <i>sequences</i> of numbers that appear to be random.  When
    using the built-in JDK function <code>Math.random(),</code> sequences of 
    values generated follow the 
    <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm" class="externalLink">
    Uniform Distribution</a>, which means that the values are evenly spread
    over the interval  between 0 and 1, with no sub-interval having a greater
    probability of containing generated values than any other interval of the
    same length.  The mathematical concept of a
    <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm" class="externalLink">
    probability distribution</a> basically amounts to asserting that different
    ranges in the set  of possible values of a random variable have
    different probabilities of containing the value.  Commons Math supports
    generating random sequences from the following probability distributions.
    The javadoc for the <code>nextXxx</code> methods in 
    <code>RandomDataImpl</code> describes the algorithms used to generate
    random deviates from each of these distributions. 
    <ul><li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm" class="externalLink">
     uniform distribution</a></li>
<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3667.htm" class="externalLink">
     exponential distribution</a></li>
<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda366j.htm" class="externalLink">
    poisson distribution</a></li>
<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3661.htm" class="externalLink">
    Gaussian distribution</a></li>
</ul>
</dd>
<dt>Cryptographically secure random sequences</dt>
<dd>It is possible for a sequence of numbers to appear random, but
    nonetheless to be predictable based on the algorithm used to generate the
    sequence. If in addition to randomness, strong unpredictability is
    required, it is best to use a  
    <a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator" class="externalLink">
    secure random number generator</a> to generate values (or strings). The
    nextSecureXxx methods in the <code>RandomDataImpl</code> implementation of
    the <code>RandomData</code> interface use the JDK <code>SecureRandom</code>
    PRNG to generate cryptographically secure sequences.  The 
    <code>setSecureAlgorithm</code> method allows you to change the underlying
    PRNG. These methods are <strong>much slower</strong> than the corresponding
    &quot;non-secure&quot; versions, so they should only be used when cryptographic
    security is required.</dd>
<dt>Seeding pseudo-random number generators</dt>
<dd>By default, the implementation provided in <code>RandomDataImpl</code>
    uses the JDK-provided PRNG.  Like most other PRNGs, the JDK generator
    generates sequences of random numbers based on an initial &quot;seed value&quot;.
    For the non-secure methods, starting with the same seed always produces the
    same sequence of values.  Secure sequences started with the same seeds will
    diverge. When a new <code>RandomDataImpl</code> is created, the underlying
    random number generators are <strong>not</strong> initialized.  The first
    call to a data generation method, or to a  <code>reSeed()</code> method
    initializes the appropriate generator.  If you do not explicitly seed the
    generator, it is by default seeded with the current time in milliseconds.
    Therefore, to generate sequences of random data values, you should always
    instantiate <strong>one</strong><code>RandomDataImpl</code> and use it
    repeatedly instead of creating new instances for subsequent values in the
    sequence.  For example, the following will generate a random sequence of 50
    long integers between 1 and 1,000,000, using the current time in
    milliseconds as the seed for the JDK PRNG:
    <div class="source"><pre>
RandomData randomData = new RandomDataImpl(); 
for (int i = 0; i &lt; 1000; i++) {
    value = randomData.nextLong(1, 1000000);
}
    </pre>
</div>

    The following will not in general produce a good random sequence, since the
    PRNG is reseeded each time through the loop with the current time in
    milliseconds:
    <div class="source"><pre>
for (int i = 0; i &lt; 1000; i++) {
    RandomDataImpl randomData = new RandomDataImpl(); 
    value = randomData.nextLong(1, 1000000);
}
    </pre>
</div>

    The following will produce the same random sequence each time it is
    executed:
    <div class="source"><pre>
RandomData randomData = new RandomDataImpl(); 
randomData.reSeed(1000);
for (int i = 0; i = 1000; i++) {
    value = randomData.nextLong(1, 1000000);
}
    </pre>
</div>

    The following will produce a different random sequence each time it is
     executed. 
    <div class="source"><pre>
RandomData randomData = new RandomDataImpl(); 
randomData.reSeedSecure(1000);
for (int i = 0; i &lt; 1000; i++) {
    value = randomData.nextSecureLong(1, 1000000);
}
    </pre>
</div>
</dd>
</dl>
</p>
</div>
<div class="section"><h3><a name="a2.3_Random_Vectors"></a>2.3 Random Vectors</h3>
<p>
    Some algorithm requires random vectors instead of random scalars. When the
    components of these vectors are uncorrelated, they may be generated simply
    one at a time and packed together in the vector. The <a href="../apidocs/org/apache/commons/math/random/UncorrelatedRandomVectorGenerator.html">
    org.apache.commons.math.UncorrelatedRandomVectorGenerator</a> class
    does however simplify this process by setting the mean and deviation of each
    component once and generating complete vectors. When the components are correlated
    however, generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math/random/CorrelatedRandomVectorGenerator.html">
    org.apache.commons.math.CorrelatedRandomVectorGenerator</a> class
    provides this service. In this case, the user must set up a complete covariance matrix
    instead of a simple standard deviations vector. This matrix gathers both the variance
    and the correlation information of the probability law.
    </p>
<p>
    The main use for correlated random vector generation is for Monte-Carlo
    simulation of physical problems with several variables, for example to
    generate error vectors to be added to a nominal vector. A particularly
    common case is when the generated vector should be drawn from a <a href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution" class="externalLink">
    Multivariate Normal Distribution</a>.
    </p>
</div>
<div class="section"><h3><a name="a2.4_Random_Strings"></a>2.4 Random Strings</h3>
<p>
    The methods <code>nextHexString</code> and <code>nextSecureHexString</code>
    can be used to generate random strings of hexadecimal characters.  Both
    of these methods produce sequences of strings with good dispersion
    properties.  The difference between the two methods is that the second is
    cryptographically secure.  Specifically, the implementation of 
    <code>nextHexString(n)</code> in <code>RandomDataImpl</code> uses the
    following simple algorithm to generate a string of <code>n</code> hex digits:
    <ol type="1"><li>n/2+1 binary bytes are generated using the underlying Random</li>
<li>Each binary byte is translated into 2 hex digits</li>
</ol>

    The <code>RandomDataImpl</code> implementation of the &quot;secure&quot; version, 
    <code>nextSecureHexString</code> generates hex characters in 40-byte
    &quot;chunks&quot; using a 3-step process:
    <ol type="1"><li>20 random bytes are generated using the underlying 
    <code>SecureRandom.</code></li>
<li>SHA-1 hash is applied to yield a 20-byte binary digest.</li>
<li>Each byte of the binary digest is converted to 2 hex digits</li>
</ol>

    Similarly to the secure random number generation methods, 
    <code>nextSecureHexString</code> is <strong>much slower</strong> than
    the non-secure version.  It should be used only for applications such as 
    generating unique session or transaction ids where predictability of
    subsequent ids based on observation of previous values is a security
    concern.  If all that is needed is an even distribution of hex characters
    in the generated strings, the non-secure method should be used.        
    </p>
</div>
<div class="section"><h3><a name="a2.5_Random_permutations_combinations_sampling"></a>2.5 Random permutations, combinations, sampling</h3>
<p>
    To select a random sample of objects in a collection, you can use the
    <code>nextSample</code> method in the <code>RandomData</code> interface.
    Specifically,  if <code>c</code> is a collection containing at least 
    <code>k</code> objects, and <code>randomData</code> is a 
    <code>RandomData</code> instance <code>randomData.nextSample(c, k)</code>
    will return an <code>object[]</code> array of length <code>k</code>
    consisting of elements randomly selected from the collection.  If 
    <code>c</code> contains duplicate references, there may be duplicate
    references in the returned array; otherwise returned elements will be
    unique -- i.e., the sampling is without replacement among the object
    references in the collection. </p>
<p>
    If <code>randomData</code> is a <code>RandomData</code> instance, and 
    <code>n</code> and <code>k</code> are integers with 
    <code> k &lt;= n</code>,  then 
    <code>randomData.nextPermutation(n, k)</code> returns an <code>int[]</code>
    array of length <code>k</code> whose whose entries are selected randomly, 
    without repetition, from the integers <code>0</code> through
    <code>n-1</code> (inclusive), i.e., 
    <code>randomData.nextPermutation(n, k)</code> returns a random
    permutation of  <code>n</code> taken <code>k</code> at a time.   
    </p>
</div>
<div class="section"><h3><a name="a2.6_Generating_data_like_an_input_file"></a>2.6 Generating data 'like' an input file</h3>
<p>
    Using the <code>ValueServer</code> class, you can generate data based on
    the values in an input file in one of two ways:
    <dl><dt>Replay Mode</dt>
<dd> The following code will read data from <code>url</code> 
      (a <code>java.net.URL</code> instance), cycling through the values in the 
      file in sequence, reopening and starting at the beginning again when all 
      values have been read.
      <div class="source"><pre>
      ValueServer vs = new ValueServer();
      vs.setValuesFileURL(url); 
      vs.setMode(ValueServer.REPLAY_MODE);
      vs.resetReplayFile();
      double value = vs.getNext();
      // ...Generate and use more values...
      vs.closeReplayFile();
      </pre>
</div>

      The values in the file are not stored in memory, so it does not matter
      how large the file is, but you do need to explicitly close the file
      as above.  The expected file format is \n -delimited (i.e. one per line)
      strings representing valid floating point numbers.
      </dd>
<dt>Digest Mode</dt>
<dd>When used in Digest Mode, the ValueServer reads the entire input file
      and estimates a probability density function based on data from the file.
      The estimation method is essentially the 
      <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html" class="externalLink">
      Variable Kernel Method</a> with Gaussian smoothing.  Once the density
      has been estimated, <code>getNext()</code> returns random values whose
      probability distribution matches the empirical distribution -- i.e., if
      you generate a large number of such values, their distribution should
      &quot;look like&quot; the distribution of the values in the input file.  The values
      are not stored in memory in this case either, so there is no limit to the
      size of the input file.  Here is an example:
      <div class="source"><pre>
      ValueServer vs = new ValueServer();
      vs.setValuesFileURL(url); 
      vs.setMode(ValueServer.DIGEST_MODE);
      vs.computeDistribution(500); //Read file and estimate distribution using 500 bins
      double value = vs.getNext();
      // ...Generate and use more values...
      </pre>
</div>

      See the javadoc for <code>ValueServer</code> and 
      <code>EmpiricalDistribution</code> for more details.  Note that 
      <code>computeDistribution()</code> opens and closes the input file
       by itself. 
      </dd>
</dl>
</p>
</div>
<div class="section"><h3><a name="a2.7_PRNG_Pluggability"></a>2.7 PRNG Pluggability</h3>
<p>
      To enable alternative PRNGs to be &quot;plugged in&quot; to the commons-math data
      generation utilities and to provide a generic means to replace 
      <code>java.util.Random</code> in applications, a random generator 
      adaptor framework has been added to commons-math.  The
      <a href="../apidocs/org/apache/commons/math/random/RandomGenerator.html">
      org.apache.commons.math.RandomGenerator</a> interface abstracts the public
      interface of <code>java.util.Random</code> and any implementation of this
      interface can be used as the source of random data for the commons-math 
      data generation classes.  An abstract base class, 
      <a href="../apidocs/org/apache/commons/math/random/AbstractRandomGenerator.html">
      org.apache.commons.math.AbstractRandomGenerator</a> is provided to make
      implementation easier.  This class provides default implementations of
      &quot;derived&quot; data generation methods based on the primitive, 
      <code>nextDouble().</code>  To support generic replacement of 
      <code>java.util.Random</code>,  the 
      <a href="../apidocs/org/apache/commons/math/random/RandomAdaptor.html">
      org.apache.commons.math.RandomAdaptor</a> class is provided,  which
      extends <code>java.util.Random</code> and wraps and delegates calls to
      a <code>RandomGenerator</code> instance.   
  </p>
<p>
     Examples:
     <dl><dt>Create a RandomGenerator based on RngPack's Mersenne Twister</dt>
<dd>To create a RandomGenerator using the RngPack Mersenne Twister PRNG
       as the source of randomness, extend <code>AbstractRandomGenerator</code>
       overriding the derived methods that the RngPack implementation provides:
       <div class="source"><pre>
import edu.cornell.lassp.houle.RngPack.RanMT;
/**
 * AbstractRandomGenerator based on RngPack RanMT generator.
 */
public class RngPackGenerator extends AbstractRandomGenerator {
    
    private RanMT random = new RanMT();
    
    public void setSeed(long seed) {
       random = new RanMT(seed);
    }
    
    public double nextDouble() {
        return random.raw();
    }
    
    public double nextGaussian() {
        return random.gaussian();
    }
    
    public int nextInt(int n) {
        return random.choose(n);
    }
    
    public boolean nextBoolean() {
        return random.coin();
    }
}
      </pre>
</div>
</dd>
<dt>Use the Mersenne Twister RandomGenerator in place of 
      <code>java.util.Random</code> in <code>RandomData</code></dt>
<dd><div class="source"><pre>
RandomData randomData = new RandomDataImpl(new RngPackGenerator());
      </pre>
</div>
</dd>
<dt>Create an adaptor instance based on the Mersenne Twister generator
      that can be used in place of a <code>Random</code></dt>
<dd><div class="source"><pre>
 RandomGenerator generator = new RngPackGenerator();
 Random random = RandomAdaptor.createAdaptor(generator);
 // random can now be used in place of a Random instance, data generation
 // calls will be delegated to the wrapped Mersenne Twister
      </pre>
</div>
</dd>
</dl>
</p>
</div>
</div>

      </div>
    </div>
    <div class="clear">
      <hr/>
    </div>
    <div id="footer">
      <div class="xright">&#169;  
          2003-2010
    
          
  

  
    
  
  
    
  </div>
      <div class="clear">
        <hr/>
      </div>
    </div>
  </body>
</html>