Mercurial > hg > de.mpg.mpiwg.itgroup.digilib.core
diff libs/commons-math-2.1/docs/userguide/random.html @ 10:5f2c5fb36e93
commons-math-2.1 added
author | dwinter |
---|---|
date | Tue, 04 Jan 2011 10:00:53 +0100 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/libs/commons-math-2.1/docs/userguide/random.html Tue Jan 04 10:00:53 2011 +0100 @@ -0,0 +1,530 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + + + + + + + + + + + +<html xmlns="http://www.w3.org/1999/xhtml"> + <head> + <title>Math - The Commons Math User Guide - Data Generation</title> + <style type="text/css" media="all"> + @import url("../css/maven-base.css"); + @import url("../css/maven-theme.css"); + @import url("../css/site.css"); + </style> + <link rel="stylesheet" href="../css/print.css" type="text/css" media="print" /> + <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> + </head> + <body class="composite"> + <div id="banner"> + <span id="bannerLeft"> + + Commons Math User Guide + + </span> + <div class="clear"> + <hr/> + </div> + </div> + <div id="breadcrumbs"> + + + + + + + + + <div class="xright"> + + + + + + + + </div> + <div class="clear"> + <hr/> + </div> + </div> + <div id="leftColumn"> + <div id="navcolumn"> + + + + + + + + + <h5>User Guide</h5> + <ul> + + <li class="none"> + <a href="../userguide/index.html">Contents</a> + </li> + + <li class="none"> + <a href="../userguide/overview.html">Overview</a> + </li> + + <li class="none"> + <a href="../userguide/stat.html">Statistics</a> + </li> + + <li class="none"> + <strong>Data Generation</strong> + </li> + + <li class="none"> + <a href="../userguide/linear.html">Linear Algebra</a> + </li> + + <li class="none"> + <a href="../userguide/analysis.html">Numerical Analysis</a> + </li> + + <li class="none"> + <a href="../userguide/special.html">Special Functions</a> + </li> + + <li class="none"> + <a href="../userguide/utilities.html">Utilities</a> + </li> + + <li class="none"> + <a href="../userguide/complex.html">Complex Numbers</a> + </li> + + <li class="none"> + <a href="../userguide/distribution.html">Distributions</a> + </li> + + <li class="none"> + <a href="../userguide/fraction.html">Fractions</a> + </li> + + <li class="none"> + <a href="../userguide/transform.html">Transform Methods</a> + </li> + + <li class="none"> + <a href="../userguide/geometry.html">3D Geometry</a> + </li> + + <li class="none"> + <a href="../userguide/optimization.html">Optimization</a> + </li> + + <li class="none"> + <a href="../userguide/ode.html">Ordinary Differential Equations</a> + </li> + + <li class="none"> + <a href="../userguide/genetics.html">Genetic Algorithms</a> + </li> + </ul> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img alt="Built by Maven" src="../images/logos/maven-feather.png"></img> + </a> + + + + + + + + + </div> + </div> + <div id="bodyColumn"> + <div id="contentBox"> + <div class="section"><h2><a name="a2_Data_Generation"></a>2 Data Generation</h2> +<div class="section"><h3><a name="a2.1_Overview"></a>2.1 Overview</h3> +<p> + The Commons Math random package includes utilities for + <ul><li>generating random numbers</li> +<li>generating random strings</li> +<li>generating cryptographically secure sequences of random numbers or + strings</li> +<li>generating random samples and permutations</li> +<li>analyzing distributions of values in an input file and generating + values "like" the values in the file</li> +<li>generating data for grouped frequency distributions or + histograms</li> +</ul> +</p> +<p> + The source of random data used by the data generation utilities is + pluggable. By default, the JDK-supplied PseudoRandom Number Generator + (PRNG) is used, but alternative generators can be "plugged in" using an + adaptor framework, which provides a generic facility for replacing + <code>java.util.Random</code> with an alternative PRNG. Another very + good PRNG suitable for Monte-Carlo analysis (but <strong>not</strong> + for cryptography) provided by the library is the Mersenne twister from + Makoto Matsumoto and Takuji Nishimura + </p> +<p> + Sections 2.2-2.6 below show how to use the commons math API to generate + different kinds of random data. The examples all use the default + JDK-supplied PRNG. PRNG pluggability is covered in 2.7. The only + modification required to the examples to use alternative PRNGs is to + replace the argumentless constructor calls with invocations including + a <code>RandomGenerator</code> instance as a parameter. + </p> +</div> +<div class="section"><h3><a name="a2.2_Random_numbers"></a>2.2 Random numbers</h3> +<p> + The <a href="../apidocs/org/apache/commons/math/random/RandomData.html"> + org.apache.commons.math.RandomData</a> interface defines methods for + generating random sequences of numbers. The API contracts of these methods + use the following concepts: + <dl><dt>Random sequence of numbers from a probability distribution</dt> +<dd>There is no such thing as a single "random number." What can be + generated are <i>sequences</i> of numbers that appear to be random. When + using the built-in JDK function <code>Math.random(),</code> sequences of + values generated follow the + <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm" class="externalLink"> + Uniform Distribution</a>, which means that the values are evenly spread + over the interval between 0 and 1, with no sub-interval having a greater + probability of containing generated values than any other interval of the + same length. The mathematical concept of a + <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm" class="externalLink"> + probability distribution</a> basically amounts to asserting that different + ranges in the set of possible values of a random variable have + different probabilities of containing the value. Commons Math supports + generating random sequences from the following probability distributions. + The javadoc for the <code>nextXxx</code> methods in + <code>RandomDataImpl</code> describes the algorithms used to generate + random deviates from each of these distributions. + <ul><li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm" class="externalLink"> + uniform distribution</a></li> +<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3667.htm" class="externalLink"> + exponential distribution</a></li> +<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda366j.htm" class="externalLink"> + poisson distribution</a></li> +<li><a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3661.htm" class="externalLink"> + Gaussian distribution</a></li> +</ul> +</dd> +<dt>Cryptographically secure random sequences</dt> +<dd>It is possible for a sequence of numbers to appear random, but + nonetheless to be predictable based on the algorithm used to generate the + sequence. If in addition to randomness, strong unpredictability is + required, it is best to use a + <a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator" class="externalLink"> + secure random number generator</a> to generate values (or strings). The + nextSecureXxx methods in the <code>RandomDataImpl</code> implementation of + the <code>RandomData</code> interface use the JDK <code>SecureRandom</code> + PRNG to generate cryptographically secure sequences. The + <code>setSecureAlgorithm</code> method allows you to change the underlying + PRNG. These methods are <strong>much slower</strong> than the corresponding + "non-secure" versions, so they should only be used when cryptographic + security is required.</dd> +<dt>Seeding pseudo-random number generators</dt> +<dd>By default, the implementation provided in <code>RandomDataImpl</code> + uses the JDK-provided PRNG. Like most other PRNGs, the JDK generator + generates sequences of random numbers based on an initial "seed value". + For the non-secure methods, starting with the same seed always produces the + same sequence of values. Secure sequences started with the same seeds will + diverge. When a new <code>RandomDataImpl</code> is created, the underlying + random number generators are <strong>not</strong> initialized. The first + call to a data generation method, or to a <code>reSeed()</code> method + initializes the appropriate generator. If you do not explicitly seed the + generator, it is by default seeded with the current time in milliseconds. + Therefore, to generate sequences of random data values, you should always + instantiate <strong>one</strong><code>RandomDataImpl</code> and use it + repeatedly instead of creating new instances for subsequent values in the + sequence. For example, the following will generate a random sequence of 50 + long integers between 1 and 1,000,000, using the current time in + milliseconds as the seed for the JDK PRNG: + <div class="source"><pre> +RandomData randomData = new RandomDataImpl(); +for (int i = 0; i < 1000; i++) { + value = randomData.nextLong(1, 1000000); +} + </pre> +</div> + + The following will not in general produce a good random sequence, since the + PRNG is reseeded each time through the loop with the current time in + milliseconds: + <div class="source"><pre> +for (int i = 0; i < 1000; i++) { + RandomDataImpl randomData = new RandomDataImpl(); + value = randomData.nextLong(1, 1000000); +} + </pre> +</div> + + The following will produce the same random sequence each time it is + executed: + <div class="source"><pre> +RandomData randomData = new RandomDataImpl(); +randomData.reSeed(1000); +for (int i = 0; i = 1000; i++) { + value = randomData.nextLong(1, 1000000); +} + </pre> +</div> + + The following will produce a different random sequence each time it is + executed. + <div class="source"><pre> +RandomData randomData = new RandomDataImpl(); +randomData.reSeedSecure(1000); +for (int i = 0; i < 1000; i++) { + value = randomData.nextSecureLong(1, 1000000); +} + </pre> +</div> +</dd> +</dl> +</p> +</div> +<div class="section"><h3><a name="a2.3_Random_Vectors"></a>2.3 Random Vectors</h3> +<p> + Some algorithm requires random vectors instead of random scalars. When the + components of these vectors are uncorrelated, they may be generated simply + one at a time and packed together in the vector. The <a href="../apidocs/org/apache/commons/math/random/UncorrelatedRandomVectorGenerator.html"> + org.apache.commons.math.UncorrelatedRandomVectorGenerator</a> class + does however simplify this process by setting the mean and deviation of each + component once and generating complete vectors. When the components are correlated + however, generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math/random/CorrelatedRandomVectorGenerator.html"> + org.apache.commons.math.CorrelatedRandomVectorGenerator</a> class + provides this service. In this case, the user must set up a complete covariance matrix + instead of a simple standard deviations vector. This matrix gathers both the variance + and the correlation information of the probability law. + </p> +<p> + The main use for correlated random vector generation is for Monte-Carlo + simulation of physical problems with several variables, for example to + generate error vectors to be added to a nominal vector. A particularly + common case is when the generated vector should be drawn from a <a href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution" class="externalLink"> + Multivariate Normal Distribution</a>. + </p> +</div> +<div class="section"><h3><a name="a2.4_Random_Strings"></a>2.4 Random Strings</h3> +<p> + The methods <code>nextHexString</code> and <code>nextSecureHexString</code> + can be used to generate random strings of hexadecimal characters. Both + of these methods produce sequences of strings with good dispersion + properties. The difference between the two methods is that the second is + cryptographically secure. Specifically, the implementation of + <code>nextHexString(n)</code> in <code>RandomDataImpl</code> uses the + following simple algorithm to generate a string of <code>n</code> hex digits: + <ol type="1"><li>n/2+1 binary bytes are generated using the underlying Random</li> +<li>Each binary byte is translated into 2 hex digits</li> +</ol> + + The <code>RandomDataImpl</code> implementation of the "secure" version, + <code>nextSecureHexString</code> generates hex characters in 40-byte + "chunks" using a 3-step process: + <ol type="1"><li>20 random bytes are generated using the underlying + <code>SecureRandom.</code></li> +<li>SHA-1 hash is applied to yield a 20-byte binary digest.</li> +<li>Each byte of the binary digest is converted to 2 hex digits</li> +</ol> + + Similarly to the secure random number generation methods, + <code>nextSecureHexString</code> is <strong>much slower</strong> than + the non-secure version. It should be used only for applications such as + generating unique session or transaction ids where predictability of + subsequent ids based on observation of previous values is a security + concern. If all that is needed is an even distribution of hex characters + in the generated strings, the non-secure method should be used. + </p> +</div> +<div class="section"><h3><a name="a2.5_Random_permutations_combinations_sampling"></a>2.5 Random permutations, combinations, sampling</h3> +<p> + To select a random sample of objects in a collection, you can use the + <code>nextSample</code> method in the <code>RandomData</code> interface. + Specifically, if <code>c</code> is a collection containing at least + <code>k</code> objects, and <code>randomData</code> is a + <code>RandomData</code> instance <code>randomData.nextSample(c, k)</code> + will return an <code>object[]</code> array of length <code>k</code> + consisting of elements randomly selected from the collection. If + <code>c</code> contains duplicate references, there may be duplicate + references in the returned array; otherwise returned elements will be + unique -- i.e., the sampling is without replacement among the object + references in the collection. </p> +<p> + If <code>randomData</code> is a <code>RandomData</code> instance, and + <code>n</code> and <code>k</code> are integers with + <code> k <= n</code>, then + <code>randomData.nextPermutation(n, k)</code> returns an <code>int[]</code> + array of length <code>k</code> whose whose entries are selected randomly, + without repetition, from the integers <code>0</code> through + <code>n-1</code> (inclusive), i.e., + <code>randomData.nextPermutation(n, k)</code> returns a random + permutation of <code>n</code> taken <code>k</code> at a time. + </p> +</div> +<div class="section"><h3><a name="a2.6_Generating_data_like_an_input_file"></a>2.6 Generating data 'like' an input file</h3> +<p> + Using the <code>ValueServer</code> class, you can generate data based on + the values in an input file in one of two ways: + <dl><dt>Replay Mode</dt> +<dd> The following code will read data from <code>url</code> + (a <code>java.net.URL</code> instance), cycling through the values in the + file in sequence, reopening and starting at the beginning again when all + values have been read. + <div class="source"><pre> + ValueServer vs = new ValueServer(); + vs.setValuesFileURL(url); + vs.setMode(ValueServer.REPLAY_MODE); + vs.resetReplayFile(); + double value = vs.getNext(); + // ...Generate and use more values... + vs.closeReplayFile(); + </pre> +</div> + + The values in the file are not stored in memory, so it does not matter + how large the file is, but you do need to explicitly close the file + as above. The expected file format is \n -delimited (i.e. one per line) + strings representing valid floating point numbers. + </dd> +<dt>Digest Mode</dt> +<dd>When used in Digest Mode, the ValueServer reads the entire input file + and estimates a probability density function based on data from the file. + The estimation method is essentially the + <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html" class="externalLink"> + Variable Kernel Method</a> with Gaussian smoothing. Once the density + has been estimated, <code>getNext()</code> returns random values whose + probability distribution matches the empirical distribution -- i.e., if + you generate a large number of such values, their distribution should + "look like" the distribution of the values in the input file. The values + are not stored in memory in this case either, so there is no limit to the + size of the input file. Here is an example: + <div class="source"><pre> + ValueServer vs = new ValueServer(); + vs.setValuesFileURL(url); + vs.setMode(ValueServer.DIGEST_MODE); + vs.computeDistribution(500); //Read file and estimate distribution using 500 bins + double value = vs.getNext(); + // ...Generate and use more values... + </pre> +</div> + + See the javadoc for <code>ValueServer</code> and + <code>EmpiricalDistribution</code> for more details. Note that + <code>computeDistribution()</code> opens and closes the input file + by itself. + </dd> +</dl> +</p> +</div> +<div class="section"><h3><a name="a2.7_PRNG_Pluggability"></a>2.7 PRNG Pluggability</h3> +<p> + To enable alternative PRNGs to be "plugged in" to the commons-math data + generation utilities and to provide a generic means to replace + <code>java.util.Random</code> in applications, a random generator + adaptor framework has been added to commons-math. The + <a href="../apidocs/org/apache/commons/math/random/RandomGenerator.html"> + org.apache.commons.math.RandomGenerator</a> interface abstracts the public + interface of <code>java.util.Random</code> and any implementation of this + interface can be used as the source of random data for the commons-math + data generation classes. An abstract base class, + <a href="../apidocs/org/apache/commons/math/random/AbstractRandomGenerator.html"> + org.apache.commons.math.AbstractRandomGenerator</a> is provided to make + implementation easier. This class provides default implementations of + "derived" data generation methods based on the primitive, + <code>nextDouble().</code> To support generic replacement of + <code>java.util.Random</code>, the + <a href="../apidocs/org/apache/commons/math/random/RandomAdaptor.html"> + org.apache.commons.math.RandomAdaptor</a> class is provided, which + extends <code>java.util.Random</code> and wraps and delegates calls to + a <code>RandomGenerator</code> instance. + </p> +<p> + Examples: + <dl><dt>Create a RandomGenerator based on RngPack's Mersenne Twister</dt> +<dd>To create a RandomGenerator using the RngPack Mersenne Twister PRNG + as the source of randomness, extend <code>AbstractRandomGenerator</code> + overriding the derived methods that the RngPack implementation provides: + <div class="source"><pre> +import edu.cornell.lassp.houle.RngPack.RanMT; +/** + * AbstractRandomGenerator based on RngPack RanMT generator. + */ +public class RngPackGenerator extends AbstractRandomGenerator { + + private RanMT random = new RanMT(); + + public void setSeed(long seed) { + random = new RanMT(seed); + } + + public double nextDouble() { + return random.raw(); + } + + public double nextGaussian() { + return random.gaussian(); + } + + public int nextInt(int n) { + return random.choose(n); + } + + public boolean nextBoolean() { + return random.coin(); + } +} + </pre> +</div> +</dd> +<dt>Use the Mersenne Twister RandomGenerator in place of + <code>java.util.Random</code> in <code>RandomData</code></dt> +<dd><div class="source"><pre> +RandomData randomData = new RandomDataImpl(new RngPackGenerator()); + </pre> +</div> +</dd> +<dt>Create an adaptor instance based on the Mersenne Twister generator + that can be used in place of a <code>Random</code></dt> +<dd><div class="source"><pre> + RandomGenerator generator = new RngPackGenerator(); + Random random = RandomAdaptor.createAdaptor(generator); + // random can now be used in place of a Random instance, data generation + // calls will be delegated to the wrapped Mersenne Twister + </pre> +</div> +</dd> +</dl> +</p> +</div> +</div> + + </div> + </div> + <div class="clear"> + <hr/> + </div> + <div id="footer"> + <div class="xright">© + 2003-2010 + + + + + + + + + + </div> + <div class="clear"> + <hr/> + </div> + </div> + </body> +</html>