Ir al contenido

Foto

Random() vs. RandomFloat(), or Zen and the Art of Randomness


  • Por favor identifícate para responder
1 respuesta en este tema

#1
DarthGizka

DarthGizka
  • Members
  • 867 mensajes

DA script offers two different basic functions for obtaining random numbers - Random() and RandomFloat() - and these are powered by different generators internally.

RandomFloat() wraps the simple generator that ships with Visual C++, which is 32-bit linear congruential generator (LCG). The lower 16 bits of the output are discarded to avoid the worst of the problems associated with LCGs*. The high bit also falls by the wayside, which leaves 15 bits to work with. The resulting 15-bit random values are then normalised to the range [0, 1) via division by 32767.5; multiplying the output by that number lets you recover the original 15 bits that were extracted from the VC++ generator.

The game scripts don't use RandomFloat() very often outside of combat, and so the generator will often be at its initial value after a savegame is loaded. This means if you dump a couple recovered values then you can recognise them as the high words of the well-known starting sequence of the VC++ generator.

The small number of different outputs (32768) leads to noticable binning effects unless the multipliers used are very small. The factor used for normalising is slightly off as well (32767.5 instead of 32768), which leads to further distortion for values at the extreme ends of the range. For normal usage - e.g. if (RandomFloat() * 100.0f < fSomeProbability) - there are no problems at all, since the difference in probability of possible values is on the order of ±1/32678. The function would also be okay for computing a d20, since such a small difference in probability is so far below the noise level as makes no odds.

However, Random() is much more convenient than RandomFloat() for computing dice, and it is also a lot better under the hood. It is powered by four parallel 32-bit XorShift generators whose output is combined to deliver 32 random bits per call. The period of the generator is roughly 2^113, vs. 2^32 for the simple VC++ generator behind RandomFloat()**. If quality were a concern then the generator output would have to be tempered but for scripting purposes it should be more than good enough.

Random() then applies the passed modulus to the generator output. It does so in a way that makes the results non-uniform (due to binning) unless the passed modulus is a power of two, but for scripting purposes this should be mostly irrelevant except in certain pathological cases.

High-quality random floats can be obtained by recovering the low 31 bits of the generator output via -Random(0x80000000) and then normalising to [0, 1) by multiplying with 2^-31. However, with floats having only 24 bits of precision, Random(0x01000000) * 2^-24 should work just as well for scripting purposes, or indeed RandomFloat().

Note: the effect of binning on Random() is markedly different from that on RandomFloat(). The occurrence counts for the output values for FloatToInt(RandomFloat() * 2^16 / 3) over the generator range make a strictly alternating sequence of 2 1 2 1 2..., and the deviation for (RandomFloat() * 21845.0f > 10922.5f) is the usual microscopic 1/32768. I.e. RandomFloat() stretches the quantisation error over its whole range, leading to an effect that is similar to aliasing in pictures.

By contrast, for Random(2^33 / 3) all values in the lower half of the result range are twice as likely as those in the upper half. In other words, the probability of (Random(0xAAAAAAAAu) < 0x55555555u) is 66.6% instead of 50%, which is probably not quite what people might expect given the high precision and quality of the raw generator. I.e. Random() unloads the quantisation error in a compact chunk at the beginning of the range.

In practical terms this could well make the enemy density in one half of a battlefield twice as high as that in the other half, even though you expected it to be uniform and RandomFloat() would have resulted in a perfectly uniform distribution. Moral: know your tools and which to use when. Size - generator bitness and period length - may matter much less than other factors.

*) For LCGs of this type, the k low bits cycle with a period of 2^k, meaning the low bit strictly alternates, the two low bits repeat with a period of four, and so on.

**) If your script consumed one million random numbers per second then it would take a bit more than an hour for RandomFloat() to start repeating itself; for Random() it would be 329,067,905,820,483,965,457,824 years.



#2
DarthGizka

DarthGizka
  • Members
  • 867 mensajes

Since seeing is believing, I coded a little test script that counts RandomFloat() < 0.5f for 100 calls to the real RandomFloat() and also for an emulation of RandomFloat() via calls to Random() with arbitrary - but cunningly chosen - moduli. The script takes the number of calls as an optional parameter but with something like 1000 the engine craps out after just one round.

Here's the result for three invocations without parameters and one with parameter 1000.

Random_Float.jpg
 

#include "core_h"
               
const int NUMBER_OF_CALLS_ = 100;
const int WITH_PERCENTAGE_ = FALSE;
             
void report_result_ (int n, int lo, int modulus);
float RandomFloat_ (int modulus);


void main ()
{
   int[] moduli;
   
   moduli[0] = 0xAAAAAAAA;  // == 2/3 * 2^32
   moduli[1] = 0x80000000;
   moduli[2] = 0;           // calls the real RandomFloat()

   int  m = GetArraySize(moduli), test, n = StringToInt(GetLocalString(GetModule(), "RUNSCRIPT_VAR"));
   
   if (n == 0)  n = NUMBER_OF_CALLS_;

   for (test = 0; test < m; ++test)
   {
      int i, lo = 0, modulus = moduli[test];

      for (i = 0; i < n; ++i)
      {
         if (RandomFloat_(modulus) < 0.5f)
         {
            ++lo;
         }
      }
      
      report_result_(n, lo, modulus);
   }
}

//--------------------------------------------------------------------------------------------------
// convert an integer to a float representing the unsigned integer value

float U32ToFloat_ (int n)
{
   return IntToFloat(n) + (n < 0 ? 4294967296.0f : 0.0f);   // integers >= 2^31 show up negative
}

//--------------------------------------------------------------------------------------------------
// emulate RandomFloat() by calling Random() with a given modulus; call original for modulus == 0

float RandomFloat_ (int modulus)
{
   if (modulus == 0)
   {
      return RandomFloat();
   }

   int r = Random(modulus);   // Random() negates the remainder for moduli >= 2^31

   return U32ToFloat_(modulus < 0 ?  -r : r) / U32ToFloat_(modulus);
}

//--------------------------------------------------------------------------------------------------

void report_result_ (int n, int lo, int modulus)
{
   int hi = n - lo;

   string what = modulus == 0 ? "RandomFloat()" : "Random(" + IntToHexString(modulus) + ")";
   string info = IntToString(lo) + " : " + IntToString(hi);                                 
   
   if (WITH_PERCENTAGE_)
   {
      info += " = " + IntToString(lo * 100 / n) + "% : " + IntToString(hi * 100 / n) + "%";
   }

   int colour = (Max(lo, hi) > Min(lo, hi) * 4/3) ? 0xFF0000 : (modulus ? 0xFFFF00 : 0xFFFFFF);

   DisplayFloatyMessage(GetMainControlled(), what, FLOATY_MESSAGE, colour, 7.0f);
   DisplayFloatyMessage(GetMainControlled(), info, FLOATY_MESSAGE, colour, 7.0f);
}

For the curious, here is a simple and efficient way of achieving a perfectly uniform distribution for a function like DA Script's Random(), as I coded it as an extension to the RNG classes in "Numerical Recipes in C". It's basically the same approach as that used in java.util.Random.nextInt(int), though maybe somewhat less cryptic. The explanation is a lot longer than the actual code, so I'm skipping it here. 

NumRpRNG::U32 NumRpRNG::mod32 (U32 modulus)
{
   for ( ; ; )
   {
      U32 raw_bits = int32();
      U32 result = raw_bits % modulus;
      U32 check = U32(raw_bits - result + modulus);

      if (check >= raw_bits || check == 0)
      {
         return result;
      }
   }
}