CORRELATION ANALYSIS OF SHUFFLED DATA CREATED USING EXCEL’S RAND() FUNCTION

B. Hughes, 2017 August 18


The overall results show that Excel’s RAND() is an adequate starting point to create random sets of data for solitaire games at least, and the implication is that if any better random number generator is used as a starting point it is more likely to produce statistically separate “shuffles” or statistically separate cards within a shuffle. Other more specific conclusions are as follows:


a) All randomizing processes produce random variables that have distributions that are in close agreement with statistical expectation for uniformity, and they show no “fails” to 90% confidence when compared to a theoretical uniform distribution.

b) All randomizing processes and covariance protocols display the expected 𝜆/𝑁1/2 behaviour as the number 𝑁 of samples in the covariance estimate or the average of the covariance estimates increases, and the constant 𝜆 for each protocol is in approximate agreement with the expected value.

c) The covariances display the approximately 𝛿-function behaviour expected for independent samples for the autocovariances within a “shuffle”. The above chart shows the autocorrelation of 52 "cards" averaged over 1000 "shuffles" using the Modulus technique. The black horizontal line at -0.0196.. shows the mathematically expected value for independency and it can be seen that the data values (red) indicate an average that is in close agreement.

d) The KS tests of significance show that the overall number of “fails” at 90% confidence is different for RANDBETWEEN than for the other methods, particularly as is seen in Table 4.





THE COMPLETE ARTICLE
with description of the results, statistical analysis,
and references.

1. Introduction

In the course of determining the statistics associated with the results of Klondike Solitaire1, the question of the suitability of online random number generators arose regarding their use in online Solitaire games, particularly for producing the shuffling process.

One online random number generator with common and easy access is Excel’s RAND() function and its subsidiary RANDBETWEEN. Even though it is highly unlikely that RAND() would be used for Klondike Solitaire, its ready access prompted a desire to examine the correlational properties that existed between the “cards” in a given shuffle when the shuffle was produced using a sort process that started with Excel’s RAND(). As well, the correlational properties between different shuffles (but for the same “card” position), and the correlations between the full card distributions from shuffle-to-shuffle, have been examined to determine whether the “shuffles” and the “card” locations within a “shuffle” were indeed statistically independent in a measurable way. The results of this examination not only provide conclusions that pertain to RAND() quite separately from Klondike Solitaire, but it is likely that they are indicative of possibilities for other random number generators as well.

All computations were carried out using Excel and three different randomization processes: (i) a direct use of RANDBETWEEN for the numbers 1 to 52, simulating a deck of cards in a straightforward way; (ii) a use of the MOD function (i.e. modulus) on RAND() multiplied by 109 to simulate a closer result to the Growly Solitaire software application2 (although that software uses ISAAC rather than RAND() for the basic random number generation process); and (iii) a double modulus process that used the randomization of the MOD process multiplied by 109 and subjected once more to Excel’s MOD function, as in (ii). In using Excel, the “calculation” preference was set to Manual with only 1 iteration for the part of the process that creates the random sets of data. A semi-automatic arrangement was used to produce the number of sets needed for the sample averages, and it was tested to ensure that Excel wasn’t proceeding too quickly for the calculations to be performed appropriately……(
more).





© 2017. Blyth Hughes. All Rights Reserved.