I would like a function that generates a pseudo-random sequence of values, with that sequence being repeatable on every run. The data has to be reasonably well distributed over a given range; it doesn’t have to be perfect.
I want to write some code which will have performance tests run on it, based on random data. I would like that data to be the same for every test run, on every machine, but I don’t want to have to ship the random data with the tests for storage reasons (it might end up being many megabytes).
The library documentation for the random module doesn’t appear to say that the same seed will always give the same sequence on any machine.
EDIT: If you’re going to suggest I seed the generator (as I said above), please point to the documentation that says the approach is valid and will work across a range of machines and implementations.
EDIT: CPython 2.7.1 and PyPy 1.7 on Mac OS X, and CPython 2.7.1 and CPython 2.5.2 on Ubuntu, appear to give the same results. Still, there are no docs that stipulate this in black and white.
The documentation does not explicitly say that providing a seed will always guarantee the same results, but that is guaranteed with Python’s implementation of random based on the algorithm that is used.
According to the documentation, Python uses the Mersenne Twister as the core generator. Once this algorithm is seeded it does not take any external input that would change subsequent calls, so give it the same seed and you will get the same results.
Of course you can also observe this by setting a seed and generating large lists of random numbers and verifying that they are the same, but I understand not wanting to trust that alone.
I have not checked other Python implementations besides CPython, but I highly doubt they would implement the random module using an entirely different algorithm.
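If you want something more compact than comparing large lists by eye, one option is to hash the generated sequence and compare the digest across machines or implementations. This is only a sketch: the function name `sequence_fingerprint` and the default count are my own choices, not anything from the random module.

```python
import hashlib
import random
import struct


def sequence_fingerprint(seed, count=100000):
    """Return a SHA-256 hex digest of the first `count` floats for `seed`."""
    rng = random.Random(seed)  # private generator; the global one is untouched
    digest = hashlib.sha256()
    for _ in range(count):
        # Pack each float as a little-endian IEEE 754 double so the exact
        # bits are compared, independent of how floats are printed.
        digest.update(struct.pack("<d", rng.random()))
    return digest.hexdigest()


if __name__ == "__main__":
    print(sequence_fingerprint(12345))
```

Run it on each machine with the same integer seed; if the printed digests match, the sequences matched for that seed and count. That is evidence for those machines only, not a guarantee.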
For this purpose, I’ve used a repeating MD5 hash. A hash function is a deterministic transformation defined independently of the platform, so the same input always gives the same output on different platforms.
    import md5

    def repeatable_random(seed):
        hash = seed
        while True:
            hash = md5.md5(hash).digest()
            for c in hash:
                yield ord(c)

    def test():
        for i, v in zip(range(100), repeatable_random("SEED_GOES_HERE")):
            print v
184 207 76 134 103 171 90 41 12 142 167 107 84 89 149 131 142 43 241 211 224 157 47 59 34 233 41 219 73 37 251 194 15 253 75 145 96 80 39 179 249 202 159 83 209 225 250 7 69 218 6 118 30 4 223 205 91 10 122 203 150 202 99 38 192 105 76 100 117 19 25 131 17 60 251 77 246 242 80 163 13 138 36 213 200 135 216 173 92 32 9 122 53 250 80 128 6 139 49 94
Essentially, the code will take your seed (any valid string) and repeatedly hash it, thus generating integers from 0 to 255.
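The `md5` module used above only exists on older Python 2 releases. Here is a rough equivalent using `hashlib`, which should run on both Python 2.5+ and Python 3, plus a helper that maps the byte stream onto a range. The names `repeatable_bytes` and `repeatable_randint` are mine, and combining four bytes per value is just one possible choice.

```python
import hashlib


def repeatable_bytes(seed):
    """Yield an endless stream of ints in 0..255 by re-hashing `seed`."""
    state = seed.encode("utf-8")  # hashlib works on bytes, not text
    while True:
        state = hashlib.md5(state).digest()
        for byte in bytearray(state):  # bytearray yields ints on Python 2 and 3
            yield byte


def repeatable_randint(seed, low, high):
    """Yield ints in [low, high], each built from 4 bytes of the stream.

    The modulo introduces a small bias unless the range size divides 2**32.
    """
    span = high - low + 1
    stream = repeatable_bytes(seed)
    while True:
        value = 0
        for _ in range(4):
            value = (value << 8) | next(stream)
        yield low + value % span
```

For example, `gen = repeatable_randint("SEED_GOES_HERE", 1, 100)` followed by repeated `next(gen)` calls gives the same values on every run.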
There are platform differences, so if you move your code between different platforms I would go for the method that DrRobotNinja described.
Please take a look at the following example. Python on my desktop machine (64-bit Ubuntu with a Core i7, Python 2.7.3) gives me the following:
    > import random
    > r = random.Random()
    > r.seed("test")
    > r.randint(1,100)
    18
But if I run the same code on my Raspberry Pi (Raspbian on ARM11), I get a different result (for the same version of Python):
    > import random
    > r = random.Random()
    > r.seed("test")
    > r.randint(1,100)
    34
If the quality of the random numbers isn’t as critical as the repeatability-across-platforms, you can use one of the traditional linear congruential generators:
    class lcg(object):
        def __init__(self, seed=1):
            self.state = seed

        def random(self):
            self.state = (self.state * 1103515245 + 12345) & 0x7FFFFFFF
            return self.state
Since this is coded in your program using integer arithmetic, it should be deterministically repeatable across any reasonable platform.
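The generator above returns raw integers from 0 to 2**31 - 1, so you still need to map them onto your range. A possible sketch follows; the class name `ScaledLCG` and its methods are hypothetical, only the recurrence comes from the code above. It scales from the high-order bits, because the low-order bits of a power-of-two-modulus LCG have short periods and `% span` would lean on them.

```python
class ScaledLCG(object):
    """Hypothetical wrapper around the LCG above; the names are illustrative."""

    MODULUS = 2 ** 31

    def __init__(self, seed=1):
        self.state = seed

    def next_int(self):
        # Same recurrence as above: constants 1103515245 and 12345, 31-bit mask.
        self.state = (self.state * 1103515245 + 12345) & 0x7FFFFFFF
        return self.state

    def uniform(self):
        """Return a float in [0.0, 1.0)."""
        return self.next_int() / float(self.MODULUS)

    def randint(self, low, high):
        """Return an int in [low, high], taken from the high-order bits."""
        span = high - low + 1
        # Integer scaling keeps the result exact and avoids the weak low bits.
        return low + (self.next_int() * span) // self.MODULUS
```

With `gen = ScaledLCG(1)`, the first call to `gen.next_int()` returns 1103527590, since 1 * 1103515245 + 12345 is already below 2**31.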
Specify a seed to the random number generator. If you provide the same seed, your random numbers should also be the same.
This also explains why the example from the other answer produces different output on different machines:
When seeding the random generator, the seed has to be an integer. If you seed the generator with a non-integer, it has to be hashed first. The hash functions themselves are not platform independent (at least not all of them; correct me if you know more).
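If you do want a readable string as the seed, one way to sidestep the built-in hash is to derive the integer yourself from a hash with a fixed, documented output. A minimal sketch, assuming SHA-256 from `hashlib` is acceptable; `seed_from_text` is a name I made up.

```python
import hashlib
import random


def seed_from_text(text):
    """Map a string to an integer seed using SHA-256 instead of hash()."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16)


rng = random.Random(seed_from_text("test"))
print(rng.random())
```

The generator is then always seeded with an integer, so the platform-dependent string hashing never comes into play.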
So to pull it all together: Python uses a pseudo-random number generator. Therefore, when started from the same state, the produced sequence of random numbers will always be the same, independent of platform. It is just a deterministic algorithm without further input from the outside world.
This means: as long as you initialize your random generator with the same state, it will produce the same sequence of numbers. Getting to the same state can be done using the same integer seed or by saving and reapplying the old state (random.getstate() and random.setstate()).
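Saving and restoring the state could look something like this (a small sketch; the seed 2024 and the variable names are arbitrary):

```python
import random

rng = random.Random(2024)
rng.random()                 # advance the generator a little
saved = rng.getstate()       # snapshot the internal state

first = [rng.randint(1, 100) for _ in range(5)]
rng.setstate(saved)          # rewind to the snapshot
second = [rng.randint(1, 100) for _ in range(5)]

assert first == second       # same state in, same numbers out
```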
Using random.seed(…) you can generate a repeatable sequence. A demonstration:
    import random

    random.seed(321)
    list1 = [random.randint(1,10) for x in range(5)]
    random.seed(321)
    list2 = [random.randint(1,10) for x in range(5)]
    assert(list1==list2)
This works because the generator behind random.seed(…) is not truly random: it’s pseudo-random, meaning successive numbers are produced by stepping a deterministic state machine from an initial starting condition, the ‘seed’.
I just tried the following:
    import random
    random.seed(1)
    random.random()
    random.random()
    random.random()
    random.seed(1)
    random.random()
    random.random()
    random.random()
I entered each line at the CLI at various speeds, several times over. It produced the same values each time.