Conway's Game of Life on GPU using CUDA
6 Conclusion
This project presented three different conventional evaluation algorithms for Conway's Game of Life: basic algorithm, bit counting algorithm, and lookup table algorithm. All three algorithms were carefully implemented on both CPU and GPU and thorough benchmark was performed.
The best algorithm for the CPU turned out to be the lookup table that yields speedups up to 45× over the serial CPU. The best algorithm for the GPG was the bit counting with speedup of 480× over the serial CPU. Following section shows detailed graphs and comparison of all algorithms.
Hardware
CPU: Xeon E5530 @ 2.40GHz with 4 cores (8 threads)
GPU: GeForce GTX 580 with 1.5 GB of graphics memory and 512 CUDA cores
6.1 Overall performance evaluation
Figure 1 presents absolute evaluation speeds in millions of evaluated cells per second. The GPU is able to evaluate amazing 28 trillion cells per second. This number is hard to imagine. I like to compare fast things with fast things, say with the speed of light – nearly 300,000 km/s, that's 1,000,000,000 km/h (671,000,000 mph).
The distance per evaluated cell dpc
can be computed as:
dpc = c/es = 299,792,458/24,688,686,207 = 0.01214 m
, where dpc
is distance per evaluated cell in meters,c
is speed of light in m/s, andes
is evaluation speed in cells/s.
One cell is evaluated for every 0.01214 meters passed by light! That's 12 mm or 0.48 in! That is quite mind blowing. Let me know if you have any conventional (no hash-life) algorithm that performs significantly better on comparable hardware.
Figure 1: Evaluation speed of all described algorithms with their best settings.
Figure 2: Speedup of all described implementations over basic serial CPU implementation.
Figure 3: Speedup of all described implementations over basic parallel CPU implementation.
6.2 Code and supplementary material
Please find full code published under Public Domain on the GitHub.