CS442 Assignment 1 Embarrassingly Parallel Game-of-Life

Due: Tuesday, September 9th

Goals:

1. Get up and running on the parallel machine(s). You'll have to figure out how to log in, edit, compile, run and debug parallel programs.

2. Use simple MPI calls; implement two collective communication routines yourself.

3. Investigate scaling performance of your code.

4. Develop code you'll use as starting point for later assignments.

Background:

Pretend, if you're not already, that you are a biologist interested in studying the ecology of the Gila Wilderness. Because you can't get cable TV within the Gila, instead of doing field work you decide to perform computational simulations. You know that water is a scarce resource, and that competition for the limited precipitation is fierce. In particular, if a plant is crowded by too many neighbors, it will die of thirst. However, due to the symbiotic relationship between the different types of vegetation, a plant with no neighbors will die as well. Plants will only survive if they manage to find the right ecological niche. Your plan is to investigate possible ecological scenarios and to determine which ones generate stable populations.

Assignment:

We'll give you a sequential code that runs the game-of-life. The game is played on a two-dimensional grid of size NX by NY. Each grid point has a value from 0 to 10, which represents the current density of vegetation at a given point. Initially, all values are randomly set to 0 or 1, with probability = PROB that a particular point has value 1. This process is based on a user-input random number SEED.

A timestep of the game consists of updating the value of every grid point according to the following rules:

1. SUM = sum of current values at the 8 neighboring points (left/right, up/down, 4 corner points). The grid is treated as a torus; that is, think of the bottom of the grid as touching the top, and the right as touching the left.

2. If SUM <= 3 or SUM >= 25, then decrement value by 1 (too lonely or crowded).

3. If SUM >= 4 and SUM <= 15, then increment value by 1 (growth condition).

4. If SUM > 15 and SUM < 25, no change in value (just right).

A single game runs for many timesteps until it terminates under one of 3 conditions:

1. All the vegetation dies.

2. It reaches steady-state, where the total vegetation (sum of all values on the grid) doesn't change for 10 timesteps. Often, instead of settling down to a single configuration the solution will cycle with periodicity 2 or 3. Such cycling is considered to be a steady-state solution and likewise terminates after 10 steps.

3. It exceeds 200 timesteps without reaching steady-state.

A complete simulation using the serial code runs N independent games, one after the other and prints out some statistics: the fraction of games that died out, the fraction that reached steady-state, and the fraction that exceeded the time limit. For the steady-state games, the code also computes the average time to stabilization and the average amount of vegetation alive on the grid.

Your assignment is to make this code parallel in the obvious way by having each of the P processors run N/P of the independent games. With the same inputs, your code should give identical answers when run on any number of processors. But life is simple in the Gila and in this first assignment -- you can assume P is a power-of-two (1,2,4,8,etc) and that N is a multiple of P.

Communicating the simulation parameters will require a broadcast operation and computing the statistics will require a reduce operation. You can use MPI calls to get this working quickly, but in the final version of your code, you must write these operations yourself, using MPI-Send and MPI-Receive.

What you hand in:

1. A program listing of your code. Since poorly formatted codes are difficult to grade, we expect you to include comments and to make your program easy to read. In what you hand in, you can omit the subroutines from the serial code that you don't change, but you should include the main program and any additional subroutines you write, particularly your broadcast and reduction routines.

2. The statistical results from 2 runs, each on various numbers of processors, including P = 1.

(1) NX = NY = 50, PROB = 0.35, N = 128, SEED = 314159

(2) NX = NY = 100, PROB = 0.28, N = 256, SEED = 271828

We are looking to see that you get the correct answers and that your answers do not depend on P.

3. A simple log-log plot of CPU time vs. P for one (or both) of these problems. Include a line that shows what ``perfect'' speed-up would be, along with data points for your actual timings. Tell us why you think the program scaled as it did.

4. Brief (!) answers to the Open Questions below.

Sequential Version: