Due: Thursday, November 13
Goals:
Background:
Impressed by your bugling elk simulations, the Lincoln County Ranchers
Association (LCRA) has funneled serious grant money into your
department. Your advisor sees the light and decides to change the
direction of your research. With a government subsidy, the LCRA has
set up infrared monitoring stations around the perimeter of the Gila
which detect whenever an elk leaves the safe haven of the Gila and
laser tattoos it with a unique ID number. Periodically these
monitoring computers must communicate and sort their data so that ID
numbers can be handed out to LCRA members for their weekly
``park-and-poach'' activity. The LCRA wants you to write a portable
MPI code for this purpose which they will own and can license to other
conservation-minded groups around the country. While skeptical that
this really qualifies as cutting-edge environmental research, as a
maturing graduate student, you recognize the need to satisfy paying
customers.
Assignment:
You will write a parallel program to sort a set of numerical values. To make life easier you may assume that no two values are identical. Initially, each processor owns an equal fraction of the values. When your program is finished, each processor will own a sorted subset of values: that is, processor 0 will own the lowest few values, processor 1 the next few, and so on. Each processor will also sort its owned subset so that in aggregate, the entire list is in sorted order. To perform the sort, you should implement the parallel divide-and-conquer variant of quicksort presented in class: divide the data into two subsets of low and high values on two subgroups of processors by finding a median, then recurse within each subgroup. To simplify your task, you can assume that the number of processors is a power-of-two.
Input to your program will consist of the number of values to sort (N), a random number seed (SEED) used to generate the list of values, and an index (M) of the value to output. Your code should run correctly for any N on any power-of-two processors P, including the case where P > N. The output of your code is the Mth value in the sorted list and a simple ``check-sum'' calculation on the sorted values which is more easily described in the code itself. For both of these outputs you will find the MPI_Scan function useful. Your parallel code should also include the following features: computation of the CPU time for the sort (not including initialization of the random values), creation and freeing of MPI communicators, use of asynchronous receives and waits for exchanging large sets of values between processors, and safeguards to insure that the exchanged data does not overflow allocated memory.
You should start with a sequential Fortran
or C program which performs this task.
What you hand in:
P=1 | N = 100 | N = 1000 | N = 10000 | N = 100000 |
CPU (secs) | 3.44E-4 | 5.20E-3 | 8.25E-2 | 1.28 |
Mth | 0.19539653 | 0.19475528 | 0.19816056 | 0.20036508 |
Checksum | 32.757375 | 329.27865 | 3314.1820 | 33329.805 |
Speed-Up | 1.0 | 1.0 | 1.0 | 1.0 |
Efficiency | 1.0 | 1.0 | 1.0 | 1.0 |
Be sure to include at least 8 digits of precision in the Mth value and checksum entries so that we can verify that your code is correct. Comment on the performance trends you see in the table for one-processor, fixed-size, and scaled-size problems. In other words, tell us why you think the numbers show what they do. Note: there are several scaling issues here, so you need to give more thought to this than in past assignments.
Open Questions: