Given a set of n points in a plane, you need to determine the distance between the closest pair of
points. The distance between points $p_i = (x_i, y_i)$ and $p_j = (x_j, y_j)$ is computed as $\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.
You are provided with a program nbody.cu that has the following capability:
• The code generates coordinates for n points on the host;
• The coordinates are copied to the device memory;
• An incomplete kernel function is provided that is intended to compute pairwise distances between all pairs of points, determine the minimum distance, and save the value in a device variable;
• The device variable is copied to the host;
• The value is compared with the minimum distance calculated on the host;
• The code reports the time spent in the kernel function, data transfer between host and device, and host computation.
You need to modify the kernel function to compute the minimum distance. You are allowed to make other changes to the code that facilitate parallelization on the GPU.
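One possible shape for such a kernel is sketched below. It is not the kernel supplied in nbody.cu: the names `min_dist_kernel` and `closest_pair_gpu`, the separate `x` and `y` arrays, the block size of 256, and the use of a bit-cast `atomicMin` are all assumptions made for illustration, and n is assumed to be at least 2. Each thread handles one point, compares it against every later point, and the per-block minima are combined with a shared-memory reduction.

```cuda
#include <cfloat>
#include <cmath>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i scans points j > i, so each unordered pair
// is examined once. Squared distances are compared; sqrt is taken on the host.
__global__ void min_dist_kernel(const float *x, const float *y, int n,
                                unsigned int *min_sq_bits)
{
    extern __shared__ float block_min[];  // one slot per thread in the block
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + tid;

    float local = FLT_MAX;
    if (i < n) {
        const float xi = x[i], yi = y[i];
        for (int j = i + 1; j < n; ++j) {
            const float dx = xi - x[j];
            const float dy = yi - y[j];
            local = fminf(local, dx * dx + dy * dy);
        }
    }
    block_min[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            block_min[tid] = fminf(block_min[tid], block_min[tid + s]);
        __syncthreads();
    }

    // Non-negative IEEE-754 floats sort in the same order as their bit
    // patterns, so an integer atomicMin yields the float minimum.
    if (tid == 0)
        atomicMin(min_sq_bits, __float_as_uint(block_min[0]));
}

// Hypothetical host wrapper (error checking omitted for brevity; assumes n >= 2).
float closest_pair_gpu(const float *h_x, const float *h_y, int n)
{
    const int threads = 256;  // assumed block size, a power of two
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(float);

    float *d_x = nullptr, *d_y = nullptr;
    unsigned int *d_min = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_min, sizeof(unsigned int));
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Seed the device minimum with the bit pattern of FLT_MAX.
    const float init = FLT_MAX;
    unsigned int bits;
    memcpy(&bits, &init, sizeof(bits));
    cudaMemcpy(d_min, &bits, sizeof(bits), cudaMemcpyHostToDevice);

    min_dist_kernel<<<blocks, threads, threads * sizeof(float)>>>(d_x, d_y, n, d_min);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&bits, d_min, sizeof(bits), cudaMemcpyDeviceToHost);
    float min_sq;
    memcpy(&min_sq, &bits, sizeof(min_sq));

    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_min);
    return sqrtf(min_sq);
}
```

In this sketch the work per thread is uneven: the thread for point 0 performs n - 1 comparisons while the thread for the last point performs none. Comparing squared distances and taking a single square root on the host avoids calling `sqrtf` once per pair.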
1. (70 points) You need to develop CUDA-based parallel code to compute the distance between the closest pair of points on a GPU. 50 points will be awarded if the code compiles and executes the following commands successfully.
./nbody.exe 16
./nbody.exe 1024
./nbody.exe 2048
10 points are reserved for a brief write-up describing the changes you made to the code.
An additional 10 points are reserved for performance: the speedup of the GPU code over the host code.
2. (20 points) Execute the code for $n = 2^k$ for k = 4,…,10. Plot GPU and CPU execution time versus
k on the same plot to demonstrate how execution time varies with the problem size on these
platforms. Use logarithmic scale for the x-axis. Next, plot the GPU and CPU execution time for
$n = 2^k$ for k = 11,…,16. For what value of n does the GPU code become faster than the CPU code?
3. (10 points) Plot the data transfer time from host to device and from device to host on the same
plot for $n = 2^k$ for k = 4,…,16.
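The provided nbody.cu already reports transfer times, so the following is only a sketch of one common way such a measurement can be made, using CUDA events around a single `cudaMemcpy`. The helper name `time_copy_ms` is hypothetical and is not taken from the provided code; error checking is omitted for brevity.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: returns the elapsed time, in milliseconds, of one
// cudaMemcpy of `bytes` bytes in the direction given by `kind`.
float time_copy_ms(void *dst, const void *src, size_t bytes, cudaMemcpyKind kind)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);             // timestamp before the copy
    cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop);              // timestamp after the copy
    cudaEventSynchronize(stop);         // wait until `stop` has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Calling the helper once with `cudaMemcpyHostToDevice` for the coordinate arrays and once with `cudaMemcpyDeviceToHost` for the result gives the two quantities to plot for each value of k. For small n the measured times are dominated by fixed per-call overhead, so averaging several repetitions may give smoother curves.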
Submission: You need to upload the following to Canvas:
1. Problem 1: Submit the file nbody.cu.
2. Problems 1, 2, and 3: Submit a single PDF or MS Word document with your responses.
Helpful Information:
1. You may use Grace for this assignment.
2. Information on compiling and running CUDA programs on a GPU for Grace is available at https://hprc.tamu.edu/wiki/Terra:Compile:All#CUDA_Programming.
3. To develop code interactively, log on to one of the GPU-equipped login nodes on Grace; these nodes are named graceX.hprc.tamu.edu, where X is to be replaced by the number 1, 2, or 3.
4. Load the modules for compiler and CUDA prior to compiling your program. Use: module load intelcuda/2020a
5. Compile C programs using nvcc. For example, to compile code.cu to create the executable code.exe, use
nvcc -ccbin=icc -o code.exe code.cu
6. The run time of a code should be measured when it is executed in dedicated mode. Sample batch files for Grace are available at https://hprc.tamu.edu/wiki/Grace:Batch#Job_File_Examples. To execute the code on a GPU-equipped node, you must use the submission options for GPUs.