Given a set of n points in a plane, you need to determine the distance between the closest pair of points.
The distance between points $p_i = (x_i, y_i)$ and $p_j = (x_j, y_j)$ is computed as $\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.
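As an illustration of the distance formula above, the following is a minimal host-side sketch of a brute-force scan over all pairs of points. It is not taken from nbody.cu: the `Point` struct and the function names `point_distance` and `min_pair_distance` are hypothetical, and the provided program may store coordinates in a different layout (for example, separate x and y arrays).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical point type; nbody.cu may represent coordinates differently.
struct Point {
    double x;
    double y;
};

// Euclidean distance sqrt((x_i - x_j)^2 + (y_i - y_j)^2) between two points.
double point_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

// Brute-force O(n^2) scan over all unordered pairs (i < j).
double min_pair_distance(const std::vector<Point>& pts) {
    assert(pts.size() >= 2 && "need at least two points");
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double d = point_distance(pts[i], pts[j]);
            if (d < best) best = d;
        }
    }
    return best;
}
```

Because each pair is examined independently of every other pair, the inner comparisons can be distributed across many threads, which is what makes the problem a natural fit for a GPU.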
You are provided with a program nbody.cu that has the following capabilities:
- The code generates coordinates for n points on the host;
- The coordinates are copied to the device memory;
- An incomplete kernel function is provided that is intended to compute pairwise distances between all pairs of points, determine the minimum distance, and save the value in a device variable;
- The device variable is copied to the host;
- The value is compared with the minimum distance calculated on the host;
- The code reports the time spent in the kernel function, data transfer between host and device, and host computation.
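For reference, one common way to measure the host computation time is a wall-clock timer around the call. The sketch below is only an assumption about how such timing could be done; `elapsed_ms` is a hypothetical helper and is not claimed to match the timing code in nbody.cu.

```cpp
#include <chrono>

// Hypothetical helper (not part of nbody.cu): runs work() once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Note that CUDA kernel launches are asynchronous with respect to the host, so a host-side timer around a kernel launch is meaningful only if the host waits for the kernel to finish (for example, with cudaDeviceSynchronize) before the timer is stopped.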
You need to modify the kernel function to compute the minimum distance. You are allowed to make other changes to the code that facilitate parallelization on the GPU.
1. (70 points) You need to develop CUDA-based parallel code to compute the distance between the closest pair of points on a GPU. 50 points will be awarded if the code compiles and executes the following commands successfully.
       ./nbody.exe 16
       ./nbody.exe 1024
       ./nbody.exe 2048
   10 points are reserved for a brief write-up describing the changes you made to the code.
   An additional 10 points are reserved for the performance of the code: the speed improvement obtained by the code over the host code.
2. (20 points) Execute the code for n = 2^k for k = 4,…,10. Plot GPU and CPU execution time versus k on the same plot to demonstrate how execution time varies with the problem size on these platforms. Use logarithmic scale for the x-axis. Next, plot the GPU and CPU execution time for n = 2^k for k = 11,…,16. For what value of n does the GPU code become faster than the CPU code?
3. (10 points) Plot the data transfer time from host to device and from device to host on the same plot for n = 2^k for k = 4,…,16.
Submission:
You need to upload the following to Canvas:
- Problem 1: Submit the file nbody.cu.
- Problems 1, 2 & 3: Submit a single PDF or MSWord document with your response.
Helpful Information:
- You may use Grace for this assignment.
- Information on compiling and running CUDA programs on a GPU for Grace is available at https://hprc.tamu.edu/wiki/Terra:Compile:All#CUDA_Programming.
- To develop code interactively, log on to one of the GPU-equipped login nodes on Grace; these nodes are named graceX.hprc.tamu.edu, where X is to be replaced by 1, 2, or 3.
- Load the modules for the compiler and CUDA prior to compiling your program. Use:
      module load intelcuda/2020a
- Compile C programs using nvcc. For example, to compile code.cu to create the executable code.exe, use
      nvcc -ccbin=icc -o code.exe code.cu
- The run time of a code should be measured when it is executed in dedicated mode. Sample batch files for Grace are available at https://hprc.tamu.edu/wiki/Grace:Batch#Job_File_Examples. To execute the code on a GPU-equipped node, you must use the submission options for GPUs.