This
assignment helps you get familiar with MPI by implementing a parallel
odd-even sort algorithm. Experimental evaluations on your
implementation are required to guide you analyze the performance and
scalability of a parallel program. You are also encouraged to explore
any performance optimization and parallel computing skills in order
to receive a higher score.
In
this assignment, you are required to implement the odd-even sort
algorithm using MPI. Odd-even sort is a comparison sort which
consists of two main phases: even-phase
and
odd-phase.In
each phase, processes perform compare-and-swap
operations
repeatedly as follows until input array is sorted.
In
even-phase, all even/odd indexed pairs of adjacent elements are
compared. If the two elements of a pair are not sorted in the correct
order, the two elements are swapped. Similarly, the same
compare-and-swap operation is repeated for odd/even indexed pairs in
odd-phase. The odd-even sort algorithm works by alternating these two
phases until the input array is completely sorted.
For
you to understand this algorithm better, the execution flow of
odd-even sort is illustrated step by step as below: (The array is
sorted in ascending order)
Last
Update: 2019.9.26
- [Even-phase]
even/odd indexed adjacent elements are grouped into pairs.
- [Even-phase]
elements in a pair are swapped if they are in the wrong order.
- [Odd-phase]
odd/even indexed adjacent elements are grouped into pairs.
- [Odd-phase]
elements in a pair are swapped if they are in the wrong order.
-
Run
even-phase and odd-phase alternatively until
no
swap happens
in both even-phase and odd-phase.
The
above example is a simple case where the number of MPI
tasks(processes) and the size of array are the same. But your
implementation for the homework must be able to handle arbitrary
number of MPI tasks and array elements by letting each process
manages a sub-array partition instead of a single array elements.
More details on the implementation restrictions are given in Section
4.
Last
Update: 2019.9.26
-
Your
program is required to read an unsorted array from an input file,
and generate the sorted results to another output file.
- Your
program accepts 3 input parameters separated by space. They are:
i、
(Integer)
the size of the array
n(1
≤
n
≤
536870911) ii、
(String)
the input file name iii、(String)
the output file name
Your
program will be tested by our judge system with the command below:
-
srun
-nNPROC ./hw1
nin_file
out_file
Noted,
NPROC
is the number of processes, and hw1 is the name of your binary code.
-
The
input file contains
n
32-bit
floats in binary format. The first 4 bytes represents the first
floating point number, the fifth to eighth byte represents the
second one, and so on. (Sample input files will be given by our
judge system.)
-
The
output file must be generated by following the same format of the
input file. (Sample output files will also be given by our judge
system.)
Note:
The
float here refers to IEEE754
binary32,as
known as
single-precision
floating-point.
You
can use the
floattype
in C/C++.
Any
valid float values
are
possible
to show up in the input, except the
following
values:
Last
Update: 2019.9.26
-
Problem
assignments:
You are required to implement a parallel version of odd-even sort
under the given restrictions. Your goal is to optimize the
performance and reduce the total execution time.
- A
process can sort or perform any computations on its own local
elements.
-
For
the odd-even sorting phases, a process can only exchange its local
elements with its neighbor processes according to the communication
pattern described Section 2. For instance, MPI task with rank 5 can
only exchange
elements
with MPI task with ranks 4 and 6, but not the ones with rank 3, 7.
Noted,
the neighbor relationship is not circular.For
example, if your MPI program creates 10 processes, MPI task with
rank 0 cannot send any message to MPI task with rank 9 during the
sorting, and vise versa.
-
However,
any communication method, including collective communications
(i.e., broadcast, gather, scatter, etc.), are allowed for
initializationbefore
the sorting begins, ortermination
checking.
-
Requirements
& Reminders:
-
Must
use MPI-IO(MPI_File*functions)
to do file input and output.
-
Must
follow the input/output file format and execution command line
arguments specifications described in Section 3.
-
The
implemented algorithm
must
follow the odd-even sort principal, and comply the restrictions
described in Section 4.1.If
you are not
sure
whether your implementation follows the rules,
please
discuss with TA for approval.
Last
Update: 2019.9.26
- Title,
name, student ID
- Implementation
Briefly
describe your implementation in diagrams, figures, sentences,
especially in the following aspects:
- How
do you handle arbitrary number of input items and processes?
- How
do you sort in your program?
- Other
efforts you’ve made in your program.
- Experiment
& Analysis
Explain
how and why you do these experiments? Explain how you collect those
measurements? Show the results of your experiments in plots, and
explain your observations.
i、
Methodology
(a).
System
Spec(If
you run your experiments on your own cluster) Please specify the
system spec by providing the CPU, RAM, disk and network (Ethernet /
InfiniBand) information of the system.
(b).
Performance
Metrics:How
do you measure the computing time, communication time and IO time?
How do you compute the values in the plots?
ii、
Plots:
Speedup Factor & Time Profile
-
Conduct
strong scalability experiments, and plot the speedup and time
profile results as shown in figure 1 and figure 2.
(Please
note that these plots are just examples of how you can layout your
figures, and you are unlikely to get the same results)
-
Your
plots must contain at least 4 settings (e.g., scales) for both
single node and multi-node environment.
- Make
sure your plots are properly labeled and formatted.
-
You
are encouraged generate your own test case with proper problem size
to ensure the experimental results are accurate and meaningful.
(e.g. Make sure the execution time is long enough to have meaningful
difference for comparison)
Last
Update: 2019.9.26
Figure
1: Time profile Figure
2: Speedup
iii、Discussion
(Must base on the results in your plots)
-
Compare
I/O, CPU, Network performance. Which is/are the bottleneck(s)? Why?
How could it be improved?
-
Compare
scalability. Do your program scales well? Why or why not? How can
you achieve better scalability? You may discuss for the two
implementations separately or together.
iv、Others
You
are strongly encouraged to conduct more experiments and analysis of
your implementations.
- Experiences
/ Conclusion
It
can include these following aspects:
- Your
conclusion of this assignment.
- What
have you learned from this assignment?
- What
difficulties did you encounter in this assignment?
-
If
you have any feedback, please write it here. Such as comments for
improving the spec of this assignment, etc.
Last
Update: 2019.9.26
- [40%]
Correctness
-
You
can see 36 of them at
/home/pp19/share/hw1/cases,and
the other 6 are hidden test cases.
- You
lose 1 point for each failed case.
- [20%]
Performance
It
is graded by the execution time of your program on 6 hidden test
cases. Incorrect results will not get any point.
Visit
http://apollo.cs.nthu.edu.tw/scoreboard/hw1for
preliminaryresults.
- [20%]
Report
It
is graded based on the evaluations, discussions and writing in your
report. Higher score will be rewarded if more detailed performance
analysis of your implementation can be shown or explained by
experiments.
- [20%]
Demo
Demo
will mainly focus on the following aspect: ✔
Explain your implementation.
✔ Explain
the key results and findings from your report.
✔ Your
extra efforts. (Why you deserve more bonus points?)
- Policy
✔ 0point
will be given to cheater(even
copying code from the Internet).
✔ No
late submission after deadline will be accepted.
Last
Update: 2019.9.26
-
HOMEWORKSUBMISSION&
REMINDER
-
Upload
your report
in
pdf
format
named
report.pdf
on iLMS before the deadline
-
Upload
your source code and makefile to
~/homework/hw1
directory on apollo31
before the deadline.
The
files will be copied immediately after the deadline for official
judging.
i、
hw1.c
(or hw1.cc)
ii、
Makefile
-
A
self-checking
script
is provided to verify the correctness of your code. To test, simply
type the command
hw1-judgeunder
your source code directory.
-
Since
we have limited resources for everyone to share, please
start
your work ASAP to avoid long queuing delay.Do
not leave it until the last day!
-
A
scoreboard system will be available for you to checkout the current
ranking of your implementation.
Go
to
http://apollo.cs.nthu.edu.tw/scoreboardto
check scoreboard.
- You
are welcome to ask questions through iLMS!