$29
- Name: Tanner Hobbs
- NetID: Hobbs131
Answer the questions below according to the lab specification. Write
your answers directly in this text file and submit it to complete the
lab.
PROBLEM 1: Timing Benchmarks
============================
The code in `reversal.c' implements two functions which reverse the
elements of an array. They take markedly different approaches.
,----
| void reverse_arr1(int *arr, long size){
| int *tmp = malloc(sizeof(int)*size);
| for(long i=0; i<size; i++){
| tmp[i] = arr[size-i-1];
| }
| for(long i=0; i<size; i++){
| arr[i] = tmp[i];
| }
| free(tmp);
| }
|
| void reverse_arr2(int *arr, long size){
| for(long i=0; i<size/2; i++){
| int tmp = arr[i];
| arr[i] = arr[size-i-1];
| arr[size-i-1] = tmp;
| }
| }
`----
(A) Predictions
~~~~~~~~~~~~~~~
Predict which of these two functions will run faster based on their
code characteristics.
Function reverse_arr2 will run faster because less instructions are required. Two for loops are ran through at n iterations in the first method whereas in the second only one for loop is ran for n/2 iterations
(B) Timing
~~~~~~~~~~
Modify the provided `reversal.c' file to perform timing calculations
on the two functions as they are called on different sizes of arrays.
Print runtimes in a tabular format.
Paste the C code you wrote below to show you how you timed the
function runs.
clock_t begin, end;
double cpu_time1;
double cpu_time2;
begin = clock();
reverse_arr1(arr1,size);
end = clock();
cpu_time1 = ((double) (end - begin)) / CLOCKS_PER_SEC;
printf("cpu time reverse_arr1: %.4e ", cpu_time1);
begin = clock();
reverse_arr2(arr2,size);
end = clock();
cpu_time2 = ((double) (end - begin)) / CLOCKS_PER_SEC;
printf("cpu time reverse_arr2: %.4e \n", cpu_time2);
(C) Analysis
~~~~~~~~~~~~
Paste the table of numbers your code produced for timing the two
reversal functions. Describe if the one you predicted would be faster
actually was measured to be faster.
cpu time reverse_arr1: 2.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 1.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 1.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 2.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 1.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 1.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 2.0000e-06 cpu time reverse_arr2: 1.0000e-06
cpu time reverse_arr1: 1.0000e-05 cpu time reverse_arr2: 2.0000e-06
cpu time reverse_arr1: 7.0000e-06 cpu time reverse_arr2: 2.0000e-06
cpu time reverse_arr1: 9.0000e-06 cpu time reverse_arr2: 4.0000e-06
The one I predicted to be faster seems to be faster
PROBLEM 2: ax then xpy VS axpy
==============================
Examine the timing code in `axpy.c'. Consider the 3 functions `ax()',
`xpy()', and `axpy()'. Note that these are timed in `main()' as
follows.
,----
| // start timing 1
| ax(a,x1,size);
| xpy(x1,y,size);
| // stop timing
|
| // start timing 2
| axpy(a,x2,y,size);
| // stop timing
`----
This is because the first two functions together do what `axpy()'
does.
Note also that `axpy()' has two "fairness" loops which do affect the
results but compensate for the fact that the previous two functions
combined have two loops.
Time these functions using the provided code. Explain the results you
observe and if one appears to be faster than the other, describe what
features of the processor and memory architecture might be affecting
the timings.
axpy seems to be slightly faster than ax and xpy together. This may be because the scaling of the array and the addition of arrays operations are done simultaneously in the for loops thus reducing the amount of instructions handled.
PROBLEM 3 (OPTIONAL): Profiler
==============================
This problem demonstrates the utility of a performance *profiler* to
help locate "hot spots" in code which take most of the run time
associated with it.
Change into the directory `warsim' that is part of the lab code
distribution. The code contained within it implements a simple
simulation of the card game War. The details of the game are not
important except that the program simulates playing of the game then
reports statistics on it.
First compile the program by using the provided Makefile.
,----
| > make
| gcc -Og -g -pg -no-pie -c libwar.c
| gcc -Og -g -pg -no-pie -c libcardlist.c
| gcc -Og -g -pg -no-pie -o warsim warsim.c libwar.o libcardlist.o
`----
Notice that the option `-pg' is passed in which will enable
/profiling/ of the code. The `-no-pie' option is present in case a
buggy version of `gcc' is present and has no significant effect.
Run the resulting `warsim' program as follows.
,----
| # show usage
| > ./warsim
| usage: ./warsim cardfile ngames [logfile]
|
| # simulate 10 games
| > ./warsim full.deck 10
| ----------------------
| Final stats
| 0.60 P1 Win Ratio
| 408.50 Avg battles per game
| 30.30 Avg wars per game
|
| # simulate 100 games
| > ./warsim full.deck 100
| ----------------------
| Final stats
| 0.54 P1 Win Ratio
| 301.56 Avg battles per game
| 22.94 Avg wars per game
|
| # simulate 2000 games
| > ./warsim full.deck 2000
| ----------------------
| Final stats
| 0.54 P1 Win Ratio
| 298.48 Avg battles per game
| 22.79 Avg wars per game
`----
This last run might take a few seconds as 2000 games are simulated.
After the runs are finished, the file `gmon.out' appears. This is an
output file that is generated on running programs with profiling
enabled.
,----
| > ls gmon.out
| gmon.out
| > file gmon.out
| gmon.out: GNU prof performance data - version 1
`----
The contents of the file are binary and must be interpreted by the
program `gprof'. Use the `-b' option to show "brief" output and pass
in the name of the program that was run.
,----
| > gprof -b warsim
| Flat profile:
|
| Each sample counts as 0.01 seconds.
| % cumulative self self total
| time seconds seconds calls ms/call ms/call name
| 50.11 0.06 0.06 2387860 0.00 0.00 print_list
| 25.06 0.09 0.03 32507650 0.00 0.00 card2str
| 8.35 0.10 0.01 596965 0.00 0.00 end_battle
| 8.35 0.11 0.01 4000 0.00 0.00 new_stack
| 8.35 0.12 0.01 52 0.19 0.19 str2card
| 0.00 0.12 0.00 8073378 0.00 0.00 stack_empty
| 0.00 0.12 0.00 6656812 0.00 0.00 queue_empty
| 0.00 0.12 0.00 3944508 0.00 0.00 stack_top
| 0.00 0.12 0.00 1675522 0.00 0.00 queue_add
| 0.00 0.12 0.00 1675522 0.00 0.00 queue_remove
| 0.00 0.12 0.00 1571470 0.00 0.00 queue_front
| 0.00 0.12 0.00 1465470 0.00 0.00 stack_pop
| 0.00 0.12 0.00 1465470 0.00 0.00 stack_push
| 0.00 0.12 0.00 596965 0.00 0.00 start_battle
| 0.00 0.12 0.00 6001 0.00 0.00 new_queue
| 0.00 0.12 0.00 6001 0.00 0.00 queue_free
| 0.00 0.12 0.00 4000 0.00 0.00 stack_free
| 0.00 0.12 0.00 2000 0.00 0.00 deal2
| 0.00 0.12 0.00 2000 0.00 0.06 playwar
| 0.00 0.12 0.00 2000 0.00 0.00 queue_copy
| 0.00 0.12 0.00 2000 0.00 0.00 queue_rotate
| 0.00 0.12 0.00 1 0.00 10.02 read_deck
| ...
`----
The first part of the output shows a breakdown of how much time was
spent in each function associated the most recent run of the program.
Analyze this breakdown and make suggestions as to where optimization
effort might best be spent to increase the speed of warsim.