(18) Example of Profiling Your Code

Profiling Your Code

Steps for Profiling



[sas@discovery]$ icc -g -pg -openmp-stubs -o matmul_par matmul_par.c
matmul_par.c(60): warning #161: unrecognized #pragma
#pragma omp parallel for private(tmp, i, j, k)
^
[sas@discovery]$ ./matmul_par
Order 1000 multiplication in 13.498900 seconds
1 threads
Hey, it worked
all done

[sas@discovery]$ ls -ls gmon.out
4 -rw-rw-r-- 1 sas sas 1317 Feb 17 11:27 gmon.out

[sas@discovery]$ gprof -l matmul_par > gprof.out

[sas@discovery]$ more gprof.out

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
94.39     18.06     18.06                             main (matmul_par.c:68 @ 400aea)
 5.72     19.15      1.09                             main (matmul_par.c:66 @ 400b22)
 0.05     19.16      0.01                             main (matmul_par.c:50 @ 4009f8)
 0.05     19.17      0.01                             main (matmul_par.c:54 @ 400a57)
 0.05     19.18      0.01                             main (matmul_par.c:70 @ 400b30)
 0.05     19.19      0.01                             main (matmul_par.c:85 @ 400c39)
.
.





Portion of matmul_par.c where the most time is spent

60  #pragma omp parallel for private(tmp, i, j, k)
61  for (i=0; i<Ndim; i++){
62      for (j=0; j<Mdim; j++){
63
64          tmp = 0.0;
65
66          for(k=0;k<Pdim;k++){
67              /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
68              tmp += *(A+(i*Ndim+k)) * *(B+(k*Pdim+j));
69          }
70          *(C+(i*Ndim+j)) = tmp;
71      }
72  }
 
Results of Parallelizing the Code With OpenMP

# of threads    Time (secs.)    Speedup
     1            16.28           1.00
     2             8.326          1.95
     4             4.268          3.81
     8             2.137          7.61








profile.src last modified Mar 23, 2009
© Dartmouth College