This tutorial gives step-by-step directions for using AMD CodeAnalyst to analyze the performance of an application program. We recommend reading the tutorial sections in the order listed below.
This tutorial uses the example program that is distributed with AMD CodeAnalyst. The source code for the example program is installed with CodeAnalyst, in the samples/classic directory under the directory into which CodeAnalyst was installed. Follow the steps in the section Preparing An Application For Profiling to compile the example program and make it ready for profiling.
The example program, classic, implements the straightforward "textbook" algorithm for matrix multiplication. Matrix multiplication is performed in the function multiply_matrices(), which provides an opportunity for optimization. The classic implementation takes long, non-unit strides through one of the operand matrices. These long strides cause frequent data translation lookaside buffer (DTLB) misses, which slow execution.
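To make the stride pattern concrete, here is a minimal sketch of a textbook matrix multiplication in C. It is not the source of the classic sample. The function name multiply_textbook, its parameters, the use of double elements, and the flat row-major layout are all assumptions chosen for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not the code shipped in samples/classic.
 * Computes c = a * b for n x n matrices stored in row-major order,
 * so element (row, col) lives at index row * n + col.
 */
void multiply_textbook(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                /*
                 * a[i * n + k]: consecutive k values are adjacent in
                 * memory (unit stride).
                 * b[k * n + j]: consecutive k values are n elements
                 * apart (long, non-unit stride).
                 */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

In this sketch the inner loop walks down a column of b, so each iteration jumps n * sizeof(double) bytes. Assuming 8-byte doubles and 4 KB pages, a matrix with n = 1000 gives a stride of 8000 bytes, which means every access to b lands on a different page from the previous one. That is the kind of access pattern that can put pressure on the DTLB.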
For quick reference to options available in the CodeAnalyst workspace, see:
Exploring the Workspace