Checkpoint Report 4/23:
ISPC Hardware Performance Monitoring Tool
What's done so far
First, I got ISPC compiled and built on my own machine. After realizing that
my poor laptop's Core2 did not have SSE4 or AVX, I asked for and was granted
access to a Core i7 machine with AVX instructions. Getting ISPC built there
was a little trickier. I had to build LLVM and Clang from source and edit
the ISPC build system to use different libraries (as the default LLVM installed
was too old!). I cleaned up my build system changes and got them
pulled into the ISPC main branch (thanks Matt Pharr!).
Next, I dug into Agner Fog's code for using hardware performance counters.
His code is super complicated, but it all really boils down to a few
special instructions to reset and read the hardware counters. Most of
Agner Fog's code was for checking the architecture/reading the appropriate
counters. I think I can get the readings I want with only a few instructions.
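To make the "few instructions" idea concrete, here is a minimal C++ sketch.
It is not Agner Fog's code. It assumes GCC or Clang on x86 and uses the RDTSC
time-stamp counter (via __rdtsc()) as a stand-in: the programmable counters
are read with RDPMC, which only works from user mode once the kernel has
enabled and configured them. The names read_ticks and average_ticks are
hypothetical, and the non-x86 branch is just a clock read so the sketch
still compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc on GCC/Clang (MSVC uses <intrin.h>)
#else
#include <chrono>
#endif

// Read a tick count with as little overhead as possible.
// On x86 this compiles down to a single RDTSC instruction.
static inline uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    // Fallback (assumption): steady_clock ticks instead of a CPU counter.
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: average ticks per call of `work` over several runs.
template <typename F>
double average_ticks(F&& work, int iterations) {
    assert(iterations > 0);
    const uint64_t start = read_ticks();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const uint64_t end = read_ticks();
    return static_cast<double>(end - start) / iterations;
}
```

Averaging over several iterations smooths out the cost of the two reads
themselves. The numbers are only comparable if the thread stays on one core.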
Next, I took a bit of a detour to dig into ISPC. I figured that there's no
better way to understand the system than to fix a real issue, so I went and
found one. I added suggestions for mistyped labels. Now if a goto names a
label that doesn't exist, the compiler uses a string-distance metric to
suggest similar labels that you might have meant. It's not that useful... but
it was fun. That code has also been pulled into ISPC main. I now have a better
understanding of the compiler context and the function-emitting code. I also
spent a lot of time just looking around and reading code in ISPC.
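As a rough illustration of the label-suggestion idea, here is a minimal C++
sketch using Levenshtein edit distance. This is not the code that went into
ISPC: the function names and the max_distance cutoff are assumptions made
for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein edit distance using two rows of the dynamic-programming table.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitute =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: known labels within max_distance edits of the
// mistyped one, closest first.
std::vector<std::string> suggest_labels(const std::string& typo,
                                        const std::vector<std::string>& known,
                                        std::size_t max_distance) {
    assert(!typo.empty());  // a label name is never empty
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const std::string& label : known) {
        const std::size_t d = edit_distance(typo, label);
        if (d <= max_distance) scored.emplace_back(d, label);
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<std::size_t, std::string>& x,
                        const std::pair<std::size_t, std::string>& y) {
                         return x.first < y.first;
                     });
    std::vector<std::string> result;
    for (const auto& entry : scored) result.push_back(entry.second);
    return result;
}
```

With labels "done", "loop" and "retry", a goto to "dnoe" with a cutoff of 2
would suggest only "done".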
Most recently, I've been looking at the AOBench_instrumented example and
its underlying code in ISPC. When compiled with the --instrument flag,
ISPC expects you to export a function called ISPCInstrument, a callback
that the generated code calls at various "interesting points". I
also ran the sqrt function from asst1 while instrumented. I noticed
a SEVERE slowdown from instrumentation. For now, I'm going to accept
this slowdown and work with it, but later I want to create different
levels of instrumentation so the user has better control. A
short inline function is fine to call often, but a longer function
like the one in the AOBench example kills any chance of the hardware
performance counters having meaning, since the callback's own work
would dominate the counts.
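For reference, here is a minimal C++ sketch of what a lightweight callback
could look like. The ISPCInstrument signature follows the ISPC documentation
as I understand it, but the type of the mask parameter has differed between
ISPC versions, so treat it as an assumption. SiteStats, g_sites, count_bits
and lane_fill_percent are hypothetical names. Note that this version still
does a map lookup and string building on every call, so it is not the "short
inline" callback yet. It only shows the bookkeeping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Per-site tallies (illustrative, not part of ISPC).
struct SiteStats {
    uint64_t calls = 0;         // times the site was reached
    uint64_t active_lanes = 0;  // sum of set mask bits over all calls
};

static std::map<std::string, SiteStats> g_sites;

// Portable population count: clears the lowest set bit each iteration.
static inline int count_bits(uint64_t m) {
    int n = 0;
    while (m != 0) {
        m &= m - 1;
        ++n;
    }
    return n;
}

// Callback that code built with --instrument expects to find at link time.
// Assumed signature; older ISPC releases may pass the mask as an int.
extern "C" void ISPCInstrument(const char* fn, const char* note, int line,
                               uint64_t mask) {
    SiteStats& s =
        g_sites[std::string(fn) + ":" + std::to_string(line) + " " + note];
    s.calls += 1;
    s.active_lanes += count_bits(mask);
}

// Percentage of lanes that were active, averaged over all calls at a site.
double lane_fill_percent(const SiteStats& s, int lane_count) {
    assert(lane_count > 0);
    if (s.calls == 0) return 0.0;
    return 100.0 * s.active_lanes /
           (static_cast<double>(s.calls) * lane_count);
}
```

A site reached twice on a 4-wide target, once with all four lanes on and
once with two, would report a 75% lane fill.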
What's up next
First, I want to try putting an inline counter read in the instrumentation
code. If that significantly reduces the overhead, then I'll work
from there. If not, I'll look to see if I can reduce how often
ISPC instruments code. I'll be sure to consult Matt Pharr on the
issue too. I'm sure he's thought about it before.
What I'll demo
I would like to have a useful instrumentation tool. I would like
to be able to instrument a piece of code (probably a benchmark),
and see useful information. What that information is... depends
on how useful the instrumentation is. I suspect it will be
something like IPC, lane fill percentage, or branch divergence.
It might also be something memory related, like
measuring scatter/gather vs sequential memory access.