This presentation is aimed at attendees who write their own code and need their calculations to finish as quickly as possible. We'll cover effective use of cache, loop-level optimizations, strength reduction, optimizing compilers and their limitations, short-circuiting, time-space trade-offs, and more. Exercises will be mostly in C, but the emphasis will be on general techniques that apply in any language. We will also cover AMD-specific compiler options, libraries, and performance tools.