Vectorization Rules


Numerical computing comes in different flavors, but usually it reduces to applying some function to a matrix of numbers. In the exploratory phase of a project, it's common to process your data with ordinary loops, because that kind of thinking is more familiar:

for r in range(num_rows)
    for c in range(num_cols):
        matrix[r, c] *= 2

But with the right matrix library, this is an order of magnitude faster:

matrix[r, c] *= 2

Why are libraries like numpy so fast? There are a few reasons at play:

  • Native code — High-level languages like Python are expressive but slow. By backing off to low-level languages with precise control over the machine, we get the benefit of high-level glue with the speed of low-level code.

  • Dense types — When we know we have an array of int64, we can more easily store that arry in memory and apply transformations to it.

  • Vectorization — When we perform high-level operations at the vector level as opposed to the element level, the machine can run more efficiently and even process data in parallel.

Judicious use of vectorizing can 100x runtime if applied to the right places. But where to optimize? Don't just guess: measure. If you install a library like flameprof, you can easily a get a flamegraph of your code:

python -m cProfile -o outprof && flameprof > flame.svg

And the hot spots will be obvious.