davedx 1 day ago

> This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster.

On CPU or GPU?

kccqzy 18 hours ago

This is NumPy we are discussing. It doesn't use the GPU.