On 4-lane SIMD using SSE2 (my Core 2 Duo laptop):

allOnPct    serial     while            cwhile           improvement
-------------------------------------------------------------------
 43.93%     84.1ms     38.6ms(2.18x)    44.6ms(1.88x)     .86
 58.24%     78.5ms     31.7ms(2.48x)    36.4ms(2.16x)     .87
 69.46%     61.7ms     24.6ms(2.51x)    25.4ms(2.44x)     .97
 82.88%     43.1ms     18.3ms(2.35x)    17.7ms(2.43x)    1.03
 86.21%     31.7ms     17.0ms(1.87x)    14.2ms(2.23x)    1.19
 95.18%     28.2ms     14.2ms(1.98x)    11.2ms(2.51x)    1.27
100.00%     26.7ms     13.1ms(2.04x)     9.6ms(2.78x)    1.36

On a Core i7 using AVX:

allOnPct    serial     while            cwhile           improvement
-------------------------------------------------------------------
 49.25%    206.2ms     33.6ms(6.13x)    34.6ms(5.95x)     .97
 61.13%    155.9ms     24.2ms(6.44x)    25.6ms(6.09x)     .95
 73.98%     99.1ms     18.0ms(5.51x)    19.4ms(5.10x)     .93
 81.24%     71.1ms     16.0ms(4.44x)    15.8ms(4.51x)    1.02
 90.91%     61.1ms     13.9ms(4.38x)    13.1ms(4.66x)    1.06
100.0%      57.0ms      9.9ms(5.76x)     9.1ms(6.28x)    1.09
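
The tables do not say how the derived columns are computed. The numbers look consistent with the figure in parentheses being the serial time divided by that variant's time, and with "improvement" being the while time divided by the cwhile time, so values above 1 mean cwhile is faster. That reading is an assumption inferred from the data, and the helper names below are made up for illustration. A minimal sketch that checks it against the SSE2 row at 100.00%:

```python
def speedup(serial_ms, variant_ms):
    """Speedup of a variant over the serial version (assumed: serial / variant)."""
    return serial_ms / variant_ms


def improvement(while_ms, cwhile_ms):
    """Relative gain of cwhile over while (assumed: while / cwhile)."""
    return while_ms / cwhile_ms


# Timings copied from the SSE2 row with allOnPct = 100.00%.
serial_ms, while_ms, cwhile_ms = 26.7, 13.1, 9.6

print(f"while speedup:  {speedup(serial_ms, while_ms):.2f}x")
print(f"cwhile speedup: {speedup(serial_ms, cwhile_ms):.2f}x")
print(f"improvement:    {improvement(while_ms, cwhile_ms):.2f}")
```

Recomputing other rows from the printed timings can differ from the table by about 0.01, presumably because the table was derived from unrounded measurements.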