kageru
4831298a9b
Directly calling `f32::mul_add` is both more accurate (one rounding instead of two) and faster here, because rustc seems to be unable to rearrange the original instructions in a way that can use `fma`. In a direct comparison of the two implementations, `mul_add` was about 10% faster on my machine (Ryzen 1700, with the native target set in rustflags). Relevant Godbolt: https://godbolt.org/z/DDRZ4-
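The accuracy half of that claim can be shown in isolation. The sketch below does not use the code from this commit. `separate` and `fused` are hypothetical helper names chosen for illustration. With x = 1 + 2^-12, the exact product x*x is 1 + 2^-11 + 2^-24. Rounding that product to f32 drops the 2^-24 term (a tie, resolved to even), so the unfused version returns 0. `mul_add` rounds only once, at the end, so it keeps 2^-24.

```rust
/// Unfused: the product is rounded to f32, then the sum is rounded again.
fn separate(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Fused: computes a*b + c with a single rounding via f32::mul_add.
fn fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

fn main() {
    // x = 1 + 2^-12 is exactly representable, and x*x = 1 + 2^-11 + 2^-24 exactly.
    let x: f32 = 1.0 + 1.0 / 4096.0;
    // c cancels the part of x*x that survives rounding the product to f32.
    let c: f32 = -(1.0 + 1.0 / 2048.0);

    // The unfused result loses the 2^-24 term; the fused result keeps it.
    println!("separate: {:e}", separate(x, x, c));
    println!("mul_add:  {:e}", fused(x, x, c));
}
```

The speed half depends on the target. `mul_add` is only guaranteed to be fast when the CPU has a dedicated FMA instruction and the compiler is allowed to use it, which is presumably why the native target matters for the 10% figure above. Without that, it may fall back to a slower library routine.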
lib.rs
mask.rs