adaptivegrain/src
kageru 4831298a9b
Optimize LUT generation
Calling f32::mul_add directly is both more accurate and faster here,
because rustc does not seem to rearrange the original multiply and add
instructions into `fma` on its own.
In a direct comparison of the two implementations, mul_add was about 10%
faster on my machine (Ryzen 1700 with target-cpu=native in rustflags).
Relevant Godbolt: https://godbolt.org/z/DDRZ4-
2020-05-13 23:26:23 +02:00
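
As a rough illustration of the change described in the commit message, the sketch below evaluates a polynomial with Horner's method twice: once with a separate multiply and add, and once with `f32::mul_add`. It is not the code from mask.rs. The function names `horner_plain`, `horner_fma`, and `build_lut`, the 256-entry table size, and the coefficient layout (highest degree first) are all assumptions made for the example.

```rust
/// Horner evaluation with a separate multiply and add.
/// Each step rounds twice: once after `acc * x`, once after `+ c`.
/// `coeffs` is ordered from the highest degree down to the constant term.
fn horner_plain(x: f32, coeffs: &[f32]) -> f32 {
    coeffs.iter().fold(0.0_f32, |acc, &c| acc * x + c)
}

/// The same evaluation using `f32::mul_add`, which computes
/// `acc * x + c` with a single rounding per step.
fn horner_fma(x: f32, coeffs: &[f32]) -> f32 {
    coeffs.iter().fold(0.0_f32, |acc, &c| acc.mul_add(x, c))
}

/// Hypothetical lookup table: one entry per 8-bit input value,
/// with the input normalized to the range [0, 1].
fn build_lut(coeffs: &[f32]) -> Vec<f32> {
    (0..256)
        .map(|i| horner_fma(i as f32 / 255.0, coeffs))
        .collect()
}
```

For example, `build_lut(&[2.0, 3.0, 1.0])` tabulates 2x² + 3x + 1 over the normalized inputs. Whether `mul_add` is actually faster depends on the target: if the build does not enable a hardware FMA instruction, the call may fall back to a slower software routine, which is likely why the commit message mentions the native target.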
lib.rs Optimize LUT generation 2020-05-13 23:26:23 +02:00
mask.rs Optimize LUT generation 2020-05-13 23:26:23 +02:00