Please don’t rely on this Gitea instance being around forever.
If any of your build scripts use my (kageru’s) projects hosted here, check my Github or IEW on Github for encoding projects. If you can’t find what you’re looking for there, tell me to migrate it.
That means you’ll finally get full 32bit precision, not just 8bit.
This comes at a cost (~20% lower FPS), but it’s a lot more accurate,
and there’s potentially room for optimization in the future because we
actually do math now instead of just hacky lookups in a 256-element
vector.
Directly calling f32::mul_add is actually more accurate and faster here
because rustc seems to be unable to rearrange the original instructions
in a way that can utilize `fma`.
When directly comparing the two implementations, mul_add was about 10%
faster on my machine (Ryzen 1700 with native target in rustflags).
Relevant Godbolt: https://godbolt.org/z/DDRZ4-