Add SIMD support for RGG1555
Description
Uses SIMD operations to speed up RGG1555 processing by up to 3x. The current implementation is hardcoded to process 16 bytes (128 bits) at a time. The AVX2 and AVX512 paths still require testing and validation.
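For reviewers, here is a minimal sketch of what a hardcoded 128-bit path can look like. The code is not taken from this PR. It makes several assumptions: the bit layout (bit 15 is a 1-bit field, bits 14-10, 9-5 and 4-0 are 5-bit channels), the operation (splitting pixels into 8-bit planes), and the name `unpack_1555` are all hypothetical, and SSE2 is used only as an example of a 128-bit instruction set. Each loop iteration loads 16 bytes, which is 8 pixels. A scalar loop handles the leftover pixels and also serves as the fallback when SSE2 is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Widen a 5-bit value to 8 bits by replicating its top bits (0x1F -> 0xFF). */
static inline uint8_t expand5(uint16_t c) {
    return (uint8_t)((c << 3) | (c >> 2));
}

/* Hypothetical helper: split n 1555 pixels into four 8-bit planes.
 * Assumed layout: bit 15 = 1-bit field, bits 14-10 = c0, 9-5 = c1, 4-0 = c2. */
void unpack_1555(const uint16_t *src, size_t n,
                 uint8_t *a, uint8_t *c0, uint8_t *c1, uint8_t *c2) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i mask5 = _mm_set1_epi16(0x1F);
    const __m128i mask8 = _mm_set1_epi16(0xFF);
    const __m128i zero = _mm_setzero_si128();
    /* Fixed 16 bytes (128 bits) per iteration = 8 pixels per register. */
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        /* Isolate each 5-bit field in its 16-bit lane. */
        __m128i f0 = _mm_and_si128(_mm_srli_epi16(v, 10), mask5);
        __m128i f1 = _mm_and_si128(_mm_srli_epi16(v, 5), mask5);
        __m128i f2 = _mm_and_si128(v, mask5);
        /* Same widening as expand5(): (c << 3) | (c >> 2) per lane. */
        f0 = _mm_or_si128(_mm_slli_epi16(f0, 3), _mm_srli_epi16(f0, 2));
        f1 = _mm_or_si128(_mm_slli_epi16(f1, 3), _mm_srli_epi16(f1, 2));
        f2 = _mm_or_si128(_mm_slli_epi16(f2, 3), _mm_srli_epi16(f2, 2));
        /* Arithmetic shift copies bit 15 across the lane (0xFFFF or 0);
         * masking keeps 0xFF or 0x00. */
        __m128i fa = _mm_and_si128(_mm_srai_epi16(v, 15), mask8);
        /* Pack 16-bit lanes to bytes; the low 8 bytes hold the 8 results. */
        _mm_storel_epi64((__m128i *)(a + i), _mm_packus_epi16(fa, zero));
        _mm_storel_epi64((__m128i *)(c0 + i), _mm_packus_epi16(f0, zero));
        _mm_storel_epi64((__m128i *)(c1 + i), _mm_packus_epi16(f1, zero));
        _mm_storel_epi64((__m128i *)(c2 + i), _mm_packus_epi16(f2, zero));
    }
#endif
    /* Scalar tail for the remaining pixels (or everything without SSE2). */
    for (; i < n; ++i) {
        uint16_t p = src[i];
        a[i] = (p & 0x8000) ? 0xFF : 0x00;
        c0[i] = expand5((p >> 10) & 0x1F);
        c1[i] = expand5((p >> 5) & 0x1F);
        c2[i] = expand5(p & 0x1F);
    }
}
```

A wider AVX2 or AVX512 path would follow the same pattern with 32- or 64-byte registers. That pattern is also the reason a fixed 16-byte width needs its own testing before the wider paths are enabled.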
Fixes #45