Faster Number Theoretic Transform on Graphics Processors for Ring Learning with Errors Based Cryptography
Date:
In this talk, I presented a GPU-accelerated variant of the Number Theoretic Transform (NTT) known as the Generalized Discrete Galois Transform (DGT).
Abstract
The Number Theoretic Transform (NTT) has been revived recently by the advent of the Ring-Learning with Errors (Ring-LWE) Homomorphic Encryption (HE) schemes. In these schemes, the NTT is used to calculate the product of high degree polynomials with multi-precision coefficients in quasilinear time. This is known as the most time-consuming operation in Ring-based HE schemes. Therefore; accelerating NTT is key to realize efficient implementations. As such, in its current version, a fast NTT implementation is included in cuHE, which is a publicly available HE library in Compute Unified Device Architecture (CUDA). We analyzed cuHE NTT kernels and found out that they suffer from two performance pitfalls: shared memory conflicts and thread divergence. We show that by using a set of CUDA tailored-made optimizations, we can improve on the speed of cuHE NTT computation by 20%-50% for different problem sizes.