http://benji3up2kxewkqfcq7buxk2xd6zwy3zggnurkrm3l4cvwy2iipvyyad.onion/mirrors/gmpdoc/Introduction-to-GMP.html
The speed of GMP is achieved by using fullwords as the basic arithmetic type,
by using sophisticated algorithms, by including carefully optimized assembly
code for the most common inner loops for many different CPUs, and by a general
emphasis on speed (as opposed to simplicity or elegance).

There is assembly code for these CPUs:
ARM Cortex-A9, Cortex-A15, and generic ARM,
DEC Alpha 21064, 21164, and 21264,
AMD K8 and K10 (sold under many brands, e.g. Athlon64, Phenom,...