http://3cbzkrvakrpetjjppdwzbzqrlkmzatjs7jbyazap5gwutj32gcltjpqd.onion
But on the flip side, I think this extra precalc
makes the algorithm much less amenable to a hypothetical GPU implementation:
each instance would need ~8 MB of private data, as opposed to one large shared
static pool of constants plus just 1 kB of state per instance.
Without the precalc, a GPU port would be nontrivial but probably possible,
since the problem itself is so parallel.
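
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Only the ~8 MB and 1 kB per-instance figures come from the above; the 8 GiB card and the 8 MB size of the shared constant pool are assumptions picked for illustration, and the estimate ignores everything except memory capacity (bandwidth, caches, occupancy limits).

```python
MB = 1024 * 1024
KB = 1024

def max_instances(device_bytes, per_instance_bytes, shared_bytes=0):
    """How many instances fit in device memory next to an optional shared pool."""
    usable = device_bytes - shared_bytes
    if usable <= 0:
        return 0
    return usable // per_instance_bytes

DEVICE = 8 * 1024 * MB   # assumption: a card with 8 GiB of memory
SHARED_POOL = 8 * MB     # assumption: size of the shared static pool of constants

# With the precalc: every instance carries its own ~8 MB of private data.
with_precalc = max_instances(DEVICE, 8 * MB)

# Without it: one shared pool, then only 1 kB of state per instance.
without_precalc = max_instances(DEVICE, 1 * KB, SHARED_POOL)

print(with_precalc, without_precalc)
```

Under these assumptions, memory alone caps the precalc variant at about a thousand concurrent instances, versus several million for the shared-pool variant.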