Processing math: 100%

FFTs over time and space... and devices

The following article is a work in progress, and will be continually updated over time. I'll issue a RELEASE tag or similar when everything has been finished.


Q. How do available FFT routines compare over different libraries and accelerator hardwares?


This is something I have wondered for sometime, though it can be difficult for an apples-to-apples comparison. I will consider the above question in relation to the following plan:

Comparing FFT implementations in a pseudospectral solver for linear and nonlinear Schrodinger equation dynamics

To decide on how to proceed, it is instructive to list all (or most) widely known and used implementations.

Standard implementations

[CPU]

[GPU]

[Cutting edge]

Test cases

To ensure sufficient time is spent in the FFT routines I will opt for a 2D transform of an NxN grid, requiring N transforms, a transpose, and another N. The value of N can be scaled to determine sweet spots for all implementations. Granted, memory size will be the determining factor of how large we can go. N=2j where jZ,7j11 is a reasonable range of values to examine that should fit readily on most hardware.

For this, the following test precisions will be instructive:

  • C2C - 32-bit float (32-bits for each real and imaginary component)

  • C2C - 64-bit float

  • R2C - 32-bit float

  • R2C - 64-bit float

As a sample system to investigate, I will consider the case of a quantum harmonic oscillator (QHO). To investigate dynamics and their resulting accuracy, a known and analytically calculable result is useful. For this, I will opt for a a superposition state of the groundstate along the xy plane, and the first excited state along x with the groundstate along y. To put things more formally:

Ψ(x,y)=Ψ00(x,y)+Ψ10(x,y)2. To construct the above wavefunction we will assume that the states Ψ00(x,y) and Ψ10(x,y) are outer products of the solutions to the QHO, as given by:

Ψn(x)=12nn!(mωxπ)14emωxx22Hn(x), where Hn(x)=(1)nex2dndxn(ex2), are the physicists’ Hermite polynomials. As we are looking for the ground and first excited states (n=0,1) we can simplify a lot of the above calculations: H0(x)=1 and H0(x)=2x. Next, we determine the states Ψ0(x) and Ψ1(x) as:

Ψ0(x)=(mωxπ)14emωxx22, Ψ1(x)=2x2(mωxπ)14emωxx22.

Along a single dimension, our system can potentially be in any of the allowed harmonic oscillator states. Therefore, the overall state is given by a tensor product of the state along x and that along y. If we assume the dimensionality of the Hilbert space along the i-th orthogonal spacial index is finite and given by di, then the overall system dimensionality (ie the total number of states we can consider) is d=idi.

Assuming an n-dimensional system, where each individual dimension is in the groundstate of the harmonic oscillator, we can define the tensor product state as Ψn=Ψn0=Ψ0Ψ0Ψ0.

While the above is fine for a general case, for the purposes of FFTs we have assumed a much simpler system, dealing with combinations of only 2 states. We can then define them as Ψ00(x,y)=Ψ0(x)Ψ0(y) and Ψ10(x,y)=Ψ1(x)Ψ0(y). Thus, our final superposition state when filling everything in is given by Ψ(x,y)=(mπ)12(ωxωy)14em2(ωxx2+ωyy2)(x+12).

To simplify life, I will assume some of the above quantities are set to unity (i.e. m=ωx=ωy==1). This will not change the dynamics of the simulation, but simplify the equations and the numerical implementation, which can help with improving accuracy. The above equation then becomes Ψ(x,y)=(1π)12e12(x2+y2)(x+12).

For a more detailed explanation and approach to the above have a look at Christina Lee’s (albi3ro) blog post on time evolution, which will cover the method I use next.

To be continued

/***

This document will implement the quantum simulation aspects of the code, in as simple a means as possible.

***/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>

#define GRIDSIZE 128
#define GRIDMAX 5
#define PI 3.14159
#define DT 1e-4


//################################################//

struct double2{
    double x;
    double y;
};

//Define the grid on which the simulation will be run.
struct Grid{
    double *x = (double*) malloc(GRIDSIZE*sizeof(double));
    double *y = (double*) malloc(GRIDSIZE*sizeof(double));
    double *xy2 = (double*) malloc(GRIDSIZE*GRIDSIZE*sizeof(double));

    double *kx = (double*) malloc(GRIDSIZE*sizeof(double));
    double *ky = (double*) malloc(GRIDSIZE*sizeof(double));
    double *kxy2 = (double*) malloc(GRIDSIZE*GRIDSIZE*sizeof(double));

    double2 *wfc = (double2*) malloc(GRIDSIZE*GRIDSIZE*sizeof(double2));
    double2 *U_V = (double2*) malloc(GRIDSIZE*GRIDSIZE*sizeof(double2));
    double2 *U_K = (double2*) malloc(GRIDSIZE*GRIDSIZE*sizeof(double2));
};

void setupGrids(Grid *grid){
    double invSqrt2 = 1/sqrt(2);
    //Position space grids
    for (int j=0; j < GRIDSIZE; ++j){
        grid->x[j] = -GRIDMAX + j*(2*GRIDMAX)/((double)GRIDSIZE);
        grid->y[j] = grid->x[j];
    }
    for(int i=0; i < GRIDSIZE; ++i){
        for(int j=0; j < GRIDSIZE; ++j){
            grid->xy2[j+GRIDSIZE*i] = 0.5*(grid->x[i]*grid->x[i]+grid->y[j]*grid->y[j]);
            //Evolution operator in position space
            grid->U_V[j+GRIDSIZE*i].x = cos(-grid->xy2[j+GRIDSIZE*i]*DT);
            grid->U_V[j+GRIDSIZE*i].y = sin(-grid->xy2[j+GRIDSIZE*i]*DT);
            
            //Wavefunction
            grid->wfc[j+GRIDSIZE*i].x = sqrt( 1/PI )*exp( -0.5 * ( grid->x[i]*grid->x[i] + grid->y[j]*grid->y[j] ) ) * ( grid->x[i] + invSqrt2 );
            grid->wfc[j+GRIDSIZE*i].y = 0.;
        }
    }

    //Momentum space grids
    for (int j=0; j < GRIDSIZE; ++j){
        grid->kx[j] = (j<(GRIDSIZE/2)) ? j*2*PI/(GRIDMAX) : -(GRIDSIZE - j)*(2*PI/(GRIDMAX));
        grid->ky[j] = grid->kx[j];
    }
    for(int i=0; i < GRIDSIZE; ++i){
        for(int j=0; j < GRIDSIZE; ++j){
            grid->kxy2[j+GRIDSIZE*i] = 0.5*(grid->kx[i]*grid->kx[i]+grid->ky[j]*grid->ky[j]);

            //Evolution operator in momentum space
            grid->U_K[j+GRIDSIZE*i].x = cos(-grid->kxy2[j+GRIDSIZE*i]*DT);
            grid->U_K[j+GRIDSIZE*i].y = sin(-grid->kxy2[j+GRIDSIZE*i]*DT);
        }
    }
}

//################################################//
int fileIO(Grid *grid){
    //Open files to write
    FILE *f_x = fopen("x", "w");
    FILE *f_y = fopen("y", "w");
    FILE *f_xy2 = fopen("xy2", "w");
    FILE *f_kx = fopen("kx", "w");
    FILE *f_ky = fopen("ky", "w");
    FILE *f_kxy2 = fopen("kxy2", "w");
    FILE *f_U_V = fopen("U_V", "w");
    FILE *f_U_K = fopen("U_K", "w");
    FILE *f_wfc = fopen("wfc", "w");

    //No safety checks because I'm a bad person

    fwrite(grid->x, sizeof(double), GRIDSIZE, f_x);
    fwrite(grid->y, sizeof(double), GRIDSIZE, f_y);
    fwrite(grid->xy2, sizeof(double), GRIDSIZE*GRIDSIZE, f_xy2);
    fwrite(grid->kx, sizeof(double), GRIDSIZE, f_kx);
    fwrite(grid->ky, sizeof(double), GRIDSIZE, f_ky);
    fwrite(grid->kxy2, sizeof(double), GRIDSIZE*GRIDSIZE, f_kxy2);
    fwrite(grid->U_V, sizeof(double2), GRIDSIZE*GRIDSIZE, f_U_V);
    fwrite(grid->U_K, sizeof(double2), GRIDSIZE*GRIDSIZE, f_U_K);
    fwrite(grid->wfc, sizeof(double2), GRIDSIZE*GRIDSIZE, f_wfc);

    //No moar opin

    fclose(f_x); fclose(f_y); fclose(f_xy2); fclose(f_kx);  fclose(f_ky);
    fclose(f_kxy2); fclose(f_U_V); fclose(f_U_K); fclose(f_wfc);
    return 0;
}

//################################################//
int main(){
    Grid grid;
    setupGrids(&grid);
    fileIO(&grid);
}


//################################################//

alt text alt text alt text