Levenberg-Marquardt sparse solver scaling: Initial tests

STRUMPACK vs EIGEN performance

The goal of this notebook is to allow the documentation of STRUMPACK vs EIGEN performance to be maintained in a single accessible location. The environment within which this notebook is run follows the standard cctbx conda build instructions available here. For this instance, we are using the STRUMPACK-enabled build of cctbx located at ExaFEL:cctbx_project(str_merge). STRUMPACK is currently built using the installation script STRUMPACK_installer_shared.sh, and if the installation takes place within the same directory as moddules and build, the cctbx build process can make use of it as a backend. After the STRUMPACK solver finishes, enter the build directory, run libtbx.refresh and make. The STRUMPACK-supported modules should now build and link with the new backend.

The solution of the below systems assume that we have some A and b data already, and we can provide these in CSV file format to the solver (A is in “row col value” format). We can begin by importing all of the required modules. We make use of Numpy’s ability to parse CSV files into numeric format, and SciPy’s sparse storage format to format the matrices into the required CSR sparse storage format for solving.

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib
import matplotlib.pyplot as plt
font = {'family' : 'serif',
        #'weight' : 'bold',
        'size'   : 12}

matplotlib.rc('font', **font)
matplotlib.rcParams['figure.figsize'] = (16,10)
matplotlib.rcParams['figure.dpi']= 150
import matplotlib.gridspec as gridspec
from __future__ import division
from cctbx.array_family import flex
from libtbx.test_utils import approx_equal
from libtbx.development.timers import Profiler
import sys
import numpy as np
import scipy.sparse as sps

%env BOOST_ADAPTBX_FPE_DEFAULT=1
%env BOOST_ADAPTBX_SIGNALS_DEFAULT=1
env: BOOST_ADAPTBX_FPE_DEFAULT=1
env: BOOST_ADAPTBX_SIGNALS_DEFAULT=1

We load the A and b data set for solving from the locations specified below. Ideally, we can loop over and load all different data sets and process each individually. For a working example, we begin by processing a single data set, comparing Eigen and Strumpack for varying levels of shared parallelism.

A_path="/net/dials/raid1/mlxd/InDev/feb_sprint/sam/samosa/step4K_samosa_debug_1k/out_strumpack_1k_omp1_paramslevmar.parameter_flags=Bfactor-/A_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-.csv"
A_mat = np.loadtxt(A_path,dtype={'names':('rows','cols','vals'),'formats':('i8','i8','f8')})

b_path="/net/dials/raid1/mlxd/InDev/feb_sprint/sam/samosa/step4K_samosa_debug_1k/out_strumpack_1k_omp1_paramslevmar.parameter_flags=Bfactor-/b_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-.csv"
b_vec = np.loadtxt(b_path)

We are solving system of normal equations, and so we can safely assume that we have a square matrix. This ensures the number of rows and colums will be identical, which can be read from the length of the numpy array b_vec.

n_rows = len(b_vec)
n_cols = n_rows

We next wish to create a SciPy CSR sparse matrix type using the data from A_mat.

A_sp = sps.csr_matrix((A_mat['vals'],(A_mat['rows'],A_mat['cols'])))

With this, we can now create the appropriate flex data types for solving the linear system. While the operations performed upto and including this point involve many copy steps, it can be stated that the time to solve the system is much greater than the time to perform internal copies, and so we can ignore the times required for these steps for any realistic datasets.

For the solvers we require the row index pointers, the column indices, and sparse matrix values. These can be read directly and converted to flex types as follows.

A_indptr = flex.int(A_sp.indptr)
A_indices = flex.int(A_sp.indices)
A_values = flex.double(A_sp.data)
b = flex.double(b_vec)

With the data in the required format, we can now create a solver object and solve the system. We also include a built-in profiler object to time this solver step for both the STRUMPACK and EIGEN backends.

To keep a record of the timings, we can create a dictionary to store each respective value.

import time
timing_dict = {"strum":0, "eigen":0}

Next, we load the STRUMPACK-enabled solver module, and define the function to call the solver. This solver code is contained within “modules/cctbx_project/scitbx/examples/bevington/strumpack_solver_ext.cpp” and creates a Boost.Python wrapper for both an EIGEN and STRUMPACK Ax=b solver object.

A non-notebook script exists to test this functionality at “modules/cctbx_project/scitbx/examples/bevington/strumpack_eigen_solver.py”, which can be called as:

libtbx.python strumpack_eigen_solver.py A_mat.csv b_vec.csv

where A_mat.csv and b_vec.csv are the CSV files with the linear system to be solved.

In this notebook, we can instead load and process the files explicitly.

import scitbx_examples_strumpack_solver_ext as solver_ext
es = solver_ext.eigen_solver
ss = solver_ext.strumpack_solver

def run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b):
    P = Profiler("STRUMPACK")
    #timing_dict["strum"]=time.time()
    res_strum = ss(n_rows, n_cols, A_indptr, A_indices, A_values, b)
    #timing_dict["strum"]=time.time()-timing_dict["strum"]
    del P

    P = Profiler("EIGEN")
    #timing_dict["eigen"]=time.time()
    res_eig = es(n_rows, n_cols, A_indptr, A_indices, A_values, b)
    #timing_dict["eigen"]=time.time() - timing_dict["eigen"]
    del P
    for i in xrange(len(res_strum.x)):
        assert( approx_equal(res_strum.x[i], res_eig.x[i]) )
        
run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)
individual call time for STRUMPACK: CPU,  291.570s; elapsed,    5.425s
individual call time for EIGEN: CPU,    5.950s; elapsed,    0.266s

While we can use the above method to process the data, we must respecify the OMP_NUM_THREADS variable to observe different levels of scalability. %env OMP_NUM_THREADS is sufficient to test this for the first run, however for an undetermined reason the notebook must be started to allow the value to be changed. There, we can run the above tasks from a !<command> cell, specifying the thread count here instead.

OMP_SOLVER='''
from __future__ import division
from cctbx.array_family import flex
from libtbx.test_utils import approx_equal
from libtbx.development.timers import Profiler
import sys
import numpy as np
import scipy.sparse as sps
import matplotlib
import matplotlib.pyplot as plt
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16,10)

A_path=sys.argv[1] 
A_mat = np.loadtxt(A_path,dtype={'names':('rows','cols','vals'),'formats':('i8','i8','f8')})

b_path=sys.argv[2] 
b_vec = np.loadtxt(b_path)

n_rows = len(b_vec)
n_cols = n_rows

A_sp = sps.csr_matrix((A_mat['vals'],(A_mat['rows'],A_mat['cols'])))
A_indptr = flex.int(A_sp.indptr)
A_indices = flex.int(A_sp.indices)
A_values = flex.double(A_sp.data)
b = flex.double(b_vec)

#import time
#timing_dict = {"strum":0, "eigen":0}

import scitbx_examples_strumpack_solver_ext as solver_ext
es = solver_ext.eigen_solver
ss = solver_ext.strumpack_solver

def run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b):
    P = Profiler("STRUMPACK")
    #timing_dict["strum"]=time.time()
    res_strum = ss(n_rows, n_cols, A_indptr, A_indices, A_values, b)
    #timing_dict["strum"]=time.time()-timing_dict["strum"]
    del P

    P = Profiler("EIGEN")
    #timing_dict["eigen"]=time.time()
    res_eig = es(n_rows, n_cols, A_indptr, A_indices, A_values, b)
    #timing_dict["eigen"]=time.time() - timing_dict["eigen"]
    del P
    
    for i in xrange(len(res_strum.x)):
        assert( approx_equal(res_strum.x[i], res_eig.x[i]) )
    
run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)
'''
OMP_SOLVER = OMP_SOLVER
OMP_SOLVER_FILE = open("OMP_SOLVER.py", "w")
OMP_SOLVER_FILE.write(OMP_SOLVER)
OMP_SOLVER_FILE.close()

DATAPATH="/net/dials/raid1/mlxd/InDev/feb_sprint/sam/samosa/"
A_LIST = !find {DATAPATH} -iname "A*.csv"
B_LIST = [ii.replace('/A_','/b_') for ii in A_LIST]

We now record the indices of the data with a specific number of images. We can use these indices to later submit jobs with a given number of frames, and hence resulting matrix size.

list_idx={}
for imgs in ['1k','5k','10k','32k']:
    list_idx.update({imgs:[i for i, j in enumerate(A_LIST) if imgs in j]})
list_idx
{'10k': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
 '1k': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 '32k': [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
 '5k': [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]}

With the file writtent to disk, we now create a loop in mixed Python/Bash to call the solver, then extract the standard output from the profiler to examine the scalability. The profiler code wraps each solver only, and does not take into account the load-times for the data.

str_out={}
import os
threads_list = [1,2,4,8,16,32,64]
for imgs_size in list_idx:
    print "Data Set Size:=%s"%imgs_size
    #Subselect the smallest data size for now
    if imgs_size != "1k":
        print "Skipping %s"%imgs_size
        continue
        
    for imgs_idx in list_idx[imgs_size]:
        dat_name = A_mat.split('/')[-1][2:-4]
        print "Data Set Name:=%s"%dat_name
        for threads in threads_list:
            print "OMP_NUM_THREADS:=%d"%threads

            A_mat = A_LIST[imgs_idx]; b_vec = B_LIST[imgs_idx]

            #Ensure the A and b data are matched correctly
            assert(os.path.dirname(A_mat) == os.path.dirname(b_vec))

            val = !OMP_NUM_THREADS={threads} libtbx.python ./OMP_SOLVER.py {A_mat} {b_vec}
            key = 'omp' + str(threads) + '_' + dat_name
            str_out.update({key:val})

—CLICK FOR OUTPUT—

Data Set Size:=10k
Skipping 10k
Data Set Size:=1k
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Size:=5k
Skipping 5k
Data Set Size:=32k
Skipping 32k

In the event of a crash, it is wise to save the resulting data (JSON and Pickle) to disk…

import json, cPickle
with open("OMP_SOLVER_TIME_1k.json", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(json.dumps(str_out)) 
with open("OMP_SOLVER_TIME_1k.pickle", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(cPickle.dumps(str_out))

…which can then be easily reloaded later. Now, we parse the string, extract the timing data, and add it to the python object at the specified thread number index. This allows for easily sorting and plot the data later.

import cPickle
str_out=cPickle.load(open('OMP_SOLVER_TIME_1k.pickle', 'rb'))

To simplify indexing later for plots, we create shortened keys. Additionally, we ca also separate the data based upon unique index set and thread numbers.

str_out_sm={}
for k in str_out.keys():
    str_out_sm.update({k.replace('levmar.parameter_flags=','').replace('_strum_1k_omp1_params','_'):str_out[k]})
u_dat={}
threads_list=[1,2,4,8,16,32,64]
uniq_ref = set([k.split('_')[1] for k in str_out_sm.keys()])
df_list=[]
for u in uniq_ref:
    same_t = lambda: None #Functions are objects, so legit
    same_t.strum = []
    same_t.eig = []
    for t in threads_list:
        same_t.strum.append(str_out_sm["omp%d_%s"%(t,u)][0].rsplit(',')[-1].strip()[0:-1])
        same_t.eig.append(str_out_sm["omp%d_%s"%(t,u)][1].rsplit(',')[-1].strip()[0:-1])
    u_dat.update({u:same_t})
    str_E = "EIG_%s"%u
    str_S = "STRUM_%s"%u
    df_list.append( pd.DataFrame({str_E:same_t.eig,str_S:same_t.strum}, index=threads_list).transpose() )

We now have a list of Pandas Dataframes, with which we can combine to a single entity. We can then proceed to plot the resulting columns, comparing both Eigen and STRUMPACK for each specific set of refined parameters.

import pandas as pd
perf_s = pd.concat(df_list).transpose()
perf_s

—CLICK FOR OUTPUT—

EIG_Rxy-Deff-Eta- STRUM_Rxy-Deff-Eta- EIG_Bfactor- STRUM_Bfactor- EIG_Bfactor-Deff-Eta- STRUM_Bfactor-Deff-Eta- EIG_Rxy-Bfactor-Deff-Eta- STRUM_Rxy-Bfactor-Deff-Eta- EIG_Deff-Eta- STRUM_Deff-Eta- ... EIG_Eta- STRUM_Eta- EIG_Deff- STRUM_Deff- EIG_Rxy-Bfactor- STRUM_Rxy-Bfactor- EIG_Rxy-Bfactor-Eta- STRUM_Rxy-Bfactor-Eta- EIG_Rxy-Bfactor-Deff- STRUM_Rxy-Bfactor-Deff-
1 0.013 15.598 0.010 12.359 0.020 20.328 0.011 20.885 0.018 21.803 ... 0.017 22.696 0.051 10.401 0.017 20.001 0.016 18.391 0.018 18.415
2 0.013 8.859 0.011 7.880 0.021 11.665 0.013 11.145 0.019 11.933 ... 0.016 10.479 0.053 5.888 0.018 12.719 0.019 10.128 0.019 9.713
4 0.012 5.314 0.009 4.655 0.020 8.141 0.013 7.335 0.019 7.673 ... 0.018 6.631 0.057 3.791 0.017 7.902 0.019 6.261 0.019 5.929
8 0.012 3.838 0.015 3.328 0.020 4.839 0.016 3.587 0.028 5.963 ... 0.016 5.231 0.056 2.671 0.015 4.368 0.015 4.427 0.016 4.156
16 0.018 3.186 0.015 2.485 0.022 4.720 0.020 3.667 0.032 4.466 ... 0.016 6.855 0.059 2.156 0.018 4.799 0.021 3.579 0.017 3.526
32 0.020 3.269 0.011 2.776 0.041 4.463 0.025 3.016 0.021 3.709 ... 0.026 3.214 0.087 2.134 0.034 4.253 0.021 4.905 0.019 3.712
64 0.016 6.079 0.026 4.142 0.027 4.427 0.015 5.049 0.027 5.785 ... 0.023 4.746 0.086 3.149 0.023 6.350 0.021 6.270 0.032 5.650

7 rows × 30 columns

With the performance data in an easily accessible format, we can plot each respective data set for both solvers. Each title corresponds to the parameters being refined, the x-axis is the number of OpenMP threads used in the solution, and the y-axis is the time to solve the system in seconds.

# define the figure size and grid layout properties
figsize = (10, 8)
cols = 4
gs = gridspec.GridSpec(cols, cols)
gs.update(hspace=0.6)
fig1 = plt.figure(num=1, figsize=figsize)

fig1.suptitle('1k data', size=16)

ax = []
for i, u in enumerate(uniq_ref):
    row = (i // cols)
    col = i % cols
    ax.append(fig1.add_subplot(gs[row, col]))
    if(col==0):
        ax[-1].set_ylabel('time [s]')
    ax[-1].set_title(str(u)[:-1])
    ax[-1].plot(perf_s.index, perf_s['EIG_'+u], 'o', ls='-', ms=6, label='EIG')
    ax[-1].plot(perf_s.index, perf_s['STRUM_'+u], 'x', ls='-', ms=6, label='STRUM')
    ax[-1].xaxis.set_ticks(threads_list)
    ax[-1].set_xscale('log', basex=2)
    ax[-1].set_xlim([threads_list[0]/1.8,threads_list[-1]*1.8])
    
    ymin, ymax = ax[-1].get_ylim()
    ax[-1].set_ylim([-1,ymax+1])

    for label in ax[-1].yaxis.get_ticklabels()[::2]:
        label.set_visible(False)
    for label in ax[-1].xaxis.get_ticklabels()[1::2]:
        label.set_visible(False)

ax[-1].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,)
<matplotlib.legend.Legend at 0x7fc4eada0d50>
png

As expected, the OMP_NUM_THREADS variable allowed the STRUMPACK code to improve in performance, but for this data size EIGEN clearly offers better performance. Next we increase the number of frames in the dataset by a factor of 5, and perform the same analysis.

import os
threads_list = [1,2,4,8,16,32,64]
for imgs_size in list_idx:
    print "Data Set Size:=%s"%imgs_size
    #Subselect the smallest data size for now
    if imgs_size != "5k":
        print "Skipping %s"%imgs_size
        continue
        
    for imgs_idx in list_idx[imgs_size]:
        
        A_mat = A_LIST[imgs_idx]; b_vec = B_LIST[imgs_idx]
        dat_name = A_mat.split('/')[-1][2:-4]
        print "Data Set Name:=%s"%dat_name
        #Ensure the A and b data are matched correctly
        assert(os.path.dirname(A_mat) == os.path.dirname(b_vec))
            
        for threads in threads_list:
            print "OMP_NUM_THREADS:=%d"%threads

            val = !OMP_NUM_THREADS={threads} libtbx.python ./OMP_SOLVER.py {A_mat} {b_vec}
            key = 'omp' + str(threads) + '_' + dat_name
            str_out.update({key:val})
import json, cPickle
with open("OMP_SOLVER_TIME_5k.json", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(json.dumps(str_out)) 
with open("OMP_SOLVER_TIME_5k.pickle", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(cPickle.dumps(str_out))

—CLICK FOR OUTPUT—

Data Set Size:=10k
Skipping 10k
Data Set Size:=1k
Skipping 1k
Data Set Size:=5k
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Bfactor-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Name:=strum_5k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-
OMP_NUM_THREADS:=1
OMP_NUM_THREADS:=2
OMP_NUM_THREADS:=4
OMP_NUM_THREADS:=8
OMP_NUM_THREADS:=16
OMP_NUM_THREADS:=32
OMP_NUM_THREADS:=64
Data Set Size:=32k
Skipping 32k

import pandas as pd
import cPickle
str_out=cPickle.load(open('OMP_SOLVER_TIME_5k.pickle', 'rb'))
threads_list = [1,2,4,8,16,32,64]

str_out_sm={}
for k in str_out.keys():
    if "strum_5k" in k:
        str_out_sm.update({k.replace('levmar.parameter_flags=','').replace('_strum_5k_omp1_params','_'):str_out[k]})
    
u_dat={}
uniq_ref = set([k.split('_')[1] for k in str_out_sm.keys()])
df_list=[]
for u in uniq_ref:
    same_t = lambda: None #Functions are objects, so legit
    same_t.strum = []
    same_t.eig = []
    for t in threads_list:
        same_t.strum.append(str_out_sm["omp%d_%s"%(t,u)][0].rsplit(',')[-1].strip()[0:-1])
        same_t.eig.append(str_out_sm["omp%d_%s"%(t,u)][1].rsplit(',')[-1].strip()[0:-1])
    u_dat.update({u:same_t})
    str_E = "EIG_%s"%u
    str_S = "STRUM_%s"%u
    df_list.append( pd.DataFrame({str_E:same_t.eig,str_S:same_t.strum}, index=threads_list).transpose() )
    
perf_s = pd.concat(df_list).transpose()
perf_s

# define the figure size and grid layout properties
figsize = (10, 8)
cols = 4
gs = gridspec.GridSpec(cols, cols)
gs.update(hspace=0.6)
fig2 = plt.figure(num=2, figsize=figsize)

fig2.suptitle('5k data', size=16)

ax = []
for i, u in enumerate(uniq_ref):
    row = (i // cols)
    col = i % cols
    ax.append(fig2.add_subplot(gs[row, col]))
    if(col==0):
        ax[-1].set_ylabel('time [s]')
    ax[-1].set_title(str(u)[:-1])
    ax[-1].plot(perf_s.index, perf_s['EIG_'+u], 'o', ls='-', ms=6, label='EIG')
    ax[-1].plot(perf_s.index, perf_s['STRUM_'+u], 'x', ls='-', ms=6, label='STRUM')
    ax[-1].xaxis.set_ticks(threads_list)
    ax[-1].set_xscale('log', basex=2)
    ax[-1].set_xlim([threads_list[0]/1.8,threads_list[-1]*1.8])
    
    ymin, ymax = ax[-1].get_ylim()
    ax[-1].set_ylim([-1,ymax+1])

    for label in ax[-1].yaxis.get_ticklabels()[::2]:
        label.set_visible(False)
    for label in ax[-1].xaxis.get_ticklabels()[1::2]:
        label.set_visible(False)
ax[-1].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,)
<matplotlib.legend.Legend at 0x7fc4eae1e290>
png

It may be convenient to submit the run job to a background IPyParallel cluster environment. We can use the default environment, and enable a single python instance to exploit the OMP_NUM_THREADS parallelism of the node.

import ipyparallel as ipp
c = ipp.Client()
rc=c[:]
c.ids #Instance IDs
[0]

By wrapping the solving code from eariler into a function, we can submit this to the background process, and supply the various arguments.

def background_job(list_idx,A_LIST,B_LIST):
    str_out={}
    import os
    threads_list = [64]
    for imgs_size in list_idx:
        print "Data Set Size:=%s"%imgs_size
        #Subselect the smallest data size for now
        if imgs_size != "1k":
            print "Skipping %s"%imgs_size
            continue

        for imgs_idx in list_idx[imgs_size]:

            A_mat = A_LIST[imgs_idx]; b_vec = B_LIST[imgs_idx]
            dat_name = A_mat.split('/')[-1][2:-4]
            print "Data Set Name:=%s"%dat_name
            #Ensure the A and b data are matched correctly
            assert(os.path.dirname(A_mat) == os.path.dirname(b_vec))

            for threads in threads_list:
                print "OMP_NUM_THREADS:=%d"%threads

                val = !OMP_NUM_THREADS={threads} libtbx.python ./OMP_SOLVER.py {A_mat} {b_vec}
                key = 'omp' + str(threads) + '_' + dat_name
                str_out.update({key:val})
    return str_out

Applying the above function to the arguments allows us to grab a handle to the background process, to allow us to determine if and when it has finished. Calling str_out_result.ready() will return True when the funcction has returned, and False otherwise. This result is non-blocking. If however we aim to block until the result is returned, we can use str_out_result.get(), which returns the result for us in any case.

str_out_result=rc.apply_async(background_job,list_idx,A_LIST,B_LIST)
str_out_result.ready()
False
str_out_result.get()

—CLICK FOR OUTPUT—

[{'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-': ['individual call time for STRUMPACK: CPU,  298.290s; elapsed,    5.134s',
   'individual call time for EIGEN: CPU,    2.620s; elapsed,    0.266s',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  298.290s; elapsed,    5.134s, #calls:   1',
   'time for                          EIGEN: CPU,    2.620s; elapsed,    0.266s, #calls:   1',
   'TOTAL                                  : CPU,  300.910s; elapsed,    5.400s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-': ['individual call time for STRUMPACK: CPU,  400.670s; elapsed,    7.057s',
   'individual call time for EIGEN: CPU,    1.000s; elapsed,    0.017s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  400.670s; elapsed,    7.057s, #calls:   1',
   'time for                          EIGEN: CPU,    1.000s; elapsed,    0.017s, #calls:   1',
   'TOTAL                                  : CPU,  401.670s; elapsed,    7.073s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  332.840s; elapsed,    6.318s',
   'individual call time for EIGEN: CPU,    2.140s; elapsed,    0.035s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  332.840s; elapsed,    6.318s, #calls:   1',
   'time for                          EIGEN: CPU,    2.140s; elapsed,    0.035s, #calls:   1',
   'TOTAL                                  : CPU,  334.980s; elapsed,    6.353s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  280.900s; elapsed,    5.401s',
   'individual call time for EIGEN: CPU,    6.870s; elapsed,    0.275s',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  280.900s; elapsed,    5.401s, #calls:   1',
   'time for                          EIGEN: CPU,    6.870s; elapsed,    0.275s, #calls:   1',
   'TOTAL                                  : CPU,  287.770s; elapsed,    5.676s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Deff-': ['individual call time for STRUMPACK: CPU,  382.050s; elapsed,    6.886s',
   'individual call time for EIGEN: CPU,    0.710s; elapsed,    0.012s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  382.050s; elapsed,    6.886s, #calls:   1',
   'time for                          EIGEN: CPU,    0.710s; elapsed,    0.012s, #calls:   1',
   'TOTAL                                  : CPU,  382.760s; elapsed,    6.898s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Deff-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  340.390s; elapsed,    6.120s',
   'individual call time for EIGEN: CPU,    1.560s; elapsed,    0.026s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  340.390s; elapsed,    6.120s, #calls:   1',
   'time for                          EIGEN: CPU,    1.560s; elapsed,    0.026s, #calls:   1',
   'TOTAL                                  : CPU,  341.950s; elapsed,    6.145s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  277.570s; elapsed,    5.264s',
   'individual call time for EIGEN: CPU,    4.930s; elapsed,    0.081s',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  277.570s; elapsed,    5.264s, #calls:   1',
   'time for                          EIGEN: CPU,    4.930s; elapsed,    0.081s, #calls:   1',
   'TOTAL                                  : CPU,  282.500s; elapsed,    5.344s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-': ['individual call time for STRUMPACK: CPU,  438.160s; elapsed,    8.726s',
   'individual call time for EIGEN: CPU,    1.060s; elapsed,    0.017s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  438.160s; elapsed,    8.726s, #calls:   1',
   'time for                          EIGEN: CPU,    1.060s; elapsed,    0.017s, #calls:   1',
   'TOTAL                                  : CPU,  439.220s; elapsed,    8.743s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-': ['individual call time for STRUMPACK: CPU,  353.900s; elapsed,    6.904s',
   'individual call time for EIGEN: CPU,    2.570s; elapsed,    0.042s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  353.900s; elapsed,    6.904s, #calls:   1',
   'time for                          EIGEN: CPU,    2.570s; elapsed,    0.042s, #calls:   1',
   'TOTAL                                  : CPU,  356.470s; elapsed,    6.946s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-': ['individual call time for STRUMPACK: CPU,  552.200s; elapsed,   11.529s',
   'individual call time for EIGEN: CPU,    0.630s; elapsed,    0.030s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  552.200s; elapsed,   11.529s, #calls:   1',
   'time for                          EIGEN: CPU,    0.630s; elapsed,    0.030s, #calls:   1',
   'TOTAL                                  : CPU,  552.830s; elapsed,   11.559s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  640.470s; elapsed,   13.635s',
   'individual call time for EIGEN: CPU,    3.100s; elapsed,    0.051s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  640.470s; elapsed,   13.635s, #calls:   1',
   'time for                          EIGEN: CPU,    3.100s; elapsed,    0.051s, #calls:   1',
   'TOTAL                                  : CPU,  643.570s; elapsed,   13.686s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Bfactor-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  421.790s; elapsed,    7.949s',
   'individual call time for EIGEN: CPU,    2.670s; elapsed,    0.044s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '3.5182360272 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  421.790s; elapsed,    7.949s, #calls:   1',
   'time for                          EIGEN: CPU,    2.670s; elapsed,    0.044s, #calls:   1',
   'TOTAL                                  : CPU,  424.460s; elapsed,    7.993s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-': ['individual call time for STRUMPACK: CPU,  505.820s; elapsed,    9.863s',
   'individual call time for EIGEN: CPU,    1.320s; elapsed,    0.022s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  505.820s; elapsed,    9.863s, #calls:   1',
   'time for                          EIGEN: CPU,    1.320s; elapsed,    0.022s, #calls:   1',
   'TOTAL                                  : CPU,  507.140s; elapsed,    9.884s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Deff-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  400.650s; elapsed,    7.382s',
   'individual call time for EIGEN: CPU,    2.570s; elapsed,    0.042s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  400.650s; elapsed,    7.382s, #calls:   1',
   'time for                          EIGEN: CPU,    2.570s; elapsed,    0.042s, #calls:   1',
   'TOTAL                                  : CPU,  403.220s; elapsed,    7.424s'],
  'omp64_strum_1k_omp1_paramslevmar.parameter_flags=Rxy-levmar.parameter_flags=Eta-': ['individual call time for STRUMPACK: CPU,  440.060s; elapsed,    8.054s',
   'individual call time for EIGEN: CPU,    2.350s; elapsed,    0.038s',
   'approx_equal eps: 1e-06',
   'approx_equal multiplier: 10000000000.0',
   '-0.527177455789 approx_equal ERROR',
   '-3.76297519095 approx_equal ERROR',
   '',
   'Traceback (most recent call last):',
   '  File "./OMP_SOLVER.py", line 53, in <module>',
   '    run_solver(n_rows, n_cols, A_indptr ,A_indices, A_values, b)',
   '  File "./OMP_SOLVER.py", line 51, in run_solver',
   '    assert( approx_equal(res_strum.x[i], res_eig.x[i]) )',
   'AssertionError',
   'Exiting profiler',
   'time for                      STRUMPACK: CPU,  440.060s; elapsed,    8.054s, #calls:   1',
   'time for                          EIGEN: CPU,    2.350s; elapsed,    0.038s, #calls:   1',
   'TOTAL                                  : CPU,  442.410s; elapsed,    8.093s']}]

str_out=str_out_result.get()
import json, cPickle
with open("OMP_SOLVER_TIME_1k_OMP64_ASYNC.json", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(json.dumps(str_out)) 
with open("OMP_SOLVER_TIME_1k_OMP64_ASYNC.pickle", "w") as OMP_SOLVER_TIME_FILE:
    OMP_SOLVER_TIME_FILE.write(cPickle.dumps(str_out))

We can now submit a large background job, and close the window until it has completed. This will run asynchronously on the cluster backend, and allow us to poll it upon our return.

def background_job_10k(list_idx,A_LIST,B_LIST):
    str_out={}
    import os
    threads_list = [1,2,4,8,16,32]
    for imgs_size in list_idx:
        print "Data Set Size:=%s"%imgs_size
        #Subselect the smallest data size for now
        if imgs_size != "10k":
            print "Skipping %s"%imgs_size
            continue

        for imgs_idx in list_idx[imgs_size]:

            A_mat = A_LIST[imgs_idx]; b_vec = B_LIST[imgs_idx]
            dat_name = A_mat.split('/')[-1][2:-4]
            print "Data Set Name:=%s"%dat_name
            #Ensure the A and b data are matched correctly
            assert(os.path.dirname(A_mat) == os.path.dirname(b_vec))

            for threads in threads_list:
                print "OMP_NUM_THREADS:=%d"%threads

                val = !OMP_NUM_THREADS={threads} libtbx.python ./OMP_SOLVER.py {A_mat} {b_vec}
                key = 'omp' + str(threads) + '_' + dat_name
                str_out.update({key:val})
    return str_out
str_out_result=rc.apply_async(background_job_10k,list_idx,A_LIST,B_LIST)
str_out_result.ready()
False