Channel: Intel® Math Kernel Library

Extended Eigensolver Routines: Strange eigenvectors


Hello,

I am currently trying to solve the eigenproblem Av = λv, where A is a complex Hermitian matrix, using the MKL FEAST implementation. For testing purposes I constructed the following example (using C++):

#include <mkl.h>
#include <iostream>
#include <vector>
#include <complex>

int main() {
	using namespace std;

	int fpm[128]{ };
	::feastinit(fpm);

/* A =
0	2-i	1
2+i	0	0
1	0	0
*/

	vector<complex<double>> entries = { complex<double>(2, -1), 1, complex<double>(2, +1), 1 };
	vector<int> cols = { 2, 3, 1, 1 };
	vector<int> rows = { 1, 3, 4, 5 };

	char uplo = 'F';
	double eps = 0;
	int loop = 0;
	double emin = -4;
	double emax = 4;
	int m0 = 3;
	vector<complex<double>> eigenvectors(m0 * m0);
	vector<double> eigenvalues(m0);
	vector<double> res(m0);
	int mode = 0;
	int info = 0;

	zfeast_hcsrev(&uplo, &m0, reinterpret_cast<MKL_Complex16*>(&entries[0]), &rows[0], &cols[0],
				  fpm, &eps, &loop, &emin, &emax, &m0, &eigenvalues[0], reinterpret_cast<MKL_Complex16*>(&eigenvectors[0]), &mode, &res[0], &info);

}

The eigenvalues of the matrix are 0 and ±√6 ≈ ±2.44949.

MKL produces exactly these eigenvalues. However, the eigenvectors look strange. For example, the corresponding eigenvector for +sqrt(6) should be (already normalized):

1/sqrt(2), 1/sqrt(3) + i/(2*sqrt(3)), 1/(2*sqrt(3))

or alternatively

0.707107, 0.577350 + i*0.288675, 0.288675

However, MKL produces:

0.48890728596802574+i*0.51085190195141617
0.19063671173166902+i*0.61670439499553087
0.19959556369177356+i*0.20855441565187854
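
Before posting I also wanted to convince myself whether these vectors are actually wrong, so I intend to check the residual ||A v − λ v|| directly. A dense check along these lines is what I have in mind (a throwaway sketch for this 3x3 example only; the helper "residual" is not part of the code above):

#include <algorithm>
#include <cmath>
#include <complex>

// Throwaway helper: maximum component of |A*v - lambda*v| for one pair.
// A tiny value (around 1e-14) would mean the pair is a valid eigenpair,
// even if the vector looks different from the hand-computed one.
double residual(const std::complex<double> A[3][3],
                const std::complex<double> v[3], double lambda)
{
    double r = 0.0;
    for (int i = 0; i < 3; ++i) {
        std::complex<double> Av(0.0, 0.0);
        for (int j = 0; j < 3; ++j)
            Av += A[i][j] * v[j];
        r = std::max(r, std::abs(Av - lambda * v[i]));
    }
    return r;
}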

What's the matter with these? What am I doing wrong?

 

Thank you.


error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]


Hi!

I have a question: I want to use the FFT, but the compiler reports the error above and I don't know how to deal with it. Can you tell me the likely cause of this error and how to fix it?

Thank you!

Fast Discrete Fourier Transform with MKL


Hi all,

In R programming, there is the "fft" function: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/fft.html

> x <- c(102, 55, 89, 12, 3, 45, 9)
> fft(x)
[1] 315.00000+ 0.00000i  98.57101-82.76603i -23.61882-18.71932i
[4] 124.54781+ 5.66758i 124.54781- 5.66758i -23.61882+18.71932i
[7]  98.57101+82.76603i
> Re(fft(x))
[1] 315.00000  98.57101 -23.61882 124.54781 124.54781 -23.61882  98.57101

What I have to do is replicate the Re(fft(x)) output in Fortran. I wonder if Intel MKL could do the job.

! testfft.f90

program myfft

 implicit none
 integer, dimension(7) :: x
 double precision, dimension(7) :: Refftx

 x = (/ 102, 55, 89, 12, 3, 45, 9 /)

 ! ============================================================ !
 ! Here, I need to use a fft function to return the result in R !
 ! ============================================================ !
 
 ! For example,
 
 Refftx = theFFTfunctionIdonotKnow(x)

 print*, Refftx

end program myfft
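
From the DFTI examples in the MKL manual, I guess the call sequence would look roughly like the sketch below (untested; it assumes the mkl_dfti module from include\mkl_dfti.f90 and works on a complex copy of x, since R's fft is an unscaled complex transform):

! sketch_fft.f90 -- rough, untested guess based on the MKL DFTI manual examples
program sketch_fft
 use mkl_dfti                      ! requires include\mkl_dfti.f90 to be compiled first
 implicit none
 integer, parameter :: n = 7
 integer :: x(n), status
 complex(8) :: xc(n)               ! DFTI works on real/complex data, so copy the integers
 double precision :: Refftx(n)
 type(dfti_descriptor), pointer :: handle

 x = (/ 102, 55, 89, 12, 3, 45, 9 /)
 xc = cmplx(dble(x), 0.d0, kind=8)

 status = DftiCreateDescriptor(handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
 status = DftiCommitDescriptor(handle)
 status = DftiComputeForward(handle, xc)   ! in-place, unscaled forward transform (like R's fft)
 status = DftiFreeDescriptor(handle)

 Refftx = dble(xc)                 ! hopefully the same as Re(fft(x)) in R
 print *, Refftx
end program sketch_fft

To build it, I assume something like "ifort mkl_dfti.f90 sketch_fft.f90 /Qmkl" from the same command prompt would work, but I am not sure; that is part of my question.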

How do I need to build/compile the "testfft.f90" file?

The machine I use is equipped with Intel Parallel Studio XE 2013. I generally compile .f90 files by typing ifort myfile.f90 in the Intel 64 Visual Studio 2008 mode command prompt. The machine runs Windows 7 64-bit.

Thank you very much!

mkl_dcsrbsr gives core dump when trying to fill all arrays


Hi

 

I am using the mkl_dcsrbsr routine with MKL version 10.3 (I have tried version 11.0 as well). It seems I can call the routine perfectly for job type = -1: I get the number of blocks correctly. But when I call the routine again to fill all the arrays, I get a core dump.

 

Here is the code I am using:

rowsAbsr = new int[N_A+1];
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[3] = 0;
job[4] = 0;
job[5] = -1; // only the number of blocks is computed
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;
cout<<"The request for number of blocks of BSR for matrix A is "<<rowsAbsr[0]<<"."<<endl;
sizecolsabsr = rowsAbsr[0];
colsAbsr = new int[rowsAbsr[0]];
nnzsAbsr = new double[m*m*(rowsAbsr[0])];
ldabsr = m*m*(rowsAbsr[0]);
cout<<"ldabsr is calculated as "<<ldabsr<<"."<<endl;
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[5] = 0;  // only the row and column arrays of the BSR matrix are filled
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[5] = 1;  // all arrays are filled
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;

I had my doubts about ldabsr, but as per the documentation available online it seems I am doing the right thing. Could you be so kind as to point out any obvious error you spot here?

 

with kind regards

rohit

 

Inconsistent in-place and out-of-place results in mkl_dfti


Hi,

I am trying to switch from out-of-place calculation to in-place using MKL 10.3.12. I don't see any problem with the forward FFT. However, in the backward FFT, for dimensions larger than 4, I get inconsistent results between the in-place and out-of-place calculations. This happens when I use a backward scaling of 1.0 (which I need in my problem), and the issue goes away when a scaling of 1/(K1*K2*K3) is used instead! I have attached a minimal code for reproducing the results. I compiled it with:

gfortran -fcray-pointer -I$myMKLINC main.f90 -L$MKLROOT/lib/intel64/ -L/opt/intel/composer_xe_2011_sp1.12.361/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -liomp5 -o amain

Thanks for any help

Amir

Attachment: main.f90 (8.16 KB)

How to configure properly for cluster PARDISO in WinServer 2012?


We are running an application with (currently) the non-cluster PARDISO under Windows Server 2012 SP1. The machine is an HP DL980 with 160 logical cores (80 cores with HT turned on) and 2 TB of RAM. We are having problems utilizing all cores, most likely originating from the Windows processor-group ("core group") limit of 64 logical cores per group. I am happy to see that Intel now offers the cluster version of PARDISO in MKL. Can this new version handle multiple processor groups as if they were clusters of computers? Is any special configuration required to use the new PARDISO, or do we have to reconfigure the machine with clustering software? Do I need to update my Fortran Composer license to the Cluster edition, or will an update of the current Professional edition do?

 

-mkl gives different results from -lfftw3


I am developing a time-stepping code that calls fft routines in every step. While writing and testing the code, I used the -lfftw3 flag to link to the fftw3 library. Now that the code is functional, I tried to link to the MKL version of this library instead, as I think it may be faster. However, the result is completely different. With the -lfftw3 flag, the output seems to make sense, but not with the -mkl option. I am hoping that somebody can explain the difference. This is of great importance to me, as I often use fftw3, lapack and similar libraries, and it seems that MKL gives the best performance.

Operating system: ubuntu 13.10, but it happens on our cluster, too.
Hardware: Lenovo laptop with Intel(R) Core(TM) i7-4600U CPU, but it happens on our cluster, too.
Ifort version: Version 12.1.0.233 Build 20110811 (called through mpif90 with OMPI_FC=ifort)

> mpif90  -o test.x LES_cont.f90  -llapack -lfftw3
>mpirun -np 1 ./test.x
( ... computation ...)
 Residual=   3.237650346704463E-012

>mpif90  -mkl -o test.x LES_cont.f90
>mpirun -np 1 ./test.x
(... computation...)
 Residual=   2.33661403394626

Between the two runs I change only the compiler/linker options as shown, nothing else. Thanks in advance for your help!

about parallelism on BLAS level-1 routines and VML


Hi all,

I am running BLAS routines from MKL with the Intel compiler (icpc). Following the example given with the compiler, I set the number of threads from 1 to 10 while running the dgemm routine for matrix-matrix multiplication, and I saw a speedup as the number of threads increased. However, for level-1 routines (e.g. cblas_zcopy, cblas_zaxpby), I didn't see any speedup from the multithreaded version. Is there a multithreaded version of the level-1 routines at all? What about the VML routines? I also tried those (e.g. vzExp, vzMul), but saw no speedup at all in a multithreaded environment; the timing loop I use is sketched below.
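
For reference, this is roughly how I am measuring it (a simplified sketch, not my exact code; vzExp, dsecnd and mkl_set_num_threads are the MKL calls I rely on):

#include <mkl.h>
#include <cstdio>
#include <vector>

int main() {
    const MKL_INT n = 10000000;                // large enough that threading could matter
    std::vector<MKL_Complex16> a(n), y(n);
    for (MKL_INT i = 0; i < n; ++i) { a[i].real = 1e-6 * i; a[i].imag = 0.0; }

    for (int nt = 1; nt <= 10; ++nt) {
        mkl_set_num_threads(nt);
        double t0 = dsecnd();                  // MKL wall-clock timer
        vzExp(n, a.data(), y.data());          // element-wise complex exponential
        double t1 = dsecnd();
        std::printf("threads=%2d  vzExp time=%.3f s\n", nt, t1 - t0);
    }
    return 0;
}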


Linker errors while using CLUSTER_SPARSE_SOLVER


Hello Everyone,

While using 'CLUSTER_SPARSE_SOLVER' to solve a sparse matrix, I get the linker errors below. I have included the "mkl_cluster_sparse_solver.h" header as well as "mpi.h". What should I do next?

Thanks in advance.

Mayur

BUILD LOG :-

1>------ Build started: Project: MKLWrapper, Configuration: Release x64 ------
1>Build started 12/5/2014 3:53:24 PM.
1>InitializeBuildStatus:
1>  Touching "..\..\Obj\x64\Release\MKLWrapper\MKLWrapper.unsuccessfulbuild".
1>ClCompile:
1>  All outputs are up-to-date.
1>  All outputs are up-to-date.
1>  MKLWrapper.cpp
1>Link:
1>     Creating library ..\..\Obj\x64\Release\MKLWrapper\..\MKLWrapper.lib and object ..\..\Obj\x64\Release\MKLWrapper\..\MKLWrapper.exp
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Barrier
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_rank
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_size
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Irecv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Recv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Isend
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Send
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Test
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Bcast
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_split
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Scatterv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Gatherv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Allgather
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Reduce
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Alltoall
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Alltoallv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_free

[REQUEST] Looking for old MKL Versions


Hi,

I'm searching for some older MKL versions, but am having trouble locating them. Does anyone have (or know where I can find) some/any of these? Thanks in advance.

l_mkl_p_8.0.019.tgz
l_mkl_p_8.0.1.006.tgz
l_mkl_p_8.1.014.tgz
l_mkl_p_8.1.1.004.tgz
l_mkl_p_9.0.018.tgz
l_mkl_p_9.1.023.tgz

FFT-Based 3D Convolution With Zero Padding


I have been trying to figure out how I can use Intel MKL to perform an FFT-based 3D convolution with zero padding. I have been searching and posting in online forums (including the Intel MKL forum); unfortunately, I have not been very successful so far.

I have a 3D array that is stored as a 1D array of type double in a column-wise fashion. Similarly, the kernel is of type double and is stored column-wise. For example:

for( int k = 0; k < nk; k++ ) // Loop through the height.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
        for( int i = 0; i < ni; i++ ) // Loop through the columns.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray[ ijk ] = 1.0;
        }

I am writing a 3D convolution function:

  • which takes in real values (not complex values) and
  • outputs the results of the convolution,
  • for the computation of convolution, I am performing a "not-in-place" FFT on the input array as well as the kernel in order to prevent them from getting modified (I need to use them later in my code) and
  • then do the backward computation "in-place".

During the process I also apply zero padding to avoid wrap-around artifacts. The FFT size is (dim_input + dim_kernel - 1) in each dimension, rounded up to the next power of two for speed.
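
By zero padding I mean copying the ni x nj x nk input into the corner of an N x N x N buffer that is zero everywhere else, roughly as in the hypothetical helper below. This is only a sketch: copy_into_padded is not in my code, and I suspect the real-data layout my strides assume actually needs a padded row length of 2*(N/2+1) rather than N, which is exactly what I am unsure about.

// Hypothetical helper (not in the code below): copy an ni x nj x nk column-wise
// array into the corner of an N x N x N buffer and zero the rest.
void copy_into_padded( const double *src, int ni, int nj, int nk,
                       double *dst, int N )
{
    for( long long i = 0; i < (long long)N*N*N; i++ )
        dst[i] = 0.0;                                   // zero padding

    for( int k = 0; k < nk; k++ )                       // height
        for( int j = 0; j < nj; j++ )                   // columns
            for( int i = 0; i < ni; i++ )               // rows
                dst[ i + (long long)N*j + (long long)N*N*k ] =
                    src[ i + (long long)ni*j + (long long)ni*nj*k ];
}

Cropping the ni x nj x nk result back out of the padded buffer would presumably be the same loops with src and dst swapped.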

My questions are:

  1. How can I perform the zero-padding?
  2. How should I deal with the size of the arrays used by FFT functions?
  3. How can I take out the zero padded results and get the actual result?

I would be absolutely grateful for any comments or suggestions.

#include "mkl.h"

int max(int a, int b, int c);
void Conv3D_R2C(
    double *in, int nRowsIn , int nColsIn , int nHeightsIn ,
    double *ker, int nRowsKer, int nColsKer, int nHeightsKer,
    double *out );

int main()
{

    int n = 5;
    int nkernel = 3;

    double *a          = new double [n*n*n]; // This array is real.
    double *aconvolved = new double [n*n*n]; // The convolved array is also real.
    double *kernel     = new double [nkernel*nkernel*nkernel]; // kernel is real.

    // Fill the array with some 'real' numbers.
    for( int i = 0; i < n*n*n; i++ )
        a[ i ] = 1.0;

    // Fill the kernel with some 'real' numbers.
    for( int i = 0; i < nkernel*nkernel*nkernel; i++ )
        kernel[ i ] = 1.0;

    // Calculate the convolution.
    Conv3D_R2C( a, n, n, n, kernel, nkernel, nkernel, nkernel, aconvolved );

    delete[] a;
    delete[] kernel;
    delete[] aconvolved;
}

void Conv3D_R2C( // Real to Complex 3D FFT.
    double *in, int nRowsIn , int nColsIn , int nHeightsIn ,
    double *ker, int nRowsKer, int nColsKer, int nHeightsKer,
    double *out )
{

    int nIn  = max( nRowsIn , nColsIn , nHeightsIn  );
    int nKer = max( nRowsKer, nColsKer, nHeightsKer );
    int n = nIn + nKer - 1;

    /* Strides describe data layout in real and conjugate-even domain. */
    MKL_LONG rs[4], cs[4];

    // DFTI descriptor.
    DFTI_DESCRIPTOR_HANDLE fft_desc = 0;

    // Round up to the next highest power of 2.
    unsigned int N = (unsigned int) n; // compute the next highest power of 2 of 32-bit n.
    N--;
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;
    N |= N >> 16;
    N++;

    // Variables needed for out-of-place computations.
    MKL_Complex16 *in_fft  = new MKL_Complex16 [ N*N*N ];
    MKL_Complex16 *ker_fft = new MKL_Complex16 [ N*N*N ];
    MKL_Complex16 *out_fft = new MKL_Complex16 [ N*N*N ];
    double *out2 = new double [ N*N*N ];

    /* Compute strides */
    rs[3] = 1;           cs[3] = 1;
    rs[2] = (N/2+1)*2;   cs[2] = (N/2+1);
    rs[1] = N*(N/2+1)*2; cs[1] = N*(N/2+1);
    rs[0] = 0;           cs[0] = 0;

    // Create DFTI descriptor.
    MKL_LONG sizes[] = { N, N, N };
    DftiCreateDescriptor( &fft_desc, DFTI_DOUBLE, DFTI_REAL, 3, sizes );

    // Configure DFTI descriptor.
    DftiSetValue        ( fft_desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX );
    DftiSetValue        ( fft_desc, DFTI_PLACEMENT             , DFTI_NOT_INPLACE     ); // Out-of-place transformation.
    DftiSetValue        ( fft_desc, DFTI_INPUT_STRIDES  , rs  );
    DftiSetValue        ( fft_desc, DFTI_OUTPUT_STRIDES , cs  );
    DftiCommitDescriptor( fft_desc );
    DftiComputeForward  ( fft_desc, in , in_fft  );
    DftiComputeForward  ( fft_desc, ker, ker_fft );

    for( long long i = 0; i < (long long)N*N*N; i++ )
    {
        // Element-wise complex multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
        out_fft[i].real = in_fft[i].real * ker_fft[i].real - in_fft[i].imag * ker_fft[i].imag;
        out_fft[i].imag = in_fft[i].real * ker_fft[i].imag + in_fft[i].imag * ker_fft[i].real;
    }

    // Change strides to compute backward transform.
    DftiSetValue        ( fft_desc, DFTI_INPUT_STRIDES , cs);
    DftiSetValue        ( fft_desc, DFTI_OUTPUT_STRIDES, rs);
    DftiCommitDescriptor( fft_desc );
    DftiComputeBackward ( fft_desc, out_fft, out2 );

    // Printing the zero padded 3D convolved result.
    for( long long i = 0; i < (long long)N*N*N; i++ )
        printf( "%f\n", out2[i] );

    /* I don't know how to take out the zero padded results and
       save the actual result in the variable named "out" */

    DftiFreeDescriptor  ( &fft_desc );

    delete[] in_fft;
    delete[] ker_fft;
    delete[] out_fft;
    delete[] out2;
}

int max(int a, int b, int c)
{
     int m = a;
     (m < b) && (m = b); //these are not conditional statements.
     (m < c) && (m = c); //these are just boolean expressions.
     return m;
}

 

trsm memory leak


Hello,

When running valgrind on "source/cblas_dtrsmx.out", compiled from within "mkl/examples/cblas", I end up with a log similar to:

==9740== 3,131,264 bytes in 1 blocks are still reachable in loss record 7 of 7
==9740==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==9740==    by 0x60EAFB4: mkl_serv_allocate (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_core.so)
==9740==    by 0x9269DBE: mkl_blas_mc3_dgemm_get_bufs (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_mc3.so)
==9740==    by 0x9242B7A: mkl_blas_mc3_xdtrsm (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_mc3.so)
==9740==    by 0x4F466C2: DTRSM (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so)
==9740==    by 0x4F60E0C: cblas_dtrsm (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so)
==9740==    by 0x4016D5: main (in /opt/intel/composer_xe_2015.1.133/mkl/examples/cblas/_results/intel_lp64_sequential_intel64_so/cblas_dtrsmx.out)

Is it normal to have that much memory leaked?
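
Or is this something where I am expected to release MKL's internal buffers myself before exiting, e.g. with the service routine below (just a guess from the manual, untested)?

#include <mkl.h>

int main(void)
{
    /* ... the cblas_dtrsm calls from the example ... */

    mkl_free_buffers();  /* ask MKL to release its internal memory buffers before exit */
    return 0;
}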

Thank you in advance.

 

[Scalapack] Please Help with using pdgesv


Hello all:

I'm trying to solve a linear system (a 9-by-9 full matrix) with pdgesv in C. I used the example code (http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1683&sid=26b4f253...) and it compiles fine. However, I get an error after calling pdgesv:

“On entry to PDGESV parameter number 602 had an illegal value”.

According to the PDGESV source code, this means the 2nd element of the 6th argument had an illegal value, i.e. descA[1] is wrong?

However, descA[1] (ictxt) is set by Cblacs_gridinit(&ictxt, "Row", nprow, npcol).

After printing ictxt, I find that ictxt = 0 everywhere. Is this the reason for the error message?

I really need to solve a large dense (full) matrix; can someone help me or give me some advice?

The following is my C code:

PS: I've called MPI_Init(&argc,&argv) before entering Scalapack(int argc, char ** argv)  and called MPI_Finalize() after leaving Scalapack(int argc, char ** argv).

 

#include <mpi.h>
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include "Scalapack.h"
#include <mkl.h>
#include <mkl_scalapack.h>
#include "mkl_lapacke.h"
#include <mkl_cblas.h>

#define mat(matriz,coluna,i,j) (matriz[i*coluna+j])

#define p_of_i(i,bs,p) ( MKL_INT((i-1)/bs)%p)
#define l_of_i(i,bs,p) ( MKL_INT((i-1)/(p*bs)))
#define x_of_i(i,bs,p) (((i-1)%bs)+1)

#define   numroc_      NUMROC

using namespace std;

extern "C" 
{
    /* BLACS C interface */
    void Cblacs_pinfo(int* mypnum, int* nprocs);
    void Cblacs_get( MKL_INT context, MKL_INT request, MKL_INT* value);
    int  Cblacs_gridinit( MKL_INT* context, char * order, MKL_INT np_row, MKL_INT np_col);
    void Cblacs_gridinfo( MKL_INT context, MKL_INT*  np_row, MKL_INT* np_col, MKL_INT*  my_row,
    MKL_INT*  my_col);
    int  numroc_( MKL_INT *n, MKL_INT *nb, MKL_INT *iproc, MKL_INT *isrcproc, MKL_INT *nprocs);
    void Cblacs_gridexit(MKL_INT ictxt);
    void Cblacs_barrier(MKL_INT ictxt, char * order);
}

void find_nps(MKL_INT np, MKL_INT &nprow, MKL_INT & npcol);
int getIndex(MKL_INT row, MKL_INT col,MKL_INT NCOLS) {return row*NCOLS+col;}

CTEST_Scalapack::CTEST_Scalapack(void)
{
}

CTEST_Scalapack::~CTEST_Scalapack(void)
{
}

int CTEST_Scalapack::Scalapack(int argc, char ** argv) 
{

    int nprocs = 0;//MPI::COMM_WORLD.Get_size();
    int rank = 0;//MPI::COMM_WORLD.Get_rank();

    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);

    std::cout<<"Returned: "<<"";
    std::cout << "Hello World! I am "<< rank << " of "<< nprocs <<
    std::endl;

    srand(1);
    MKL_INT myrow=0;
    MKL_INT mycol=0;
    MKL_INT ictxt=0;
    MKL_INT nprow=0,npcol=0;

    MKL_INT BLOCK_SIZE =2; //this gonna be tricky - should be 64, but cannot be larger than the original matrix

    MKL_INT locR=0, locC=0;
    MKL_INT block = BLOCK_SIZE;
    MKL_INT izero = 0;
    MKL_INT matrix_size = 9;
   
    MKL_INT myone = 1;
    
    MKL_INT nrhs = 1;
   
    MKL_INT info=0;
  
    int i=0,j=0;
    double mone=(-1.e0),pone=(1.e0);
    double AnormF=0.e0, XnormF=0.e0, RnormF=0.e0, BnormF=0.e0, residF=0.e0,eps=0.e0;

    find_nps(nprocs,nprow,npcol);

    Cblacs_pinfo( &rank, &nprocs ) ;
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);
    
    locR = numroc_(&matrix_size, &block, &myrow, &izero, &nprow);
    locC = numroc_(&matrix_size, &block, &mycol, &izero, &npcol);

   
    ////GLOBAL
    double * A = new double[matrix_size*matrix_size]();
    double * B = new double[matrix_size]();
    double * Acpy = new double[matrix_size*matrix_size]();
    double * Bcpy = new double[matrix_size]();
    
    //LOCAL
    double * local_know_vector = new double[locR]();
    double * local_matrix = new double[locR*locC]();
    
    MKL_INT* ipiv = new MKL_INT [locC*locR*block+1000000]();

    
    B[2] = 1;
    B[3] = 0;
    B[4] = 0;
    B[5] = 0;
    
    
    
    A[0] = 19;
    A[1] = 3;
    A[2] = 1;
    A[3] = 12;
    A[4] = 1;
    A[5] = 16;
    A[6] = 1;
    A[7] = 3;
    A[8] = 11;
    
    A[9] = -19;
    A[10] = 3;
    A[11] = 1;
    A[12] = 12;
    A[13] = 1;
    A[14] = 16;
    A[15] = 1;
    A[16] = 3;
    A[17] = 11;
    
    A[18] = -19;
    A[19] = -3;
    A[20] = 1;
    A[21] = 12;
    A[22] = 1;
    A[23] = 16;
    A[24] = 1;
    A[25] = 3;
    A[26] = 11;
    
    A[27] = -19;
    A[28] = -3;
    A[29] = -1;
    A[30] = 12;
    A[31] = 1;
    A[32] = 16;
    A[33] = 1;
    A[34] = 3;
    A[35] = 11;
    
    A[36] = -19;
    A[37] = -3;
    A[38] = -1;
    A[39] = -12;
    A[40] = 1;
    A[41] = 16;
    A[42] = 1;
    A[43] = 3;
    A[44] = 11;
    
    A[45] = -19;
    A[46] = -3;
    A[47] = -1;
    A[48] = -12;
    A[49] = -1;
    A[50] = 16;
    A[51] = 1;
    A[52] = 3;
    A[53] = 11;
    
    A[54] = -19;
    A[55] = -3;
    A[56] = -1;
    A[57] = -12;
    A[58] = -1;
    A[59] = -16;
    A[60] = 1;
    A[61] = 3;
    A[62] = 11;
    
    A[63] = -19;
    A[64] = -3;
    A[65] = -1;
    A[66] = -12;
    A[67] = -1;
    A[68] = -16;
    A[69] = -1;
    A[70] = 3;
    A[71] = 11;
    
    A[72] = -19;
    A[73] = -3;
    A[74] = -1;
    A[75] = -12;
    A[76] = -1;
    A[77] = -16;
    A[78] = -1;
    A[79] = -3;
    A[80] = 11;

    MKL_INT* descA  = new MKL_INT[9]();
    MKL_INT* descB  = new MKL_INT[9]();
   
    descA[0] = 1; // descriptor type
    descA[1] = ictxt; // blacs context
    descA[2] = matrix_size; // global number of rows
    descA[3] = matrix_size; // global number of columns
    descA[4] = block; // row block size
    descA[5] = block; // column block size (DEFINED EQUAL THAN ROW BLOCK SIZE)
    descA[6] = 0; // initial process row(DEFINED 0)
    descA[7] = 0; // initial process column (DEFINED 0)
    descA[8] = locR; // leading dimension of local array

    descB[0] = 1; // descriptor type
    descB[1] = ictxt; // blacs context
    descB[2] = matrix_size; // global number of rows
    descB[3] = 1; // global number of columns
    descB[4] = block; // row block size
    descB[5] = block; // column block size (DEFINED EQUAL THAN ROW BLOCK SIZE)
    descB[6] = 0; // initial process row(DEFINED 0)
    descB[7] = 0; // initial process column (DEFINED 0)
    descB[8] = locR; // leading dimension of local array

    int il=0, jl=0;
    for(i=1; i< matrix_size+1; i++) 
    {
       for(j=1; j< matrix_size+1; j++) 
       {
    
        int pi = p_of_i(i,block,nprow);
        
        int li = l_of_i(i,block,nprow);

        int xi = x_of_i(i,block,nprow);
        //printf("i = %d, j = %d, pi = %d, li = %d\n",i,j,pi,li);;fflush(stdout);
        int pj = p_of_i(j,block,npcol);
        
        int lj = l_of_i(j,block,npcol);
        
        int xj = x_of_i(j,block,npcol);
        //printf("i = %d, j = %d, pj = %d, lj = %d, xj = %d\n",i,j,pj,lj,xj);;fflush(stdout);

        if( (pi == myrow) && (pj == mycol)) 
        {
            il = li*block+xi;
            jl = lj*block+xj;
            local_matrix[getIndex(il-1, jl-1, locC)] = A[getIndex(i-1,j-1,matrix_size)];
        }
    
        if(  (pi == myrow) &&(mycol==0)  )
        {
            local_know_vector[il-1] = B[i-1];
        }

       }
    
    }
      
    ////STARTING PDGESV
    pdgesv_(&matrix_size, &nrhs, local_matrix, &myone, &myone, descA, ipiv, local_know_vector, &myone, &myone, descB, &info);
    
    if(rank==0)
      {
        if(info != 0) cout <<"PDGESV problem! Info "<<info<<endl;
      }
    
    
    for(i=0; i< locR; i++)
    {
      cout<<"**\n"<<"rank "<<rank<<" answer: "<<local_know_vector[i]<<endl;
    }

    if(NULL!=descA)                        {delete [] descA; descA=NULL;} 
    if(NULL!=descB)                        {delete [] descB; descB=NULL;} 
    if(NULL!=local_know_vector)            {delete [] local_know_vector; local_know_vector=NULL;} 
    if(NULL!=local_matrix)                {delete [] local_matrix; local_matrix=NULL;} 
    if(NULL!=Acpy)                        {delete [] Acpy; Acpy=NULL;} 
    if(NULL!=Bcpy)                        {delete [] Bcpy; Bcpy=NULL;} 
    if(NULL!=A)                            {delete [] A; A=NULL;} 
    if(NULL!=B)                            {delete [] B; B=NULL;} 
    

    Cblacs_gridexit(ictxt);

    return 0;

}

void find_nps(MKL_INT np, MKL_INT &nprow, MKL_INT &npcol)
{
    MKL_INT min_nprow = 100000;
    MKL_INT min_npcol = 100000;

    nprow = np;
    npcol = np;

    while (1) {
        npcol--;

        if (np % 2 == 0) {
            if (npcol == 1) {
                nprow--;
                npcol = nprow;
            }
        } else {
            if (npcol == 0) {
                nprow--;
                npcol = nprow;
            }
        }

        if (nprow * npcol == np) {
            min_npcol = npcol;
            if (nprow < min_nprow)
                min_nprow = nprow;
        }

        if (nprow == 1)
            break;
    }

    nprow = min_nprow;
    npcol = min_npcol;
}

complex system GETRF+GETRS


Dear all,

I would like to solve a complex linear system with the MKL libraries. As I have done with real systems, I use GETRF together with GETRS. The MKL reference says that I can also use GETRS for complex systems. Here is my example code:

program testmkl
use LAPACK95
implicit none
complex    ,allocatable,dimension(:,:)::AA
complex    ,allocatable,dimension(:)  ::BB
integer    ,allocatable,dimension(:)::IPV
integer :: info,n

n=10
allocate(AA(n,n))
allocate(BB(n))
allocate(IPV(n))

 call GETRF(AA,IPV,info)
 call GETRS(AA,IPV,BB,info)

endprogram

However, I am not able to compile it. This is my error:

There is no matching specific subroutine for this generic subroutine call.   [GETRS]
 call GETRS(AA,IPV,BB,info)

Where am I going wrong?
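
For reference, these are the plain F77-style LAPACK calls that I would expect the generic calls above to map to for default-kind complex, i.e. replacing the two calls in the program above with (untested sketch, not what I actually want to write):

! Untested sketch: explicit CGETRF/CGETRS calls for the same single-precision
! complex system, bypassing the LAPACK95 generic interfaces.
 call cgetrf(n, n, AA, n, IPV, info)              ! LU factorization of AA
 call cgetrs('N', n, 1, AA, n, IPV, BB, n, info)  ! solve AA*x = BB; x overwrites BB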

Thanks

Sparse-Sparse Matrix Multiplication


Hi

I have used mkl_dcsrmultcsr in my research. However, it performs two passes to compute a sparse*sparse matrix product. For small problems this is not an issue, but for large problems (e.g. matrices of size ½ billion by ½ billion) it is time consuming, and it would be better if MKL could do this multiplication in a single pass in a parallel setup.

Most (if not all) sparse*sparse CSR matrix multiplication algorithms use Gustavson's algorithm (ACM TOMS, 1978), and there is no reason why this algorithm cannot be parallelized and do the calculation in a single pass, roughly as sketched below. I understand that the performance of a single-pass parallelization would depend on pre-allocating space for the non-zero values, which I think can reasonably be supplied in most situations; and even if that estimate is off, the algorithm should be able to adjust the buffer size as required.
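
To make the request concrete, here is a rough single-pass sketch of Gustavson's row-by-row algorithm (my own sketch, not MKL code; 0-based CSR, square n-by-n matrices, result column indices left unsorted). It assumes the caller pre-allocates colC/valC and provides per-thread scratch arrays, so each output row could be computed by a different thread:

/* Rough sketch only: one numeric pass of Gustavson's SpGEMM.
 * accum and marker are scratch arrays of length n; marker must start at -1. */
void spgemm_gustavson(int n,
                      const int *rowA, const int *colA, const double *valA,
                      const int *rowB, const int *colB, const double *valB,
                      int *rowC, int *colC, double *valC,
                      double *accum, int *marker)
{
    int nnzC = 0;
    for (int i = 0; i < n; i++) {              /* rows are independent -> parallelizable */
        rowC[i] = nnzC;
        for (int pa = rowA[i]; pa < rowA[i + 1]; pa++) {
            int k = colA[pa];
            double aik = valA[pa];
            for (int pb = rowB[k]; pb < rowB[k + 1]; pb++) {
                int j = colB[pb];
                if (marker[j] != i) {          /* first contribution to C(i,j) */
                    marker[j] = i;
                    colC[nnzC++] = j;
                    accum[j] = aik * valB[pb];
                } else {
                    accum[j] += aik * valB[pb];
                }
            }
        }
        for (int p = rowC[i]; p < nnzC; p++)   /* gather the accumulator into C */
            valC[p] = accum[colC[p]];
    }
    rowC[n] = nnzC;
}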

Similarly, it would be useful to be able to compute only the lower/upper triangular portion of the output matrix (when the output matrix is known to be symmetric).

Application domains: statistics, PDEs, inverse problems, weather prediction.

Thanks

Vineet 


Different results using 11.2.1 vs 11.1.0


I have the identical SPD matrix and RHS but get different solutions with MKL 11.2.1 on Windows vs. MKL 11.1.0 on Linux.

on win

Major version: 11
Minor version: 2
Update version: 1
Product status:  Product
Build: n20141023
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors

on linux

Major version: 11
Minor version: 1
Update version: 0
Product status:  Product
Build: n20130711
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

A related question: we want to use the Intel 15.0 compilers. It seems we must then use MKL 11.2. Is that true?

Diverge in Newton Method


Hi,

I encounter a divergence problem when using Intel MKL PARDISO to solve a transient simulation with Newton's method.

The case is an unsymmetric matrix of size 64,000 with 1,721,082 non-zeros (L+U).

I found that it diverges when the matrix values change during a sequence of solves.

I tried using CGS ("iparm[3]=101") to reduce the error, as well as different orderings and solving with/without scaling.

But they don't work in my case.

Is there a way, or are there options, to deal with this?

Thanks!

 

Basic Code of Using MKL FFT on MIC


Hello,

In our local cluster we have a bunch of MICs, which are (almost) never used. I would like to give them a try, but I have no experience with the MKL library or with MIC. My programs are very simple and they are based on FFTs:

I have an iterative process in which the same update procedure is applied to the previous data:

// initialize data rho[lx]

for (t = 0; t < tmax; t++)
{
    // first step in real space is local
    for (i = 0; i < lx; i++)
    {
        nt[i] = rho[i] * rho[i] * rho[i];
    }

    // forward FFT rho -> rhok
    // forward FFT nt  -> ntk

    // update in k space
    for (i = 0; i < lx/2 + 1; i++)
    {
        newrhok[i] = filter1[i] * rhok[i] + filter2[i] * ntk[i]; // complex multiplication
    }

    // inverse FFT newrhok -> newrho, which becomes rho for the next step
}
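
On the CPU side, I think the per-step transforms would look roughly like the sketch below (pieced together from the DFTI manual and untested: a 1-D double-precision real-to-complex transform of length lx, with the k-space arrays holding lx/2+1 complex elements):

#include <stdlib.h>
#include <mkl_dfti.h>

/* Untested CPU-only sketch of the per-step FFTs. */
int main(void)
{
    const MKL_LONG lx = 1024;
    double        *rho     = calloc(lx, sizeof(double));
    double        *nt      = calloc(lx, sizeof(double));
    double        *newrho  = calloc(lx, sizeof(double));
    MKL_Complex16 *rhok    = calloc(lx/2 + 1, sizeof(MKL_Complex16));
    MKL_Complex16 *ntk     = calloc(lx/2 + 1, sizeof(MKL_Complex16));
    MKL_Complex16 *newrhok = calloc(lx/2 + 1, sizeof(MKL_Complex16));

    DFTI_DESCRIPTOR_HANDLE handle = NULL;
    DftiCreateDescriptor(&handle, DFTI_DOUBLE, DFTI_REAL, 1, lx);
    DftiSetValue(handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    DftiSetValue(handle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    DftiSetValue(handle, DFTI_BACKWARD_SCALE, 1.0 / lx);   /* normalized inverse */
    DftiCommitDescriptor(handle);

    /* inside the time loop: */
    DftiComputeForward(handle, rho, rhok);              /* rho -> rhok */
    DftiComputeForward(handle, nt, ntk);                /* nt  -> ntk  */
    /* ... build newrhok[i] from rhok[i] and ntk[i] in k space ... */
    DftiComputeBackward(handle, newrhok, newrho);       /* newrhok -> newrho */

    DftiFreeDescriptor(&handle);
    free(rho); free(nt); free(newrho); free(rhok); free(ntk); free(newrhok);
    return 0;
}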

So my problem is how to use the MKL FFTs on the MICs and how to minimize the transfers between the CPU and the MIC.

I hope this is the right forum. Many thanks in advance!

Cristian

 

How to link Calculix with Pardiso using Intel MKL


Hi,

I was wondering if it is possible to link Calculix v2.7 (an open-source application) with PARDISO using Intel MKL somehow. I am a complete beginner in this field, so if it is possible, could someone provide the steps to do so? I have Intel Parallel Studio 2015.

PS. I have gone over the link: https://software.intel.com/en-us/forums/topic/283747. It provides some help but I am still somewhat lost.

Any help would be appreciated

Thanks,

Astryl

1D convolution of a 3D array using Intel MKL


I have a 3D array which is stored in a columnwise fashion.

for( int k = 0; k < nTop; k++ ) // Loop through the tops.
    for( int j = 0; j < nCol; j++ ) // Loop through the columns.
        for( int i = 0; i < nRow; i++ ) // Loop through the rows
        {
            ijk = i + nRow * j + nRow * nCol * k;
            my3Darray[ ijk ] = 1.0;
        }

I want to apply three different 1D kernels of size 2x1 across all the rows, all the columns, and all the tops of my 3D array separately and one after another. 

To be able to use Intel MKL, I read the MKL documentation that describes creating a new convolution or correlation task descriptor for the multidimensional case. I carefully read "Mathematical Notation and Definitions", which covers the notation used for convolution. I also read the example file named vslsconv_2d_auto.c. Still, I am lost on how to apply a 1D convolution to a 3D array.

The following code reflects my understanding of the documentation, written as simple C code modified from the example file vslsconv_2d_auto.c. In it, I am trying to apply a 1D convolution with kernel = [-1 1] along all the rows of the 3D array and to get a convolved result with the same size as the input. My array has a general size of nRow×nCol×nTop; in the example code below I chose the size to be 3×4×5.

int main()
{

    VSLConvTaskPtr task;

    int nRow = 3, nCol = 4, nTop = 5;
    double *x = new double[nRow*nCol*nTop];

    int n1Ker = 2, n2Ker = 1;
    double *kernel = new double[ n1Ker*n2Ker ];

    double *xConvolved = new double[(nRow+n1Ker)*(nCol+n2Ker)*nTop];

    MKL_INT xshape[3]  = {nRow, nCol, nTop};
    MKL_INT convolved_shape[3] = {nRow, nCol, nTop};
    MKL_INT kernel_shape[2]= {2,1};

    MKL_INT rank=3;

    int status;

    for( int i = 0; i < nRow*nCol*nTop; i++ )
        x[ i ] = 1;

    kernel[ 0 ] = -1; kernel[ 1 ] =  1;

    int mode = VSL_CONV_MODE_AUTO;

    /* Create task descriptor (create descriptor of problem) */
    status = vsldConvNewTask(&task, mode, rank, xshape, kernel_shape, convolved_shape);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: creation of job failed, exit with %d\n", status);
    return 1;}

    /* Execute task (Calculate 2 dimension convolution of two arrays)  */
    status = vsldConvExec(task, x, NULL, kernel, NULL, xConvolved, NULL);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: job status bad, exit with %d\n", status);
    return 1;}

    /* Delete task object (delete descriptor of problem) */
    status = vslConvDeleteTask(&task);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: failed to delete task object, exit with %d\n", status);
    return 1;}

    for( int i = 0; i < nRow*nCol*nTop; i++)
    printf("%f\n", xConvolved[i]);

    delete[] x;
    delete[] xConvolved;
    delete[] kernel;

    return 0;
}

After running the code, I get the following error:

ERROR: job status bad, exit with -2312
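
One thing I am unsure about is whether the kernel shape also needs rank-3 dimensions when rank = 3, i.e. whether I should be passing something like the following instead (untested guess on my part):

// Untested guess: describe the 2x1 kernel with a full rank-3 shape, so that all
// three shape arrays given to vsldConvNewTask have 'rank' (= 3) entries.
MKL_INT kernel_shape[3] = { n1Ker, 1, 1 };   // i.e. a 2 x 1 x 1 kernel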

I would be thankful if my colleagues in the forum could let me know how I can fix this issue, and help me find out how to correctly compute the 1D convolution of a 3D array along its rows, columns, or tops.
