Channel: Intel® Math Kernel Library

Extended Eigensolver Routines: Strange eigenvectors


Hello,

I am currently trying to solve the eigenproblem Av = λv, where A is a complex Hermitian matrix, using the MKL FEAST implementation. For testing purposes I constructed the following example (using C++):

#include <mkl.h>
#include <iostream>
#include <vector>
#include <complex>

int main() {
	using namespace std;

	int fpm[128]{ };
	::feastinit(fpm);

/* A =
0	2-i	1
2+i	0	0
1	0	0
*/

	vector<complex<double>> entries = { complex<double>(2, -1), 1, complex<double>(2, +1), 1 };
	vector<int> cols = { 2, 3, 1, 1 };
	vector<int> rows = { 1, 3, 4, 5 };

	char uplo = 'F';
	double eps = 0;
	int loop = 0;
	double emin = -4;
	double emax = 4;
	int m0 = 3;
	vector<complex<double>> eigenvectors(m0 * m0);
	vector<double> eigenvalues(m0);
	vector<double> res(m0);
	int mode = 0;
	int info = 0;

	zfeast_hcsrev(&uplo, &m0, reinterpret_cast<MKL_Complex16*>(&entries[0]), &rows[0], &cols[0],
				  fpm, &eps, &loop, &emin, &emax, &m0, &eigenvalues[0], reinterpret_cast<MKL_Complex16*>(&eigenvectors[0]), &mode, &res[0], &info);

}

The eigenvalues of the matrix are 0 and ±√6 ≈ ±2.44949.

MKL produces exactly these eigenvalues. However, the eigenvectors look strange. For example, the corresponding eigenvector for +sqrt(6) should be (already normalized):

1/sqrt(2), 1/sqrt(3) + i/(2*sqrt(3)), 1/(2*sqrt(3))

or alternatively

0.707107, 0.577350 + i*0.288675, 0.288675

However, MKL produces:

0.48890728596802574+i*0.51085190195141617
0.19063671173166902+i*0.61670439499553087
0.19959556369177356+i*0.20855441565187854
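
Before posting I also wanted to convince myself whether these vectors are actually wrong, so I intend to check the residual ||A v − λ v|| directly. A dense check along these lines is what I have in mind (a throwaway sketch for this 3x3 example only; the helper "residual" is not part of the code above):

#include <algorithm>
#include <cmath>
#include <complex>

// Throwaway helper: maximum component of |A*v - lambda*v| for one pair.
// A tiny value (around 1e-14) would mean the pair is a valid eigenpair,
// even if the vector looks different from the hand-computed one.
double residual(const std::complex<double> A[3][3],
                const std::complex<double> v[3], double lambda)
{
    double r = 0.0;
    for (int i = 0; i < 3; ++i) {
        std::complex<double> Av(0.0, 0.0);
        for (int j = 0; j < 3; ++j)
            Av += A[i][j] * v[j];
        r = std::max(r, std::abs(Av - lambda * v[i]));
    }
    return r;
}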

What's the matter with these? What am I doing wrong?

 

Thank you.


error #6284: There is no matching specific function for this generic function reference. [DFTICOMPUTEFORWARD]


Hi!

I have a question: I want to use the FFT, but the compiler reports the error above and I don't know how to deal with it. Can you tell me the likely cause of this error and how to fix it?

Thank you!

Fast Discrete Fourier Transform with MKL


Hi all,

In R programming, there is the "fft" function: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/fft.html

> x <- c(102, 55, 89, 12, 3, 45, 9)
> fft(x)
[1] 315.00000+ 0.00000i  98.57101-82.76603i -23.61882-18.71932i
[4] 124.54781+ 5.66758i 124.54781- 5.66758i -23.61882+18.71932i
[7]  98.57101+82.76603i
> Re(fft(x))
[1] 315.00000  98.57101 -23.61882 124.54781 124.54781 -23.61882  98.57101

What I have to do is replicate the Re(fft(x)) output in Fortran. I wonder if Intel MKL could do the job.

! testfft.f90

program myfft

 implicit none
 integer, dimension(7) :: x
 double precision, dimension(7) :: Refftx

 x = (/ 102, 55, 89, 12, 3, 45, 9 /)

 ! ============================================================ !
 ! Here, I need to use a fft function to return the result in R !
 ! ============================================================ !
 
 ! For example,
 
 Refftx = theFFTfunctionIdonotKnow(x)

 print*, Refftx

end program myfft
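
From the DFTI examples in the MKL manual, I guess the call sequence would look roughly like the sketch below (untested; it assumes the mkl_dfti module from include\mkl_dfti.f90 and works on a complex copy of x, since R's fft is an unscaled complex transform):

! sketch_fft.f90 -- rough, untested guess based on the MKL DFTI manual examples
program sketch_fft
 use mkl_dfti                      ! requires include\mkl_dfti.f90 to be compiled first
 implicit none
 integer, parameter :: n = 7
 integer :: x(n), status
 complex(8) :: xc(n)               ! DFTI works on real/complex data, so copy the integers
 double precision :: Refftx(n)
 type(dfti_descriptor), pointer :: handle

 x = (/ 102, 55, 89, 12, 3, 45, 9 /)
 xc = cmplx(dble(x), 0.d0, kind=8)

 status = DftiCreateDescriptor(handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
 status = DftiCommitDescriptor(handle)
 status = DftiComputeForward(handle, xc)   ! in-place, unscaled forward transform (like R's fft)
 status = DftiFreeDescriptor(handle)

 Refftx = dble(xc)                 ! hopefully the same as Re(fft(x)) in R
 print *, Refftx
end program sketch_fft

To build it, I assume something like "ifort mkl_dfti.f90 sketch_fft.f90 /Qmkl" from the same command prompt would work, but I am not sure; that is part of my question.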

How do I need to build/compile the "testfft.f90" file?

The machine I use is equipped with Intel Parallel Studio XE 2013. I generally compile .f90 files by typing ifort myfile.f90 in the Intel 64 Visual Studio 2008 mode command prompt. The machine runs Windows 7 64-bit.

Thank you very much!

mkl_dcsrbsr gives core dump when trying to fill all arrays


Hi

 

I am using the mkl_dcsrbsr routine with MKL version 10.3 (I have tried version 11.0 as well). It seems I can call the routine perfectly for job type = -1: I get the number of blocks correctly. But when I call the routine again to fill all the arrays, I get a core dump.

 

Here is the code I am using:

rowsAbsr = new int[N_A+1];
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[3] = 0;
job[4] = 0;
job[5] = -1; // only the number of blocks is computed
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;
cout<<"The request for number of blocks of BSR for matrix A is "<<rowsAbsr[0]<<"."<<endl;
sizecolsabsr = rowsAbsr[0];
colsAbsr = new int[rowsAbsr[0]];
nnzsAbsr = new double[m*m*(rowsAbsr[0])];
ldabsr = m*m*(rowsAbsr[0]);
cout<<"ldabsr is calculated as "<<ldabsr<<"."<<endl;
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[5] = 0;  // only the row and column arrays of the BSR matrix are filled
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;
job[0] = 0;  // CSR to BSR
job[1] = 0;  // zero-based CSR
job[2] = 0;  // zero-based BSR
job[5] = 1;  // all arrays are filled
mkl_dcsrbsr(job, &N_A, &m, &ldabsr, nnzsAcsr, colsAcsr, rowsAcsr, nnzsAbsr, colsAbsr, rowsAbsr, &info);
cout<<"The conversion to BSR for matrix A is "<<info<<"."<<endl;

I had my doubts about ldabsr, but as per the documentation available online it seems I am doing the right thing. Could you be so kind as to point out any obvious error you spot here?

 

with kind regards

rohit

 

Inconsistent in-place and out-of-place results in mkl_dfti


Hi,

I am trying to switch from out-of-place calculation to in-place using MKL 10.3.12. I don't see any problem with the forward FFT. However, in the backward FFT, for dimensions larger than 4, I get inconsistent results between the in-place and out-of-place calculations. This happens when I use a backward scaling of 1.0 (which I need in my problem), and the issue goes away when a scaling of 1/(K1*K2*K3) is used instead! I have attached a minimal code for reproducing the results. I compiled it with:

gfortran -fcray-pointer -I$myMKLINC main.f90 -L$MKLROOT/lib/intel64/ -L/opt/intel/composer_xe_2011_sp1.12.361/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -liomp5 -o amain

Thanks for any help

Amir

Attachment: main.f90 (8.16 KB)

How to configure properly for cluster PARDISO in WinServer 2012?


We are running an application with (currently) the non-cluster PARDISO under Windows Server 2012 SP1. The machine is an HP DL980 with 160 logical cores (80 cores with HT turned on) and 2 TB of RAM. We are having problems utilizing all cores, most likely originating from the Windows processor-group ("core group") limit of 64 logical cores per group. I am happy to see that Intel now offers the cluster version of PARDISO in MKL. Can this new version handle multiple processor groups as if they were clusters of computers? Is any special configuration required to use the new PARDISO, or do we have to reconfigure the machine with clustering software? Do I need to update my Fortran Composer license to the Cluster edition, or will an update of the current Professional edition do?

 

-mkl gives different results from -lfftw3


I am developing a time-stepping code that calls fft routines in every step. While writing and testing the code, I used the -lfftw3 flag to link to the fftw3 library. Now that the code is functional, I tried to link to the MKL version of this library instead, as I think it may be faster. However, the result is completely different. With the -lfftw3 flag, the output seems to make sense, but not with the -mkl option. I am hoping that somebody can explain the difference. This is of great importance to me, as I often use fftw3, lapack and similar libraries, and it seems that MKL gives the best performance.

Operating system: ubuntu 13.10, but it happens on our cluster, too.
Hardware: Lenovo laptop with Intel(R) Core(TM) i7-4600U CPU, but it happens on our cluster, too.
Ifort version: Version 12.1.0.233 Build 20110811 (called through mpif90 with OMPI_FC=ifort)

> mpif90  -o test.x LES_cont.f90  -llapack -lfftw3
>mpirun -np 1 ./test.x
( ... computation ...)
 Residual=   3.237650346704463E-012

>mpif90  -mkl -o test.x LES_cont.f90
>mpirun -np 1 ./test.x
(... computation...)
 Residual=   2.33661403394626

Between the two runs I change only the compiler/linker options as shown, nothing else. Thanks in advance for your help!

about parallelism on BLAS level-1 routines and VML


Hi all,

I am running BLAS routines from MKL with the Intel compiler (icpc). Following the example given with the compiler, I set the number of threads from 1 to 10 while running the dgemm routine for matrix-matrix multiplication, and I saw a speedup as the number of threads increased. However, for level-1 routines (e.g. cblas_zcopy, cblas_zaxpby), I didn't see any speedup from the multithreaded version. Is there a multithreaded version of the level-1 routines at all? What about the VML routines? I also tried those (e.g. vzExp, vzMul), but saw no speedup at all in a multithreaded environment; the timing loop I use is sketched below.
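
For reference, this is roughly how I am measuring it (a simplified sketch, not my exact code; vzExp, dsecnd and mkl_set_num_threads are the MKL calls I rely on):

#include <mkl.h>
#include <cstdio>
#include <vector>

int main() {
    const MKL_INT n = 10000000;                // large enough that threading could matter
    std::vector<MKL_Complex16> a(n), y(n);
    for (MKL_INT i = 0; i < n; ++i) { a[i].real = 1e-6 * i; a[i].imag = 0.0; }

    for (int nt = 1; nt <= 10; ++nt) {
        mkl_set_num_threads(nt);
        double t0 = dsecnd();                  // MKL wall-clock timer
        vzExp(n, a.data(), y.data());          // element-wise complex exponential
        double t1 = dsecnd();
        std::printf("threads=%2d  vzExp time=%.3f s\n", nt, t1 - t0);
    }
    return 0;
}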


Linker errors while using CLUSTER_SPARSE_SOLVER


Hello Everyone,

While using 'CLUSTER_SPARSE_SOLVER' to solve a sparse matrix, I get the linker errors below. I have included the "mkl_cluster_sparse_solver.h" header as well as "mpi.h". What should I do next?

Thanks in advance.

Mayur

BUILD LOG :-

1>------ Build started: Project: MKLWrapper, Configuration: Release x64 ------
1>Build started 12/5/2014 3:53:24 PM.
1>InitializeBuildStatus:
1>  Touching "..\..\Obj\x64\Release\MKLWrapper\MKLWrapper.unsuccessfulbuild".
1>ClCompile:
1>  All outputs are up-to-date.
1>  All outputs are up-to-date.
1>  MKLWrapper.cpp
1>Link:
1>     Creating library ..\..\Obj\x64\Release\MKLWrapper\..\MKLWrapper.lib and object ..\..\Obj\x64\Release\MKLWrapper\..\MKLWrapper.exp
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Barrier
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_rank
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_size
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Irecv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Recv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Isend
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Send
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Test
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Bcast
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_split
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Scatterv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Gatherv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Allgather
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Reduce
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Alltoall
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Alltoallv
1>mkl_core.lib(cpardiso_blacs.obj) : error LNK2001: unresolved external symbol MKL_Comm_free

[REQUEST] Looking for old MKL Versions


Hi,

I'm searching for some older MKL versions, but am having trouble locating them. Does anyone have (or know where I can find) some/any of these? Thanks in advance.

l_mkl_p_8.0.019.tgz
l_mkl_p_8.0.1.006.tgz
l_mkl_p_8.1.014.tgz
l_mkl_p_8.1.1.004.tgz
l_mkl_p_9.0.018.tgz
l_mkl_p_9.1.023.tgz

FFT-Based 3D Convolution With Zero Padding


I have been trying to figure out how I can use Intel MKL to perform an FFT-based 3D convolution with zero padding. I have been searching and posting in online forums (including the Intel MKL forum); unfortunately, I have not been very successful so far.

I have a 3D array that is stored as a 1D array of type double in a column-wise fashion. Similarly, the kernel is of type double and is stored column-wise. For example:

for( int k = 0; k < nk; k++ ) // Loop through the height.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
        for( int i = 0; i < ni; i++ ) // Loop through the columns.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray[ ijk ] = 1.0;
        }

I am writing a 3D convolution function:

  • which takes in real values (not complex values) and
  • outputs the results of the convolution,
  • for the computation of convolution, I am performing a "not-in-place" FFT on the input array as well as the kernel in order to prevent them from getting modified (I need to use them later in my code) and
  • then do the backward computation "in-place".

During the process I also apply zero padding to avoid wrap-around artifacts. The FFT size is (dim_input + dim_kernel - 1) in each dimension, rounded up to the next power of two for speed.
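
By zero padding I mean copying the ni x nj x nk input into the corner of an N x N x N buffer that is zero everywhere else, roughly as in the hypothetical helper below. This is only a sketch: copy_into_padded is not in my code, and I suspect the real-data layout my strides assume actually needs a padded row length of 2*(N/2+1) rather than N, which is exactly what I am unsure about.

// Hypothetical helper (not in the code below): copy an ni x nj x nk column-wise
// array into the corner of an N x N x N buffer and zero the rest.
void copy_into_padded( const double *src, int ni, int nj, int nk,
                       double *dst, int N )
{
    for( long long i = 0; i < (long long)N*N*N; i++ )
        dst[i] = 0.0;                                   // zero padding

    for( int k = 0; k < nk; k++ )                       // height
        for( int j = 0; j < nj; j++ )                   // columns
            for( int i = 0; i < ni; i++ )               // rows
                dst[ i + (long long)N*j + (long long)N*N*k ] =
                    src[ i + (long long)ni*j + (long long)ni*nj*k ];
}

Cropping the ni x nj x nk result back out of the padded buffer would presumably be the same loops with src and dst swapped.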

My questions are:

  1. How can I perform the zero-padding?
  2. How should I deal with the size of the arrays used by FFT functions?
  3. How can I take out the zero padded results and get the actual result?

I would be absolutely grateful for any comments or suggestions.

#include "mkl.h"

int max(int a, int b, int c);
void Conv3D_R2C(
    double *in, int nRowsIn , int nColsIn , int nHeightsIn ,
    double *ker, int nRowsKer, int nColsKer, int nHeightsKer,
    double *out );

int main()
{

    int n = 5;
    int nkernel = 3;

    double *a          = new double [n*n*n]; // This array is real.
    double *aconvolved = new double [n*n*n]; // The convolved array is also real.
    double *kernel     = new double [nkernel*nkernel*nkernel]; // kernel is real.

    // Fill the array with some 'real' numbers.
    for( int i = 0; i < n*n*n; i++ )
        a[ i ] = 1.0;

    // Fill the kernel with some 'real' numbers.
    for( int i = 0; i < nkernel*nkernel*nkernel; i++ )
        kernel[ i ] = 1.0;

    // Calculate the convolution.
    Conv3D_R2C( a, n, n, n, kernel, nkernel, nkernel, nkernel, aconvolved );

    delete[] a;
    delete[] kernel;
    delete[] aconvolved;
}

void Conv3D_R2C( // Real to Complex 3D FFT.
    double *in, int nRowsIn , int nColsIn , int nHeightsIn ,
    double *ker, int nRowsKer, int nColsKer, int nHeightsKer,
    double *out )
{

    int nIn  = max( nRowsIn , nColsIn , nHeightsIn  );
    int nKer = max( nRowsKer, nColsKer, nHeightsKer );
    int n = nIn + nKer - 1;

    /* Strides describe data layout in real and conjugate-even domain. */
    MKL_LONG rs[4], cs[4];

    // DFTI descriptor.
    DFTI_DESCRIPTOR_HANDLE fft_desc = 0;

    // Round up to the next highest power of 2.
    unsigned int N = (unsigned int) n; // compute the next highest power of 2 of 32-bit n.
    N--;
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;
    N |= N >> 16;
    N++;

    // Variables needed for out-of-place computations.
    MKL_Complex16 *in_fft  = new MKL_Complex16 [ N*N*N ];
    MKL_Complex16 *ker_fft = new MKL_Complex16 [ N*N*N ];
    MKL_Complex16 *out_fft = new MKL_Complex16 [ N*N*N ];
    double *out2 = new double [ N*N*N ];

    /* Compute strides */
    rs[3] = 1;           cs[3] = 1;
    rs[2] = (N/2+1)*2;   cs[2] = (N/2+1);
    rs[1] = N*(N/2+1)*2; cs[1] = N*(N/2+1);
    rs[0] = 0;           cs[0] = 0;

    // Create DFTI descriptor.
    MKL_LONG sizes[] = { N, N, N };
    DftiCreateDescriptor( &fft_desc, DFTI_DOUBLE, DFTI_REAL, 3, sizes );

    // Configure DFTI descriptor.
    DftiSetValue        ( fft_desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX );
    DftiSetValue        ( fft_desc, DFTI_PLACEMENT             , DFTI_NOT_INPLACE     ); // Out-of-place transformation.
    DftiSetValue        ( fft_desc, DFTI_INPUT_STRIDES  , rs  );
    DftiSetValue        ( fft_desc, DFTI_OUTPUT_STRIDES , cs  );
    DftiCommitDescriptor( fft_desc );
    DftiComputeForward  ( fft_desc, in , in_fft  );
    DftiComputeForward  ( fft_desc, ker, ker_fft );

    for( long long i = 0; i < (long long)N*N*N; i++ )
    {
        // Element-wise complex multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
        out_fft[i].real = in_fft[i].real * ker_fft[i].real - in_fft[i].imag * ker_fft[i].imag;
        out_fft[i].imag = in_fft[i].real * ker_fft[i].imag + in_fft[i].imag * ker_fft[i].real;
    }

    // Change strides to compute backward transform.
    DftiSetValue        ( fft_desc, DFTI_INPUT_STRIDES , cs);
    DftiSetValue        ( fft_desc, DFTI_OUTPUT_STRIDES, rs);
    DftiCommitDescriptor( fft_desc );
    DftiComputeBackward ( fft_desc, out_fft, out2 );

    // Printing the zero padded 3D convolved result.
    for( long long i = 0; i < (long long)N*N*N; i++ )
        printf( "%f\n", out2[i] );

    /* I don't know how to take out the zero padded results and
       save the actual result in the variable named "out" */

    DftiFreeDescriptor  ( &fft_desc );

    delete[] in_fft;
    delete[] ker_fft;
    delete[] out_fft;
    delete[] out2;
}

int max(int a, int b, int c)
{
     int m = a;
     (m < b) && (m = b); //these are not conditional statements.
     (m < c) && (m = c); //these are just boolean expressions.
     return m;
}

 

trsm memory leak


Hello,

When running valgrind on "source/cblas_dtrsmx.out", compiled from within "mkl/examples/cblas", I end up with a log similar to:

==9740== 3,131,264 bytes in 1 blocks are still reachable in loss record 7 of 7
==9740==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==9740==    by 0x60EAFB4: mkl_serv_allocate (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_core.so)
==9740==    by 0x9269DBE: mkl_blas_mc3_dgemm_get_bufs (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_mc3.so)
==9740==    by 0x9242B7A: mkl_blas_mc3_xdtrsm (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_mc3.so)
==9740==    by 0x4F466C2: DTRSM (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so)
==9740==    by 0x4F60E0C: cblas_dtrsm (in /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so)
==9740==    by 0x4016D5: main (in /opt/intel/composer_xe_2015.1.133/mkl/examples/cblas/_results/intel_lp64_sequential_intel64_so/cblas_dtrsmx.out)

Is it normal to have that much memory leaked?
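
Or is this something where I am expected to release MKL's internal buffers myself before exiting, e.g. with the service routine below (just a guess from the manual, untested)?

#include <mkl.h>

int main(void)
{
    /* ... the cblas_dtrsm calls from the example ... */

    mkl_free_buffers();  /* ask MKL to release its internal memory buffers before exit */
    return 0;
}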

Thank you in advance.

 

[Scalapack] Please Help with using pdgesv


Hello all:

I'm trying to solve a linear system (a 9-by-9 full matrix) with pdgesv in C. I used the example code (http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1683&sid=26b4f253...) and it compiles fine. However, I get an error after calling pdgesv:

“On entry to PDGESV parameter number 602 had an illegal value”.

According to the PDGESV source code, this means the 2nd element of the 6th argument had an illegal value, i.e. descA[1] is wrong?

However, descA[1] (ictxt) is set by Cblacs_gridinit(&ictxt, "Row", nprow, npcol).

After printing ictxt, I find that ictxt = 0 everywhere. Is this the reason for the error message?

I really need to solve a large dense (full) matrix; can someone help me or give me some advice?

The following is my C code:

PS: I've called MPI_Init(&argc,&argv) before entering Scalapack(int argc, char ** argv)  and called MPI_Finalize() after leaving Scalapack(int argc, char ** argv).

 

#include <mpi.h>
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include "Scalapack.h"
#include <mkl.h>
#include <mkl_scalapack.h>
#include "mkl_lapacke.h"
#include <mkl_cblas.h>

#define mat(matriz,coluna,i,j) (matriz[i*coluna+j])

#define p_of_i(i,bs,p) ( MKL_INT((i-1)/bs)%p)
#define l_of_i(i,bs,p) ( MKL_INT((i-1)/(p*bs)))
#define x_of_i(i,bs,p) (((i-1)%bs)+1)

#define   numroc_      NUMROC

using namespace std;

extern "C" 
{
    /* BLACS C interface */
    void Cblacs_pinfo(int* mypnum, int* nprocs);
    void Cblacs_get( MKL_INT context, MKL_INT request, MKL_INT* value);
    int  Cblacs_gridinit( MKL_INT* context, char * order, MKL_INT np_row, MKL_INT np_col);
    void Cblacs_gridinfo( MKL_INT context, MKL_INT*  np_row, MKL_INT* np_col, MKL_INT*  my_row,
    MKL_INT*  my_col);
    int  numroc_( MKL_INT *n, MKL_INT *nb, MKL_INT *iproc, MKL_INT *isrcproc, MKL_INT *nprocs);
    void Cblacs_gridexit(MKL_INT ictxt);
    void Cblacs_barrier(MKL_INT ictxt, char * order);
}

void find_nps(MKL_INT np, MKL_INT &nprow, MKL_INT & npcol);
int getIndex(MKL_INT row, MKL_INT col,MKL_INT NCOLS) {return row*NCOLS+col;}

CTEST_Scalapack::CTEST_Scalapack(void)
{
}

CTEST_Scalapack::~CTEST_Scalapack(void)
{
}

int CTEST_Scalapack::Scalapack(int argc, char ** argv) 
{

    int nprocs = 0;//MPI::COMM_WORLD.Get_size();
    int rank = 0;//MPI::COMM_WORLD.Get_rank();

    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);

    std::cout<<"Returned: "<<"";
    std::cout << "Hello World! I am "<< rank << " of "<< nprocs <<
    std::endl;

    srand(1);
    MKL_INT myrow=0;
    MKL_INT mycol=0;
    MKL_INT ictxt=0;
    MKL_INT nprow=0,npcol=0;

    MKL_INT BLOCK_SIZE =2; //this gonna be tricky - should be 64, but cannot be larger than the original matrix

    MKL_INT locR=0, locC=0;
    MKL_INT block = BLOCK_SIZE;
    MKL_INT izero = 0;
    MKL_INT matrix_size = 9;
   
    MKL_INT myone = 1;
    
    MKL_INT nrhs = 1;
   
    MKL_INT info=0;
  
    int i=0,j=0;
    double mone=(-1.e0),pone=(1.e0);
    double AnormF=0.e0, XnormF=0.e0, RnormF=0.e0, BnormF=0.e0, residF=0.e0,eps=0.e0;

    find_nps(nprocs,nprow,npcol);

    Cblacs_pinfo( &rank, &nprocs ) ;
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);
    
    locR = numroc_(&matrix_size, &block, &myrow, &izero, &nprow);
    locC = numroc_(&matrix_size, &block, &mycol, &izero, &npcol);

   
    ////GLOBAL
    double * A = new double[matrix_size*matrix_size]();
    double * B = new double[matrix_size]();
    double * Acpy = new double[matrix_size*matrix_size]();
    double * Bcpy = new double[matrix_size]();
    
    //LOCAL
    double * local_know_vector = new double[locR]();
    double * local_matrix = new double[locR*locC]();
    
    MKL_INT* ipiv = new MKL_INT [locC*locR*block+1000000]();

    
    B[2] = 1;
    B[3] = 0;
    B[4] = 0;
    B[5] = 0;
    
    
    
    A[0] = 19;
    A[1] = 3;
    A[2] = 1;
    A[3] = 12;
    A[4] = 1;
    A[5] = 16;
    A[6] = 1;
    A[7] = 3;
    A[8] = 11;
    
    A[9] = -19;
    A[10] = 3;
    A[11] = 1;
    A[12] = 12;
    A[13] = 1;
    A[14] = 16;
    A[15] = 1;
    A[16] = 3;
    A[17] = 11;
    
    A[18] = -19;
    A[19] = -3;
    A[20] = 1;
    A[21] = 12;
    A[22] = 1;
    A[23] = 16;
    A[24] = 1;
    A[25] = 3;
    A[26] = 11;
    
    A[27] = -19;
    A[28] = -3;
    A[29] = -1;
    A[30] = 12;
    A[31] = 1;
    A[32] = 16;
    A[33] = 1;
    A[34] = 3;
    A[35] = 11;
    
    A[36] = -19;
    A[37] = -3;
    A[38] = -1;
    A[39] = -12;
    A[40] = 1;
    A[41] = 16;
    A[42] = 1;
    A[43] = 3;
    A[44] = 11;
    
    A[45] = -19;
    A[46] = -3;
    A[47] = -1;
    A[48] = -12;
    A[49] = -1;
    A[50] = 16;
    A[51] = 1;
    A[52] = 3;
    A[53] = 11;
    
    A[54] = -19;
    A[55] = -3;
    A[56] = -1;
    A[57] = -12;
    A[58] = -1;
    A[59] = -16;
    A[60] = 1;
    A[61] = 3;
    A[62] = 11;
    
    A[63] = -19;
    A[64] = -3;
    A[65] = -1;
    A[66] = -12;
    A[67] = -1;
    A[68] = -16;
    A[69] = -1;
    A[70] = 3;
    A[71] = 11;
    
    A[72] = -19;
    A[73] = -3;
    A[74] = -1;
    A[75] = -12;
    A[76] = -1;
    A[77] = -16;
    A[78] = -1;
    A[79] = -3;
    A[80] = 11;

    MKL_INT* descA  = new MKL_INT[9]();
    MKL_INT* descB  = new MKL_INT[9]();
   
    descA[0] = 1; // descriptor type
    descA[1] = ictxt; // blacs context
    descA[2] = matrix_size; // global number of rows
    descA[3] = matrix_size; // global number of columns
    descA[4] = block; // row block size
    descA[5] = block; // column block size (DEFINED EQUAL THAN ROW BLOCK SIZE)
    descA[6] = 0; // initial process row(DEFINED 0)
    descA[7] = 0; // initial process column (DEFINED 0)
    descA[8] = locR; // leading dimension of local array

    descB[0] = 1; // descriptor type
    descB[1] = ictxt; // blacs context
    descB[2] = matrix_size; // global number of rows
    descB[3] = 1; // global number of columns
    descB[4] = block; // row block size
    descB[5] = block; // column block size (DEFINED EQUAL THAN ROW BLOCK SIZE)
    descB[6] = 0; // initial process row(DEFINED 0)
    descB[7] = 0; // initial process column (DEFINED 0)
    descB[8] = locR; // leading dimension of local array

    int il=0, jl=0;
    for(i=1; i< matrix_size+1; i++) 
    {
       for(j=1; j< matrix_size+1; j++) 
       {
    
        int pi = p_of_i(i,block,nprow);
        
        int li = l_of_i(i,block,nprow);

        int xi = x_of_i(i,block,nprow);
        //printf("i = %d, j = %d, pi = %d, li = %d\n",i,j,pi,li);;fflush(stdout);
        int pj = p_of_i(j,block,npcol);
        
        int lj = l_of_i(j,block,npcol);
        
        int xj = x_of_i(j,block,npcol);
        //printf("i = %d, j = %d, pj = %d, lj = %d, xj = %d\n",i,j,pj,lj,xj);;fflush(stdout);

        if( (pi == myrow) && (pj == mycol)) 
        {
            il = li*block+xi;
            jl = lj*block+xj;
            local_matrix[getIndex(il-1, jl-1, locC)] = A[getIndex(i-1,j-1,matrix_size)];
        }
    
        if(  (pi == myrow) &&(mycol==0)  )
        {
            local_know_vector[il-1] = B[i-1];
        }

       }
    
    }
      
    ////STARTING PDGESV
    pdgesv_(&matrix_size, &nrhs, local_matrix, &myone, &myone, descA, ipiv, local_know_vector, &myone, &myone, descB, &info);
    
    if(rank==0)
      {
        if(info != 0) cout <<"PDGESV problem! Info "<<info<<endl;
      }
    
    
    for(i=0; i< locR; i++)
    {
      cout<<"**\n"<<"rank "<<rank<<" answer: "<<local_know_vector[i]<<endl;
    }

    if(NULL!=descA)                        {delete [] descA; descA=NULL;} 
    if(NULL!=descB)                        {delete [] descB; descB=NULL;} 
    if(NULL!=local_know_vector)            {delete [] local_know_vector; local_know_vector=NULL;} 
    if(NULL!=local_matrix)                {delete [] local_matrix; local_matrix=NULL;} 
    if(NULL!=Acpy)                        {delete [] Acpy; Acpy=NULL;} 
    if(NULL!=Bcpy)                        {delete [] Bcpy; Bcpy=NULL;} 
    if(NULL!=A)                            {delete [] A; A=NULL;} 
    if(NULL!=B)                            {delete [] B; B=NULL;} 
    

    Cblacs_gridexit(ictxt);

    return 0;

}

void find_nps(MKL_INT np, MKL_INT &nprow, MKL_INT &npcol)
{
    MKL_INT min_nprow = 100000;
    MKL_INT min_npcol = 100000;

    nprow = np;
    npcol = np;

    while (1) {
        npcol--;

        if (np % 2 == 0) {
            if (npcol == 1) {
                nprow--;
                npcol = nprow;
            }
        } else {
            if (npcol == 0) {
                nprow--;
                npcol = nprow;
            }
        }

        if (nprow * npcol == np) {
            min_npcol = npcol;
            if (nprow < min_nprow)
                min_nprow = nprow;
        }

        if (nprow == 1)
            break;
    }

    nprow = min_nprow;
    npcol = min_npcol;
}

complex system GETRF+GETRS


Dear all,

I would like to solve a complex linear system with the MKL libraries. As I have done with real systems, I use GETRF together with GETRS. The MKL reference says that I can also use GETRS for complex systems. Here is my example code:

program testmkl
use LAPACK95
implicit none
complex    ,allocatable,dimension(:,:)::AA
complex    ,allocatable,dimension(:)  ::BB
integer    ,allocatable,dimension(:)::IPV
integer :: info,n

n=10
allocate(AA(n,n))
allocate(BB(n))
allocate(IPV(n))

 call GETRF(AA,IPV,info)
 call GETRS(AA,IPV,BB,info)

endprogram

However, I am not able to compile it. This is my error:

There is no matching specific subroutine for this generic subroutine call.   [GETRS]
 call GETRS(AA,IPV,BB,info)

Where am I going wrong?
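
For reference, these are the plain F77-style LAPACK calls that I would expect the generic calls above to map to for default-kind complex, i.e. replacing the two calls in the program above with (untested sketch, not what I actually want to write):

! Untested sketch: explicit CGETRF/CGETRS calls for the same single-precision
! complex system, bypassing the LAPACK95 generic interfaces.
 call cgetrf(n, n, AA, n, IPV, info)              ! LU factorization of AA
 call cgetrs('N', n, 1, AA, n, IPV, BB, n, info)  ! solve AA*x = BB; x overwrites BB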

Thanks

Sparse-Sparse Matrix Multiplication


Hi

I have used mkl_dcsrmultcsr in my research. However, it performs two passes to compute a sparse*sparse matrix product. For small problems this is not an issue, but for large problems (e.g. matrices of size ½ billion by ½ billion) it is time consuming, and it would be better if MKL could do this multiplication in a single pass in a parallel setup.

Most (if not all) sparse*sparse CSR matrix multiplication algorithms use Gustavson's algorithm (ACM TOMS, 1978), and there is no reason why this algorithm cannot be parallelized and do the calculation in a single pass, roughly as sketched below. I understand that the performance of a single-pass parallelization would depend on pre-allocating space for the non-zero values, which I think can reasonably be supplied in most situations; and even if that estimate is off, the algorithm should be able to adjust the buffer size as required.
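
To make the request concrete, here is a rough single-pass sketch of Gustavson's row-by-row algorithm (my own sketch, not MKL code; 0-based CSR, square n-by-n matrices, result column indices left unsorted). It assumes the caller pre-allocates colC/valC and provides per-thread scratch arrays, so each output row could be computed by a different thread:

/* Rough sketch only: one numeric pass of Gustavson's SpGEMM.
 * accum and marker are scratch arrays of length n; marker must start at -1. */
void spgemm_gustavson(int n,
                      const int *rowA, const int *colA, const double *valA,
                      const int *rowB, const int *colB, const double *valB,
                      int *rowC, int *colC, double *valC,
                      double *accum, int *marker)
{
    int nnzC = 0;
    for (int i = 0; i < n; i++) {              /* rows are independent -> parallelizable */
        rowC[i] = nnzC;
        for (int pa = rowA[i]; pa < rowA[i + 1]; pa++) {
            int k = colA[pa];
            double aik = valA[pa];
            for (int pb = rowB[k]; pb < rowB[k + 1]; pb++) {
                int j = colB[pb];
                if (marker[j] != i) {          /* first contribution to C(i,j) */
                    marker[j] = i;
                    colC[nnzC++] = j;
                    accum[j] = aik * valB[pb];
                } else {
                    accum[j] += aik * valB[pb];
                }
            }
        }
        for (int p = rowC[i]; p < nnzC; p++)   /* gather the accumulator into C */
            valC[p] = accum[colC[p]];
    }
    rowC[n] = nnzC;
}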

Similarly, it would be useful to be able to compute only the lower/upper triangular portion of the output matrix (when the output matrix is known to be symmetric).

Application domains: statistics, PDEs, inverse problems, weather prediction.

Thanks

Vineet 


Different results using 11.2.1 vs 11.1.0


I have the identical SPD matrix and RHS but get different solutions with MKL 11.2.1 on Windows vs. MKL 11.1.0 on Linux.

on win

Major version: 11
Minor version: 2
Update version: 1
Product status:  Product
Build: n20141023
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors

on linux

Major version: 11
Minor version: 1
Update version: 0
Product status:  Product
Build: n20130711
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

A related question: we want to use the Intel 15.0 compilers. It seems we must then use MKL 11.2. Is that true?

Diverge in Newton Method


Hi,

I encounter a divergence problem when using Intel MKL PARDISO to solve a transient simulation with Newton's method.

The case is an unsymmetric matrix of size 64,000 with 1,721,082 non-zeros (L+U).

I found that it diverges when the matrix values change during a sequence of solves.

I tried using CGS ("iparm[3]=101") to reduce the error, as well as different orderings and solving with/without scaling.

But they don't work in my case.

Is there a way, or are there options, to deal with this?

Thanks!

 

Basic Code of Using MKL FFT on MIC


Hello,

In our local cluster we have a bunch of MICs, which are (almost) never used. I would like to give them a try, but I have no experience with the MKL library or with MIC. My programs are very simple and they are based on FFTs:

I have an iterative process in which the same update procedure is applied to the previous data:

// initialize data rho[lx]

for (t = 0; t < tmax; t++)
{
    // first step in real space is local
    for (i = 0; i < lx; i++)
    {
        nt[i] = rho[i] * rho[i] * rho[i];
    }

    // forward FFT rho -> rhok
    // forward FFT nt  -> ntk

    // update in k space
    for (i = 0; i < lx/2 + 1; i++)
    {
        newrhok[i] = filter1[i] * rhok[i] + filter2[i] * ntk[i]; // complex multiplication
    }

    // inverse FFT newrhok -> newrho, which becomes rho for the next step
}
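
On the CPU side, I think the per-step transforms would look roughly like the sketch below (pieced together from the DFTI manual and untested: a 1-D double-precision real-to-complex transform of length lx, with the k-space arrays holding lx/2+1 complex elements):

#include <stdlib.h>
#include <mkl_dfti.h>

/* Untested CPU-only sketch of the per-step FFTs. */
int main(void)
{
    const MKL_LONG lx = 1024;
    double        *rho     = calloc(lx, sizeof(double));
    double        *nt      = calloc(lx, sizeof(double));
    double        *newrho  = calloc(lx, sizeof(double));
    MKL_Complex16 *rhok    = calloc(lx/2 + 1, sizeof(MKL_Complex16));
    MKL_Complex16 *ntk     = calloc(lx/2 + 1, sizeof(MKL_Complex16));
    MKL_Complex16 *newrhok = calloc(lx/2 + 1, sizeof(MKL_Complex16));

    DFTI_DESCRIPTOR_HANDLE handle = NULL;
    DftiCreateDescriptor(&handle, DFTI_DOUBLE, DFTI_REAL, 1, lx);
    DftiSetValue(handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    DftiSetValue(handle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    DftiSetValue(handle, DFTI_BACKWARD_SCALE, 1.0 / lx);   /* normalized inverse */
    DftiCommitDescriptor(handle);

    /* inside the time loop: */
    DftiComputeForward(handle, rho, rhok);              /* rho -> rhok */
    DftiComputeForward(handle, nt, ntk);                /* nt  -> ntk  */
    /* ... build newrhok[i] from rhok[i] and ntk[i] in k space ... */
    DftiComputeBackward(handle, newrhok, newrho);       /* newrhok -> newrho */

    DftiFreeDescriptor(&handle);
    free(rho); free(nt); free(newrho); free(rhok); free(ntk); free(newrhok);
    return 0;
}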

So my problem is how to use the MKL FFTs on the MICs and how to minimize the transfers between the CPU and the MIC.

I hope this is the right forum. Many thanks in advance!

Cristian

 

How to link Calculix with Pardiso using Intel MKL


Hi,

I was wondering if it is possible to link Calculix v2.7 (an open-source application) with PARDISO using Intel MKL somehow. I am a complete beginner in this field, so if it is possible, could someone provide the steps to do so? I have Intel Parallel Studio 2015.

PS. I have gone over the link: https://software.intel.com/en-us/forums/topic/283747. It provides some help but I am still somewhat lost.

Any help would be appreciated

Thanks,

Astryl

1D convolution of a 3D array using Intel MKL


I have a 3D array which is stored in a columnwise fashion.

for( int k = 0; k < nTop; k++ ) // Loop through the tops.
    for( int j = 0; j < nCol; j++ ) // Loop through the columns.
        for( int i = 0; i < nRow; i++ ) // Loop through the rows
        {
            ijk = i + nRow * j + nRow * nCol * k;
            my3Darray[ ijk ] = 1.0;
        }

I want to apply three different 1D kernels of size 2x1 across all the rows, all the columns, and all the tops of my 3D array separately and one after another. 

To be able to use Intel MKL, I read the MKL documentation that describes creating a new convolution or correlation task descriptor for the multidimensional case. I carefully read "Mathematical Notation and Definitions", which covers the notation used for convolution. I also read the example file named vslsconv_2d_auto.c. Still, I am lost on how to apply a 1D convolution to a 3D array.

The following code reflects my understanding of the documentation, written as simple C code modified from the example file vslsconv_2d_auto.c. In it, I am trying to apply a 1D convolution with kernel = [-1 1] along all the rows of the 3D array and to get a convolved result with the same size as the input. My array has a general size of nRow×nCol×nTop; in the example code below I chose the size to be 3×4×5.

int main()
{

    VSLConvTaskPtr task;

    int nRow = 3, nCol = 4, nTop = 5;
    double *x = new double[nRow*nCol*nTop];

    int n1Ker = 2, n2Ker = 1;
    double *kernel = new double[ n1Ker*n2Ker ];

    double *xConvolved = new double[(nRow+n1Ker)*(nCol+n2Ker)*nTop];

    MKL_INT xshape[3]  = {nRow, nCol, nTop};
    MKL_INT convolved_shape[3] = {nRow, nCol, nTop};
    MKL_INT kernel_shape[2]= {2,1};

    MKL_INT rank=3;

    int status;

    for( int i = 0; i < nRow*nCol*nTop; i++ )
        x[ i ] = 1;

    kernel[ 0 ] = -1; kernel[ 1 ] =  1;

    int mode = VSL_CONV_MODE_AUTO;

    /* Create task descriptor (create descriptor of problem) */
    status = vsldConvNewTask(&task, mode, rank, xshape, kernel_shape, convolved_shape);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: creation of job failed, exit with %d\n", status);
    return 1;}

    /* Execute task (Calculate 2 dimension convolution of two arrays)  */
    status = vsldConvExec(task, x, NULL, kernel, NULL, xConvolved, NULL);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: job status bad, exit with %d\n", status);
    return 1;}

    /* Delete task object (delete descriptor of problem) */
    status = vslConvDeleteTask(&task);
    if( status != VSL_STATUS_OK ){
    printf("ERROR: failed to delete task object, exit with %d\n", status);
    return 1;}

    for( int i = 0; i < nRow*nCol*nTop; i++)
    printf("%f\n", xConvolved[i]);

    delete[] x;
    delete[] xConvolved;
    delete[] kernel;

    return 0;
}

After running the code, I get the following error:

ERROR: job status bad, exit with -2312
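
One thing I am unsure about is whether the kernel shape also needs rank-3 dimensions when rank = 3, i.e. whether I should be passing something like the following instead (untested guess on my part):

// Untested guess: describe the 2x1 kernel with a full rank-3 shape, so that all
// three shape arrays given to vsldConvNewTask have 'rank' (= 3) entries.
MKL_INT kernel_shape[3] = { n1Ker, 1, 1 };   // i.e. a 2 x 1 x 1 kernel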

I would be thankful if my colleagues in the forum could let me know how I can fix this issue, and help me find out how to correctly compute the 1D convolution of a 3D array along its rows, columns, or tops.
