Channel: Intel® Math Kernel Library

Intel® Math Kernel Library (Intel® MKL) Cookbook


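Document Number: 330244-002JA

The Intel® Math Kernel Library (Intel® MKL) contains many routines that help solve a variety of numerical problems, such as multiplying matrices, solving systems of equations, and performing Fourier transforms. For problems that have no dedicated Intel® MKL routine, you can build a solution by combining the building blocks that Intel® MKL provides.

The Intel® MKL Cookbook contains recipes that help you combine Intel® MKL routines to solve more complex problems.

The code samples in the cookbook are provided in Fortran or C.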


Solving a system of linear equations with an LU-factored block tridiagonal coefficient matrix



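Goal

Use Intel® MKL LAPACK routines to solve a system of equations with a block tridiagonal coefficient matrix, because LAPACK has no routine that solves such systems directly.

Solution

Intel® MKL LAPACK provides a variety of subroutines for solving systems of equations with an LU-factored coefficient matrix (dense, band, tridiagonal, and so on). This recipe extends that set to block tridiagonal matrices, on the condition that all blocks are square and of the same order. A block tridiagonal matrix A has the following form:

Solving a system AX=F with an LU-factored matrix A=LU and multiple right-hand sides (RHS) takes two stages (for details of the LU factorization, see "Factoring General Block Tridiagonal Matrices").

  1. Forward substitution. Solve the system LY=F with pivoting, where L is a lower triangular coefficient matrix. For a factored block tridiagonal matrix, all blocks of Y except the last two are found in a loop as follows:

    1. Apply the pivoting permutations to the right-hand side.

    2. Solve a system of NB linear equations with a lower triangular coefficient matrix (NB is the order of the blocks).

    3. Update the right-hand side for the next step.

    Because of the structure of the final pivoting (two block permutations must be applied consecutively) and the structure of the coefficient matrix, the last two block components are found outside the loop.

  2. Backward substitution. Solve the system UX=Y. This step is simpler because it involves no pivoting. The procedure is similar to the first stage:

    1. Solve systems with triangular coefficient matrices.

    2. Update the right-hand side blocks.

    The difference from the previous stage is that the coefficient matrix is upper triangular rather than lower triangular, and the loop runs in the opposite direction.

Source code: see the BlockTDS_GE/source/dgeblttrs.f file in the samples (http://software.intel.com/en-us/mkl_cookbook_samples).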

Forward Substitution

! Forward substitution
! In the loop, compute components Y_K stored in array F
      DO K = 1, N-2
          DO I = 1, NB
              IF (IPIV(I,K) .NE. I) THEN
                  CALL DSWAP(NRHS, F((K-1)*NB+I,1), LDF, F((K-1)*NB+IPIV(I,K),1), LDF)
              END IF
          END DO
          CALL DTRSM('L', 'L', 'N', 'U', NB, NRHS, 1D0, D(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF)
          CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DL(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF, 1D0,
     +        F(K*NB+1,1), LDF)
      END DO

! Apply the last two pivots
      DO I = 1, NB
           IF (IPIV(I,N-1) .NE. I) THEN
               CALL DSWAP(NRHS, F((N-2)*NB+I,1), LDF, F((N-2)*NB+IPIV(I,N-1),1), LDF)
           END IF
      END DO

      DO I = 1, NB
           IF(IPIV(I,N)-NB.NE.I)THEN
               CALL DSWAP(NRHS, F((N-1)*NB+I,1), LDF, F((N-2)*NB+IPIV(I,N),1), LDF)
           END IF
      END DO
! Compute Y_N-1 and Y_N outside the loop and store them in array F
      CALL DTRSM('L', 'L', 'N', 'U', NB, NRHS, 1D0, D(1,(N-2)*NB+1), NB, F((N-2)*NB+1,1), LDF)
      CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DL(1,(N-2)*NB +1), NB, F((N-2)*NB+1,1), LDF, 1D0,
     +     F((N-1)*NB+1,1), LDF)

Backward Substitution

…
! Backward substitution
! Compute X_N outside the loop and store it in array F
      CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(N-1)*NB+1), NB, F((N-1)*NB+1,1), LDF)
! Compute X_N-1 outside the loop and store it in array F
      CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DU1(1,(N-2)*NB +1), NB, F((N-1)*NB+1,1), LDF, 1D0,
     +    F((N-2)*NB+1,1), LDF)
      CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(N-2)*NB+1), NB, F((N-2)*NB+1,1), LDF)
! In the loop, compute components X_K stored in array F
      DO K = N-2, 1, -1
          CALL DGEMM('N','N',NB, NRHS, NB, -1D0, DU1(1,(K-1)*NB +1), NB, F(K*NB+1,1), LDF, 1D0,
     +        F((K-1)*NB+1,1), LDF)
          CALL DGEMM('N','N',NB, NRHS, NB, -1D0, DU2(1,(K-1)*NB +1), NB, F((K+1)*NB+1,1), LDF, 1D0,
     +        F((K-1)*NB+1,1), LDF)
          CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF)
      END DO
…

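Routines Used

Task | Routine | Description
Apply pivoting permutations | dswap | Swaps a vector with another vector.
Solve systems of linear equations with lower and upper triangular coefficient matrices | dtrsm | Solves a triangular matrix equation.
Update the right-hand side blocks | dgemm | Computes a matrix-matrix product with general matrices.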

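Discussion

A general block tridiagonal matrix with blocks of size NB x NB can be treated as a band matrix with bandwidth 4*NB-1 and solved by calling the Intel® MKL LAPACK subroutines that factor band matrices (?gbtrf) and solve band systems (?gbtrs). However, when a block matrix is stored as a band matrix, many zero elements in the band are treated as nonzeros and processed during the computation, so the approach described in this recipe requires fewer floating-point operations. The effect grows with larger NB. A band matrix can likewise be treated as a block tridiagonal matrix, but this storage scheme is not efficient because the blocks contain many zeros that are treated as nonzeros. For this reason, band storage and block tridiagonal storage, together with their solvers, should be regarded as complementary to each other.

Consider the following system of linear equations:

Assume that the block tridiagonal coefficient matrix A is factored as follows:

For definitions of the terms used, see "Factoring General Block Tridiagonal Matrices".

The system is decomposed into two systems of linear equations:

Expand the second equation:

To find Y_1, the permutation Π_1^T must be applied first. This permutation changes only the first two blocks of the right-hand side:

Apply the permutation locally:

Y_1 can now be found:

After Y_1 is found, similar computations are repeated to find the remaining values (Y_2, Y_3, ..., Y_{N-2}).

Because Π_{N-1} has a different structure (see "Factoring General Block Tridiagonal Matrices"), the same equations cannot be used to compute Y_{N-1} and Y_N, which must be computed outside the loop.

The algorithm that uses these equations to find Y is:

The equations UX = Y can be written as:

The algorithm for solving these equations is: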


Bibliography

Sparse Solvers

[Amos10]

Ron Amos. Lecture 3: Solving Equations Using Fixed Point Iterations, University of Wisconsin CS412: Introduction to Numerical Analysis, 2010 (http://pages.cs.wisc.edu/~holzer/cs412/lecture03.pdf).

[Smith86]

G.D. Smith. Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford Applied Mathematics & Computing Science Series, Oxford University Press, USA, 1986.

Numerical Computation

[Zhang12]

Zhang Zhang, Andrey Nikolaev, and Victoriya Kardakova. Optimizing Correlation Analysis of Financial Market Data Streams Using Intel® Math Kernel Library, Intel Parallel Universe Magazine, 12: 42-48, 2012.

[Kargupta02]

Hillol Kargupta, Krishnamoorthy Sivakumar, and Samiran Ghosh. A Random Matrix-Based Approach for Dependency Detection from Data Streams, Proceedings of Principles of Data Mining and Knowledge Discovery, PKDD 2002: 250-262. Springer, New York, 2002.


MKL running in parallel on a server with two processors


Hello everyone.

I have one server with two processors. Every processor has 16 cores. I found that MKL routines such as DGEMM run on only one processor. I want MKL to use all the computing resources, i.e. both processors and all 32 cores. How can I deal with this problem?

Thank you very much!


MKL ERROR: Parameter 8 was incorrect on entry to DSYEV


Hi, I'm trying to use the LAPACK and BLAS libraries. I tried a simple example program that uses DSYEV. I get the following error on running the program: MKL ERROR: Parameter 8 was incorrect on entry to DSYEV. I used the following commands to load the module and compile the program.

>module load intel/2012.0.032

>ifort -O3  example.f -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential  -lpthread -lm

I would appreciate any help in solving this problem.

thanks,

charu.

--------------------------------------------------------------

The example.f program is :

       program main

       implicit none
*     .. Parameters ..
      INTEGER          N
      PARAMETER        ( N = 5 )
      INTEGER          LDA
      PARAMETER        ( LDA = N )
      INTEGER          LWMAX
      PARAMETER        ( LWMAX = 1000 )
*
*     .. Local Scalars ..
      INTEGER          INFO, LWORK
      integer :: i, j
*
*     .. Local Arrays ..
      DOUBLE PRECISION A( LDA, N ), W( N ), WORK( LWMAX )
      DATA             A/
     $  1.96, -6.49, -0.47, -7.20, -0.65,
     $ -6.49,  3.80, -6.39,  1.50, -6.34,
     $ -0.47, -6.39,  4.17, -1.51,  2.67,
     $ -7.20,  1.50, -1.51,  5.70,  1.80,
     $ -0.65, -6.34,  2.67,  1.80, -7.10
     $                  /
*
*     .. External Subroutines ..
      EXTERNAL         DSYEV
      EXTERNAL         PRINT_MATRIX
*
*     .. Intrinsic Functions ..
      INTRINSIC        INT, MIN
*
*     .. Executable Statements ..
      WRITE(*,*)'DSYEV Example Program Results'

      LWORK = LWMAX
      CALL DSYEV( 'Vectors', 'Upper', N, A, LDA, W, WORK, LWORK, INFO )

      IF( INFO.GT.0 ) THEN
         WRITE(*,*)'The algorithm failed to compute eigenvalues.'
         STOP
      END IF
*
*     Print eigenvalues.
*
      CALL PRINT_MATRIX( 'Eigenvalues', 1, N, W, 1 )
*
*     Print eigenvectors.
*
      CALL PRINT_MATRIX( 'Eigenvectors (stored columnwise)', N, N, A,
     $                   LDA )

      STOP
      END

*     End of DSYEV Example.
*
*  =============================================================================
*
*     Auxiliary routine: printing a matrix.
*
      SUBROUTINE PRINT_MATRIX( DESC, M, N, A, LDA )
      CHARACTER*(*)    DESC
      INTEGER          M, N, LDA
      DOUBLE PRECISION A( LDA, * )
*
      INTEGER          I, J
*
      WRITE(*,*)
      WRITE(*,*) DESC
      DO I = 1, M
         WRITE(*,9998) ( A( I, J ), J = 1, N )
      END DO
*
 9998 FORMAT( 11(:,1X,F6.2) )
      RETURN
      END

 

Solving a system of linear equations with an LU-factored block tridiagonal coefficient matrix


Goal

Use Intel MKL LAPACK routines to craft a solution to a system of equations involving a block tridiagonal matrix, since LAPACK does not have routines that directly solve systems with block tridiagonal matrices.

Solution

Intel MKL LAPACK provides a wide range of subroutines for solving systems of linear equations with an LU-factored coefficient matrix. It covers dense matrices, band matrices, and tridiagonal matrices. This recipe extends this set to block tridiagonal matrices, subject to the condition that all the blocks are square and have the same order. A block tridiagonal matrix A has the form

Solving a system AX=F with an LU-factored matrix A=LU and multiple right hand sides (RHS) consists of two stages (see Factoring Block Tridiagonal Matrices for LU factorization).

  1. Forward substitution, which consists of solving a system of equations LY=F with pivoting, where L is a lower triangular coefficient matrix. For factored block tridiagonal matrices, all blocks of Y except the last two can be found in a loop which consists of

    1. Applying pivoting permutations locally to the right hand side.

    2. Solving the local system of NB linear equations with a lower triangular coefficient matrix, where NB is the order of the blocks.

    3. Updating the right hand side for the next step.

    The last two block components are found outside of the loop because of the structure of the final pivoting (two block permutations need to be applied consecutively) and the structure of the coefficient matrix.

  2. Backward substitution, which consists of solving the system UX=Y. This step is simpler because it does not involve pivoting. The procedure is similar to the first step:

    1. Solving systems with triangular coefficient matrices.

    2. Updating right hand side blocks.

    The difference from the previous step is that the coefficient matrix is upper, not lower, triangular, and the direction of the loop is reversed.
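The two stages can be illustrated on the simplest possible case. The sketch below is not the recipe's dgeblttrs routine: it swaps in the scalar Thomas algorithm (block order NB = 1, no pivoting, one right-hand side) so that the forward loop and the reversed backward loop are easy to follow. The function name `tridiag_lu_solve` and its argument layout are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Intel MKL.
   Solves a scalar tridiagonal system in place, without pivoting.
   dl[0..n-2]: subdiagonal, d[0..n-1]: diagonal, du[0..n-2]: superdiagonal,
   f[0..n-1]: right-hand side on entry, solution on exit.
   Returns 0 on success, -1 if a zero pivot is met. */
int tridiag_lu_solve(size_t n, double *dl, double *d, const double *du, double *f)
{
    assert(n > 0);

    /* Factorization A = LU: dl[k] becomes the multiplier L(k+1,k),
       d[k] becomes the diagonal entry U(k,k); du is unchanged. */
    for (size_t k = 0; k + 1 < n; ++k) {
        if (d[k] == 0.0) return -1;
        dl[k] /= d[k];
        d[k + 1] -= dl[k] * du[k];
    }
    if (d[n - 1] == 0.0) return -1;

    /* Stage 1, forward substitution LY = F: each step updates the
       right-hand side entry used by the next step. */
    for (size_t k = 0; k + 1 < n; ++k)
        f[k + 1] -= dl[k] * f[k];

    /* Stage 2, backward substitution UX = Y: upper triangular factor,
       loop direction reversed. */
    f[n - 1] /= d[n - 1];
    for (size_t k = n - 1; k-- > 0; )
        f[k] = (f[k] - du[k] * f[k + 1]) / d[k];

    return 0;
}
```

In the block version, the divisions become triangular solves (dtrsm), the multiply-subtract updates become dgemm calls, and row swaps from pivoting are applied to the right-hand side before each forward step.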

Source code: see the BlockTDS_GE/source/dgeblttrs.f file in the samples archive available at http://software.intel.com/en-us/mkl_cookbook_samples.

Forward Substitution

! Forward substitution
! In the loop compute components Y_K stored in array F
      DO K = 1, N-2
          DO I = 1, NB
              IF (IPIV(I,K) .NE. I) THEN
                  CALL DSWAP(NRHS, F((K-1)*NB+I,1), LDF, F((K-1)*NB+IPIV(I,K),1), LDF)
              END IF
          END DO
          CALL DTRSM('L', 'L', 'N', 'U', NB, NRHS, 1D0, D(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF)
          CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DL(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF, 1D0,
     +        F(K*NB+1,1), LDF)
      END DO

! Apply the last two pivots
      DO I = 1, NB
           IF (IPIV(I,N-1) .NE. I) THEN
               CALL DSWAP(NRHS, F((N-2)*NB+I,1), LDF, F((N-2)*NB+IPIV(I,N-1),1), LDF)
           END IF
      END DO

      DO I = 1, NB
           IF(IPIV(I,N)-NB.NE.I)THEN
               CALL DSWAP(NRHS, F((N-1)*NB+I,1), LDF, F((N-2)*NB+IPIV(I,N),1), LDF)
           END IF
      END DO
! Compute Y_N-1 and Y_N outside the loop and store them in array F
      CALL DTRSM('L', 'L', 'N', 'U', NB, NRHS, 1D0, D(1,(N-2)*NB+1), NB, F((N-2)*NB+1,1), LDF)
      CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DL(1,(N-2)*NB +1), NB, F((N-2)*NB+1,1), LDF, 1D0,
     +     F((N-1)*NB+1,1), LDF)

Backward Substitution

…
! Backward substitution
! Compute X_N outside the loop and store it in array F
      CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(N-1)*NB+1), NB, F((N-1)*NB+1,1), LDF)
! Compute X_N-1 outside the loop and store it in array F
      CALL DGEMM('N', 'N', NB, NRHS, NB, -1D0, DU1(1,(N-2)*NB +1), NB, F((N-1)*NB+1,1), LDF, 1D0,
     +    F((N-2)*NB+1,1), LDF)
      CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(N-2)*NB+1), NB, F((N-2)*NB+1,1), LDF)
! In the loop, compute components X_K stored in array F
      DO K = N-2, 1, -1
          CALL DGEMM('N','N',NB, NRHS, NB, -1D0, DU1(1,(K-1)*NB +1), NB, F(K*NB+1,1), LDF, 1D0,
     +        F((K-1)*NB+1,1), LDF)
          CALL DGEMM('N','N',NB, NRHS, NB, -1D0, DU2(1,(K-1)*NB +1), NB, F((K+1)*NB+1,1), LDF, 1D0,
     +        F((K-1)*NB+1,1), LDF)
          CALL DTRSM('L', 'U', 'N', 'N', NB, NRHS, 1D0, D(1,(K-1)*NB+1), NB, F((K-1)*NB+1,1), LDF)
      END DO
…

Routines Used

Task | Routine | Description
Apply pivoting permutations | dswap | Swap a vector with another vector
Solve a system of linear equations with lower and upper triangular coefficient matrices | dtrsm | Solve a triangular matrix equation
Update the right hand side blocks | dgemm | Compute a matrix-matrix product with general matrices

Discussion

Note

A general block tridiagonal matrix with blocks of size NB by NB can be treated as a band matrix with bandwidth 4*NB-1 and solved by calling Intel MKL LAPACK subroutines for factoring and solving band matrices (?gbtrf and ?gbtrs). But using the approach described in this recipe requires fewer floating point computations because if the block matrix is stored as a band matrix, many zero elements would be treated as nonzeros in the band and would be processed during computations. The effect increases for bigger NB. Analogously, band matrices can also be treated as block tridiagonal matrices. But this storage scheme is also not very efficient because the blocks would contain many zeros treated as nonzeros. So band storage schemes and block tridiagonal storage schemes and their respective solvers should be considered as complementary to each other.
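The storage side of this comparison can be made concrete with a little arithmetic. The sketch below counts elements only, not floating-point operations, and the function names are hypothetical. It assumes the band view uses kl = ku = 2*NB-1 sub- and superdiagonals (which gives the bandwidth 4*NB-1 quoted above) and that the LAPACK band factorization needs 2*kl+ku+1 rows of band storage.

```c
#include <assert.h>
#include <stddef.h>

/* Elements in the nonzero blocks of a block tridiagonal matrix with
   n_blocks block rows and nb-by-nb blocks:
   n_blocks diagonal blocks plus 2*(n_blocks-1) off-diagonal blocks. */
size_t block_tridiag_elems(size_t n_blocks, size_t nb)
{
    assert(n_blocks >= 1 && nb >= 1);
    return (3 * n_blocks - 2) * nb * nb;
}

/* Elements held in band storage when the same matrix is viewed as a band
   matrix with kl = ku = 2*nb-1, that is, kl+ku+1 = 4*nb-1 rows of length n. */
size_t band_elems(size_t n_blocks, size_t nb)
{
    assert(n_blocks >= 1 && nb >= 1);
    size_t kl = 2 * nb - 1, ku = 2 * nb - 1;
    return (kl + ku + 1) * n_blocks * nb;
}

/* Elements in the array passed to a band LU factorization, which needs
   2*kl+ku+1 rows to leave room for fill-in caused by pivoting. */
size_t band_lu_storage_elems(size_t n_blocks, size_t nb)
{
    assert(n_blocks >= 1 && nb >= 1);
    size_t kl = 2 * nb - 1, ku = 2 * nb - 1;
    return (2 * kl + ku + 1) * n_blocks * nb;
}
```

For 10 block rows of order 4, these functions give 448, 600, and 880 elements respectively. The factored block tridiagonal form also stores an extra set of superdiagonal blocks (DU2 in the code above), which this count leaves out.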

Given a system of linear equations:

The block tridiagonal coefficient matrix A is assumed to be factored as shown:

See Factoring Block Tridiagonal Matrices for a definition of the terms used.

The system is decomposed into two systems of linear equations:

The second equation can be expanded:

In order to find Y_1, first the permutation Π_1^T must be applied. This permutation only changes the first two blocks of the right hand side:

Applying the permutation locally gives

Now Y_1 can be found:

After finding Y_1, similar computations can be repeated to find Y_2, Y_3, ..., and Y_{N-2}.

Note

The different structure of Π_{N-1} (see Factoring Block Tridiagonal Matrices) means that the same equations cannot be used to compute Y_{N-1} and Y_N and that they must be computed outside of the loop.

The algorithm to use the equations to find Y is:

The UX = Y equations can be represented as

The algorithm for solving these equations is:


Intel® Math Kernel Library Cookbook


Document Number: 330244-004US

The Intel® Math Kernel Library (Intel® MKL) contains many routines to help you solve various numerical problems, such as multiplying matrices, solving a system of equations, and performing a Fourier transform. While many problems do not have dedicated Intel MKL routines, you can solve them by assembling the building blocks provided by Intel MKL.

The Intel Math Kernel Library Cookbook includes these recipes to help you to assemble Intel MKL routines for solving some more complex problems:

Note

Code examples in the cookbook are provided in Fortran for some recipes and in C for other recipes.


Noise filtering in financial market data streams


Goal

Detect how the price movements of some stocks influence the price movements of others in a large stock portfolio.

Solution

Split a correlation matrix representing the overall dependencies in data into two components, a signal matrix and a noise matrix. The signal matrix gives an accurate estimate of dependencies between stocks. The algorithm ([Zhang12],[Kargupta02]) relies on an eigenstate-based approach that separates noise from useful information by considering the eigenvalues of the correlation matrix for the accumulated data.

Intel MKL Summary Statistics provides functions to calculate a correlation matrix for streaming data. Intel MKL LAPACK contains a set of computational routines to compute eigenvalues and eigenvectors for symmetric matrices of various properties and storage formats.

The online noise filtering algorithm is:

  1. Compute λmin and λmax, the boundaries of the interval of the noise eigenstates.

  2. Get a new block of data from the data stream.

  3. Update the correlation matrix using the latest data block.

  4. Compute the eigenvalues and eigenvectors that define the noise component, by searching the eigenvalues of the correlation matrix belonging to the interval [λmin, λmax].

  5. Compute the correlation matrix of the noise component by combining the eigenvalues and eigenvectors computed in Step 4.

  6. Compute the correlation matrix of the signal component by subtracting the noise component from the overall correlation matrix. If there is more data, go back to Step 2.

Source code: see the nf folder in the samples archive available at http://software.intel.com/en-us/mkl_cookbook_samples.

Initialization

Initialize a correlation analysis task and its parameters.

VSLSSTaskPtr task;
double *x, *mean, *cor;
double W[2];
MKL_INT x_storage, cor_storage;

...

scanf("%d", &m);            // number of observations in block
scanf("%d", &n);            // number of stocks (task dimension)

...

/* Allocate memory */
nfAllocate(m, n, &x, &mean, &cor, ...);


/* Initialize Summary Statistics task structure */
nfInitSSTask(&m, &n, &task, x, &x_storage, mean, cor, &cor_storage, W);
...

/* Allocate memory */
void nfAllocate(MKL_INT m, MKL_INT n, double **x, double **mean, double **cor,
               ...)
{
    *x = (double *)mkl_malloc(m*n*sizeof(double), ALIGN);
    CheckMalloc(*x);

    *mean = (double *)mkl_malloc(n*sizeof(double), ALIGN);
    CheckMalloc(*mean);

    *cor = (double *)mkl_malloc(n*n*sizeof(double), ALIGN);
    CheckMalloc(*cor);
...
}
/* Initialize Summary Statistics task structure */
void nfInitSSTask(MKL_INT *m, MKL_INT *n, VSLSSTaskPtr *task, double *x,
                  MKL_INT *x_storage, double *mean, double *cor,
                  MKL_INT *cor_storage, double *W)
{
    int status;

    /* Create VSL Summary Statistics task */
    *x_storage = VSL_SS_MATRIX_STORAGE_COLS;
    status = vsldSSNewTask(task, n, m, x_storage, x, 0, 0);
    CheckSSError(status);

    /* Register array of weights in the task */
    W[0] = 0.0;
    W[1] = 0.0;
    status = vsldSSEditTask(*task, VSL_SS_ED_ACCUM_WEIGHT, W);
    CheckSSError(status);

    /* Initialization of the task parameters using full storage
       for correlation matrix computation */
    *cor_storage = VSL_SS_MATRIX_STORAGE_FULL;
    status = vsldSSEditCovCor(*task, mean, 0, 0, cor, cor_storage);
    CheckSSError(status);
}

Computation

Perform noise filtering steps for each block of data.

/* Set thresholds that define the noise component */
sqrt_n_m = sqrt((double)n / (double)m);
lambda_min = (1.0 - sqrt_n_m) * (1.0 - sqrt_n_m);
lambda_max = (1.0 + sqrt_n_m) * (1.0 + sqrt_n_m);

...
/* Loop over data blocks */
for (i = 0; i < n_block; i++)
{
    /* Read next portion of data */
    nfReadDataBlock(m, n, x, fh);

    /* Update "signal" and "noise" covariance estimates */
    nfKernel(m, n, lambda_min, lambda_max, x, cor, cor_copy,
             task, eval, evect, work, iwork, isuppz,
             cov_signal, cov_noise);
}
...

void nfKernel(…)
{
...

    /* Update correlation matrix estimate using FAST method */
    errcode = vsldSSCompute(task, VSL_SS_COR, VSL_SS_METHOD_FAST);
    CheckSSError(errcode);

...
    /* Compute eigenvalues and eigenvectors of the correlation matrix */
    dsyevr(&jobz, &range, &uplo, &n, cor_copy, &n, &lmin, &lmax, &imin, &imax,
           &abstol, &n_noise, eval, evect, &n, isuppz,
           work, &lwork, iwork, &liwork, &info);

    /* Calculate "signal" and "noise" part of covariance matrix */
    nfCalculateSignalNoiseCov(n, n_signal, n_noise,
        eval, evect, cor, cov_signal, cov_noise);
}

...
static int nfCalculateSignalNoiseCov(int n, int n_signal, int n_noise,
        double *eval, double *evect, double *cor, double *cov_signal,
        double *cov_noise)
{
    int i, j, nn;

    /* SYRK parameters */
    char uplo, trans;
    double alpha, beta;

    /* Calculate "noise" part of covariance matrix. */
    for (j = 0; j < n_noise; j++) eval[j] = sqrt(eval[j]);

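    /* Scale each noise eigenvector (column i of evect) by the square root
       of its eigenvalue, now stored in eval[i] */
    for (i = 0; i < n_noise; i++)
        for (j = 0; j < n; j++)
            evect[i*n + j] *= eval[i];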

    uplo  = 'U';
    trans = 'N';
    alpha = 1.0;
    beta  = 0.0;
    nn = n;

    if (n_noise > 0)
    {
        dsyrk(&uplo, &trans, &nn, &n_noise, &alpha, evect, &nn, &beta, cov_noise, &nn);
    }
    else
    {
        for (i = 0; i < n*n; i++) cov_noise[i] = 0.0;
    }

    /* Calculate "signal" part of covariance matrix. */
    if (n_signal > 0)
    {
        for (i = 0; i < n; i++)
            for (j = 0; j <= i; j++)
                cov_signal[i*n + j] = cor[i*n + j] - cov_noise[i*n + j];
    }
    else
    {
        for (i = 0; i < n*n; i++) cov_signal[i] = 0.0;
    }

    return 0;
}

Deinitialization

Delete the task and release associated resources.

errcode = vslSSDeleteTask(task);
CheckSSError(errcode);
MKL_Free_Buffers();

Routines Used

Task | Routine | Description
Initialize a summary statistics task and define the objects for analysis: dataset, its sizes (number of variables and number of observations), and the storage format. | vsldSSNewTask | Creates and initializes a new summary statistics task descriptor.
Specify the memory to hold the correlation matrix. | vsldSSEditCovCor | Modifies the pointers to covariance/correlation/cross-product parameters.
Specify the two-element array intended to hold accumulated weights of observations processed so far (necessary for correct computation of estimates for data streams). | vsldSSEditTask | Modifies address of an input/output parameter in the task descriptor.
Call the major compute driver by specifying computation type VSL_SS_COR, and computation method, VSL_SS_METHOD_FAST. | vsldSSCompute | Computes Summary Statistics estimates.
De-allocate resources associated with the task. | vslSSDeleteTask | Destroys the task object and releases the memory.
Compute eigenvalues and eigenvectors of the correlation matrix. | dsyevr | Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix using the Relatively Robust Representations.
Perform a symmetric rank-k update. | dsyrk | Performs a symmetric rank-k update.

Discussion

Step 4 of the algorithm involves solving an eigenvalue problem for a symmetric matrix. The online noise filtration algorithm requires computation of eigenvalues that belong to the predefined interval [λmin, λmax], which defines noise in the data. The LAPACK driver routine ?syevr is the default routine for solving this type of problem. The ?syevr interface allows the caller to specify a pair of values, in this case corresponding to λmin and λmax, as the lower and upper bounds of the interval to be searched for eigenvalues.

The eigenvectors found are returned as columns of an array containing an orthogonal matrix A, and eigenvalues are returned in an array containing elements of the diagonal matrix Diag. The correlation matrix for the noise component can be obtained by computing A*Diag*A^T. However, instead of constructing a noise correlation matrix using two general matrix multiplications, this can be more efficiently computed with one diagonal matrix multiplication and one rank-n update operation:

A*Diag*A^T = (A*Diag^(1/2)) * (A*Diag^(1/2))^T

For the rank-n update operation, Intel MKL provides the BLAS function ?syrk.


Using Fast Fourier Transforms for computer tomography image reconstruction


Goal

Reconstruct the original image from the Computer Tomography (CT) data using fast Fourier transform (FFT) functions.

Solution

Notation:

  • Specification of index ranges adopts the notation used in MATLAB*.

    For example: k=-q : q means k=-q, -q+1, -q+2,…, q-1, q.

  • While f(x) means the value of the function f at point x, f[n] means the value of the nth element of the discrete data set f.

Assumptions:

  • The density f(x, y) of a two-dimensional (2D) image vanishes outside the unit circle:

    f = 0 when x^2 + y^2 > 1.

  • The CT data consists of p projections of the image taken at angles θ_j = jπ/p, where j = 0 : p - 1.

  • Each projection contains 2q + 1 density values g[j, l] = g(θ_j, s_l) approximating the integral of the image along the line

    (x, y) = (-t sin θ_j + s_l cos θ_j, t cos θ_j + s_l sin θ_j),

    where l = -q : q, s_l = l/q, and t is the integration parameter.

The discrete image reconstruction algorithm consists of the following steps:

  1. Evaluate p one-dimensional (1D) Fourier transforms (for j = 0 : p - 1 and r = -q : q):

  2. Interpolate g1[j, r] from radial grid (πr/q)(cos θ_j, sin θ_j) onto Cartesian grid (ξ, η) = (-q : q, -q : q), obtaining f2(πξ/q, πη/q).

  3. Evaluate one inverse two-dimensional complex-to-complex FFT to obtain a complex-valued reconstruction f1 of the image:

    where f(m/q, n/q) ≈ f1[m, n] for m = -q : q and n = -q : q.

Computations in steps 1 and 3 call Intel MKL FFT interfaces. Computations in step 2 implement a simple version of interpolation tailored to the data layout used by Intel MKL FFT.

Reconstructing the original CT image in C/C++

// Declarations
int Nq = 2*(q+1); // space for in-place r2c FFT
void    *gmem = mkl_malloc( sizeof(float)*p*Nq, 64 );
float    *g = gmem; // g[j*Nq + ell+q]
complex *g1 = gmem; // g1[j*Nq/2 + r+q]

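// Initialize g with the CT data: theta_j = j*PI/p, s_ell = ell/q, ell = -q..q
for (int j = 0; j < p; ++j)
for (int ell = -q; ell <= q; ++ell) {
  float theta_j = (float)(j*PI/p);
  float s_ell = (float)ell/q;
  g[j*Nq + ell+q] = get_g(theta_j,s_ell); // index ell+q stays within 0..2q
}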

// Step 1: Configure and compute 1D real-to-complex FFTs
DFTI_DESCRIPTOR_HANDLE h1 = NULL;
DftiCreateDescriptor(&h1,DFTI_SINGLE,DFTI_REAL,1,(MKL_LONG)2*q);
DftiSetValue(h1,DFTI_CONJUGATE_EVEN_STORAGE,DFTI_COMPLEX_COMPLEX);
DftiSetValue(h1,DFTI_NUMBER_OF_TRANSFORMS,(MKL_LONG)p);
DftiSetValue(h1,DFTI_INPUT_DISTANCE,(MKL_LONG)Nq);
DftiSetValue(h1,DFTI_OUTPUT_DISTANCE,(MKL_LONG)Nq/2);
DftiSetValue(h1,DFTI_FORWARD_SCALE,fscale);
DftiCommitDescriptor(h1);
DftiComputeForward(h1,g); // now gmem contains g1

// Step 2: Interpolate g1 to f2 - omitted here
complex *f = mkl_malloc( sizeof(complex) * 2*q * 2*q, 64 );

// Step 3: Configure and compute 2D complex-to-complex FFT
DFTI_DESCRIPTOR_HANDLE h3 = NULL;
MKL_LONG sizes[2] = {2*q, 2*q};
DftiCreateDescriptor(&h3,DFTI_SINGLE,DFTI_COMPLEX,2,sizes);
DftiCommitDescriptor(h3);
DftiComputeBackward(h3,f); // now f is complex-valued reconstruction

Source code, image file, and makefiles: see the fft-ct folder in the samples archive available at http://software.intel.com/en-us/mkl_cookbook_samples.

Discussion

The code first configures the Intel MKL FFT descriptor for computing a batch of the one-dimensional Fourier transforms in a single call to the DftiComputeForward function and then computes the batch transform. The distance for the multiple transforms is set in terms of elements of the corresponding domain (real on input and complex on output). The transforms are in-place by default.

To have a smaller memory footprint, the FFT is computed in place, that is, the result of the computation overwrites the input data. With an in-place real-to-complex FFT the input array reserves extra space because the result of the FFT takes slightly more memory than the input.

On input to step 1, array g contains p x (2q+1) real-valued data elements g(θ_j, s_l). The same memory on output of this step contains p x (q + 1) complex-valued output elements g1(θ_j, πr/q). The complex-conjugate part of the result is not stored, and therefore array g1 refers to only q + 1 values of r.

To interpolate from g1 to f2, an additional array f is allocated to store complex-valued data f2(ξ, η) and complex-valued output f1(x, y) of inverse FFT in step 3. The interpolation step does not call Intel MKL functions, but you can find its C++ implementation in the function step2_interpolation of the source code for this recipe (file main.cpp). The simplest implementation of interpolation is:

  • For every (ξ, η) inside the unit circle, find the closest (θ_j, πr/q) and use the value of g1(θ_j, πr/q) for f2.

  • For every (ξ, η) outside the unit circle, set f2 to 0.
  • In the case of (ξ, η) corresponding to the interval -π < θ_j < 0, use the conjugate-even property of the result of a real-to-complex transform: g1(θ, ω) = conj(g1(-θ, -ω)).

Notice that the FFT in step 1 is applied to the data offset by half the representation interval, which causes the computed output to be multiplied by e^(i(πr/q)q) = (-1)^r. Instead of correcting this in a separate pass, the interpolation takes the multiplier into account.

Similarly, the 2D FFT in step 3 produces an output that shifts the center of the image to the corner, and step 2 prevents this by phase shifting the input to step 3.

Step 3 computes the two-dimensional (2q) x (2q) complex-to-complex FFT on the interpolated data contained in array f. This computation is followed by a conversion of the complex-valued image f1 to a visual picture. You can find a complete C++ program that implements the CT image reconstruction in the source code for this recipe (file main.cpp).


Evaluating a Fourier integral


Goal

Use a fast Fourier transform (FFT) to numerically evaluate the continuous Fourier transform integral

Solution

Let’s assume that the real-valued function f(x) is zero outside the interval [a, b] and is sampled at N equidistant points x_n = a + nT/N, where T = |b - a| and n = 0, 1, ... , N-1. An FFT will be used to evaluate the integral at points ξ_k = 2πk/T, where k = 0, 1, ... , N/2.

Using Intel® Math Kernel Library FFT Interface in C/C++

float *f;   // input: f[n] = f(a + n*T/N), n=0...N-1
complex *F; // output: F[k] = F(2*k*PI/T), k=0...N/2
DFTI_DESCRIPTOR_HANDLE h = NULL;
DftiCreateDescriptor(&h,DFTI_SINGLE,DFTI_REAL,1,(MKL_LONG)N);
DftiSetValue(h,DFTI_CONJUGATE_EVEN_STORAGE,DFTI_COMPLEX_COMPLEX);
DftiSetValue(h,DFTI_PLACEMENT,DFTI_NOT_INPLACE);
DftiCommitDescriptor(h);
DftiComputeForward(h,f,F);
for (int k = 0; k <= N/2; ++k)
{
  F[k] *= (T/N)*complex( cos(2*PI*a*k/T), -sin(2*PI*a*k/T) );
}

Discussion

The evaluation follows this derivation, based on step-function approximation of the integral:
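The steps can be written out as follows. This is a reconstruction from the sampling points and the code above, not a copy of the original figure; the sign convention in the exponent is an assumption taken from the forward transform and the cos/-sin factor used in the code.

```latex
\begin{aligned}
F(\xi_k) &= \int_a^b f(x)\, e^{-i \xi_k x}\, dx \\
         &\approx \sum_{n=0}^{N-1} f(x_n)\, e^{-i \xi_k x_n}\, \frac{T}{N} \\
         &= \frac{T}{N}\, e^{-i \xi_k a} \sum_{n=0}^{N-1} f(x_n)\, e^{-2\pi i k n / N},
\end{aligned}
\qquad \text{since } \xi_k x_n = \frac{2\pi k a}{T} + \frac{2\pi k n}{N}.
```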

The sum in the last line is an FFT by definition. When the support of the function f extends symmetrically around zero, that is, [a, b] = [-T/2, T/2], the factor before the sum turns into (T/N)(-1)^k.

When the function f is real-valued, F(ξ_k) = conj(F(ξ_{N-k})). The first N/2 + 1 complex values of the real-to-complex FFT occupy approximately the same memory as the real input, and they suffice to compute the whole result by conjugation. If the FFT computation is configured to perform a real-to-complex transform, it also takes approximately half as much time as a complex-to-complex FFT.


Numpy+MKL install fails: Could not locate executable icc


I followed the build instructions on Intel's website closely, but when I get to building numpy I get the following message:

sudo python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install
Running from numpy source directory.
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite'
  warnings.warn(msg)
non-existing path in 'numpy/f2py': 'docs'
non-existing path in 'numpy/f2py': 'f2py.1'
/bin/sh: 1: svnversion: not found
F2PY Version 2
blas_opt_info:
blas_mkl_info:
  FOUND:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2015/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2015/mkl/include']

  FOUND:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2015/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2015/mkl/include']

/bin/sh: 1: svnversion: not found
non-existing path in 'numpy/lib': 'benchmarks'
lapack_opt_info:
openblas_lapack_info:
  libraries openblas not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
  NOT AVAILABLE

lapack_mkl_info:
mkl_info:
  FOUND:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2015/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2015/mkl/include']

  FOUND:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2015/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2015/mkl/include']

  FOUND:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2015/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2015/mkl/include']

/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'define_macros'
  warnings.warn(msg)
running config
running build_clib
running build_src
build_src
building py_modules sources
building library "npymath" sources
Could not locate executable icc
Could not locate executable ecc
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using config
C compiler: icc -m64 -fPIC

compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -c'
icc: _configtest.c
sh: 1: icc: not found
sh: 1: icc: not found
failure.
removing: _configtest.c _configtest.o
Traceback (most recent call last):
  File "setup.py", line 251, in <module>
    setup_package()
  File "setup.py", line 243, in setup_package
    setup(**metadata)
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/core.py", line 169, in setup
    return old_setup(**new_attr)
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/command/build_clib.py", line 63, in run
    self.run_command('build_src')
  File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/command/build_src.py", line 153, in run
    self.build_sources()
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/command/build_src.py", line 164, in build_sources
    self.build_library_sources(*libname_info)
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/command/build_src.py", line 299, in build_library_sources
    sources = self.generate_sources(sources, (lib_name, build_info))
  File "/home/matus/Desktop/numpy-1.9.1/numpy/distutils/command/build_src.py", line 386, in generate_sources
    source = func(extension, build_dir)
  File "numpy/core/setup.py", line 686, in get_mathlib_info
    raise RuntimeError("Broken toolchain: cannot link a simple C program")
RuntimeError: Broken toolchain: cannot link a simple C program

However, icc is available, and indeed

$ icc --version

returns

icc (ICC) 15.0.0 20140723
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

as for ecc

$ icecc --version

ICECC 1.0.1

So what is the problem?

I'm on Linux Mint 17

3D Convolution by Using Intel MKL

There is an example of using Intel MKL to perform an FFT in 3D, as posted here. It would be very helpful to know how the MKL FFT libraries can be used to compute a 3D convolution (in a serial manner) for values of type float or double. Even a simple example would help to a great extent. Thanks in advance.

Intel® Parallel Studio XE 2015 Update 1 Professional Edition for C++ Windows*

Intel® Parallel Studio XE 2015 Update 1 Professional Edition for C++ parallel software development suite combines Intel's C/C++ compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering.  This new product release includes:

  • Intel® Parallel Studio XE 2015 Update 1 Composer Edition for C++ - includes Intel® C++ Compiler, Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB) and Intel® Math Kernel Library (Intel® MKL)
  • Intel® Advisor XE 2015 Update 1
  • Intel® Inspector XE 2015 Update 1
  • Intel® VTune™ Amplifier XE 2015 Update 1
  • Sample programs
  • Documentation

New in this release:

  • Components updated to current versions

Note:  For more information on the changes listed above, please read the individual component release notes.

See the previous release's ReadMe to see what was new in that release.

Resources

Contents 
File:  parallel_studio_xe_2015_update1_online_setup.exe
Online installer

File:  parallel_studio_xe_2015_update1_setup.exe
Product for developing 32-bit and 64-bit applications

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for Fortran Windows*

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for Fortran parallel software development suite combines Intel's Fortran compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering.  This new product release includes:

    • Intel® Parallel Studio XE 2015 Update 1 Composer Edition for Fortran - includes Intel® Visual Fortran Compiler and Intel® Math Kernel Library (Intel® MKL)
    • Intel® Advisor XE 2015 Update 1
    • Intel® Inspector XE 2015 Update 1
    • Intel® VTune™ Amplifier XE 2015 Update 1
    • Sample programs
    • Documentation

    New in this release:

    • Components updated to current versions

    Note:  For more information on the changes listed above, please read the individual component release notes.

    See the previous release's ReadMe to see what was new in that release.

    Resources

    Contents 
    File:  parallel_studio_xe_2015_update1_online_setup.exe
    Online installer

    File:  parallel_studio_xe_2015_update1_setup.exe
    Product for developing 32-bit and 64-bit applications

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for Linux*

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition parallel software development suite combines Intel's C/C++ compiler and Fortran compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering.  This new product release includes:

    • Intel® Parallel Studio XE 2015 Update 1 Composer Edition - includes Intel® Fortran Compiler, Intel® C++ Compiler, Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB), Intel® Math Kernel Library (Intel® MKL) and GNU* Project Debugger (GDB*)
    • Intel® Advisor XE 2015 Update 1
    • Intel® Inspector XE 2015 Update 1
    • Intel® VTune™ Amplifier XE 2015 Update 1
    • Sample programs
    • Documentation

    New in this release:

    • Components updated to current versions
    • Support for SuSE Linux Enterprise Server 12* has been added

    Note:  For more information on the changes listed above, please read the individual component release notes.

    See the previous release's ReadMe to see what was new in that release.

    Resources

    Contents
    File:  parallel_studio_xe_2015_update1_online.sh
    Online installer

    File:  parallel_studio_xe_2015_update1.tgz
    Product for developing 32-bit and 64-bit applications

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for C++ Linux*

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for C++ parallel software development suite combines Intel's C/C++ compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering.  This new product release includes:

    • Intel® Parallel Studio XE 2015 Update 1 Composer Edition for C++ - includes Intel® C++ Compiler, Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB), Intel® Math Kernel Library (Intel® MKL) and GNU* Project Debugger (GDB*)
    • Intel® Advisor XE 2015 Update 1
    • Intel® Inspector XE 2015 Update 1
    • Intel® VTune™ Amplifier XE 2015 Update 1
    • Sample programs
    • Documentation

    New in this release:

    • Components updated to current versions
    • Support for SuSE Linux Enterprise Server 12* has been added

    Note:  For more information on the changes listed above, please read the individual component release notes.

    See the previous release's ReadMe to see what was new in that release.

    Resources

    Contents
    File:  parallel_studio_xe_2015_update1_online.sh
    Online installer

    File:  parallel_studio_xe_2015_update1.tgz
    Product for developing 32-bit and 64-bit applications

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for Fortran Linux*

    Intel® Parallel Studio XE 2015 Update 1 Professional Edition for Fortran parallel software development suite combines Intel's Fortran compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering.  This new product release includes:

    • Intel® Parallel Studio XE 2015 Update 1 Composer Edition for Fortran - includes Intel® Fortran Compiler, Intel® Math Kernel Library (Intel® MKL) and GNU* Project Debugger (GDB*)
    • Intel® Advisor XE 2015 Update 1
    • Intel® Inspector XE 2015 Update 1
    • Intel® VTune™ Amplifier XE 2015 Update 1
    • Sample programs
    • Documentation

    New in this release:

    • Components updated to current versions
    • Support for SuSE Linux Enterprise Server 12* has been added

    Note:  For more information on the changes listed above, please read the individual component release notes.

    See the previous release's ReadMe to see what was new in that release.

    Resources

    Contents
    File:  parallel_studio_xe_2015_update1_online.sh
    Online installer

    File:  parallel_studio_xe_2015_update1.tgz
    Product for developing 32-bit and 64-bit applications

    1D FFT of a 3D array

    My goal is to compute 1D FFTs of a 3D array along each of its dimensions. This 3D array is stored as a 1D array in a columnwise fashion. For example,

    for( int k = 0; k < nk; k++ ) // Loop through the height.
        for( int j = 0; j < nj; j++ ) // Loop through the rows.
            for( int i = 0; i < ni; i++ ) // Loop through the columns.
            {
                ijk = i + ni * j + ni * nj * k;
                my3Darray[ ijk ] = 1.0;
            }

    In fact, I need to pass all the row, column, and height 1D vectors of `my3Darray` to the FFT function of the MKL library (by height, I mean the vectors in the third dimension of the array) and compute their Fourier transforms. Is it possible to use the MKL Fourier libraries for such a 3D array that is stored as a 1D array? I would like to perform the operations shown in the examples below. Please advise.

    Heightwise FFT:

    for( int i = 0; i < ni; i++ ) // Loop through the columns.
        for( int j = 0; j < nj; j++ ) // Loop through the rows.
        {
            // Gather one height vector.
            for( int k = 0; k < nk; k++ ) // Loop through the heights.
            {
                ijk = i + ni * j + ni * nj * k;
                myvec[ k ] = my3Darray[ ijk ];
            }

            // Transform the complete vector once, after it has been gathered.
            fft_function( myvec, myvec_processed );

            // Store the results in a new array, which is storing myvec_processed in my3Darray_fft_values.
            for( int k = 0; k < nk; k++ ) // Loop through the heights.
            {
                ijk = i + ni * j + ni * nj * k;
                my3Darray_fft_values[ ijk ] = myvec_processed[ k ];
            }
        }

     

    Columnwise FFT:

    for( int k = 0; k < nk; k++ ) // Loop through the height.
        for( int j = 0; j < nj; j++ ) // Loop through the rows.
        {
            // Gather one column vector.
            for( int i = 0; i < ni; i++ ) // Loop through the columns.
            {
                ijk = i + ni * j + ni * nj * k;
                myvec[ i ] = my3Darray[ ijk ];
            }

            // Transform the complete vector once, after it has been gathered.
            fft_function( myvec, myvec_processed );

            // Store the results in a new array, which is storing myvec_processed in my3Darray_fft_values.
            for( int i = 0; i < ni; i++ ) // Loop through the columns.
            {
                ijk = i + ni * j + ni * nj * k;
                my3Darray_fft_values[ ijk ] = myvec_processed[ i ];
            }
        }

     

    Row-wise FFT:

    for( int i = 0; i < ni; i++ ) // Loop through the column.
        for( int k = 0; k < nk; k++ ) // Loop through the height.
        {
            // Gather one row vector.
            for( int j = 0; j < nj; j++ ) // Loop through the row.
            {
                ijk = i + ni * j + ni * nj * k;
                myvec[ j ] = my3Darray[ ijk ];
            }

            // Transform the complete vector once, after it has been gathered.
            fft_function( myvec, myvec_processed );

            // Store the results in a new array, which is storing myvec_processed in my3Darray_fft_values.
            for( int j = 0; j < nj; j++ ) // Loop through the row.
            {
                ijk = i + ni * j + ni * nj * k;
                my3Darray_fft_values[ ijk ] = myvec_processed[ j ];
            }
        }
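
The three loops above differ only in the stride between consecutive elements of each vector: 1 along i, ni along j, and ni * nj along k. The sketch below, which is not taken from the post, shows how a single strided routine can therefore cover all three cases without copying into myvec. The names cplx, dft_strided, and dft_along_axis are hypothetical, and a naive O(n^2) DFT stands in for the library FFT so that the example is self-contained. The MKL DFTI interface has configuration parameters for this kind of layout (DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES, DFTI_NUMBER_OF_TRANSFORMS, DFTI_INPUT_DISTANCE); consult the MKL reference for how to set them.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

typedef std::complex<double> cplx;

// Naive O(n^2) DFT of `count` elements that start at `in` and are spaced
// `stride` elements apart; the result is written with the same stride.
// This is a stand-in for a library FFT call.
void dft_strided(const cplx* in, cplx* out, std::size_t count, std::size_t stride)
{
    const double PI = std::acos(-1.0);
    for (std::size_t k = 0; k < count; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t n = 0; n < count; ++n) {
            const double phase = -2.0 * PI * double(k) * double(n) / double(count);
            sum += in[n * stride] * cplx(std::cos(phase), std::sin(phase));
        }
        out[k * stride] = sum;
    }
}

// Applies 1D transforms along one axis (0: i, 1: j, 2: k) of an array stored
// as ijk = i + ni*j + ni*nj*k. The array must hold ni*nj*nk elements.
std::vector<cplx> dft_along_axis(const std::vector<cplx>& a,
                                 std::size_t ni, std::size_t nj, std::size_t nk,
                                 int axis)
{
    const std::size_t dims[3] = { ni, nj, nk };
    const std::size_t strides[3] = { 1, ni, ni * nj };
    const int o1 = (axis + 1) % 3;  // the two axes that are not transformed
    const int o2 = (axis + 2) % 3;
    std::vector<cplx> result(a.size());
    for (std::size_t p = 0; p < dims[o1]; ++p) {
        for (std::size_t q = 0; q < dims[o2]; ++q) {
            // Offset of the first element of this vector in the flat array.
            const std::size_t start = p * strides[o1] + q * strides[o2];
            dft_strided(a.data() + start, result.data() + start,
                        dims[axis], strides[axis]);
        }
    }
    return result;
}
```

For example, dft_along_axis(my3Darray, ni, nj, nk, 2) corresponds to the heightwise case, axis 0 to the columnwise case, and axis 1 to the row-wise case, provided the data is held as complex values.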

     
