PETSc - MatMultScale? Matrix X vector X scalar

I'm using PETSc and I wanted to do something like,

Equation

I know I can do:

Mat A
Vec x,y

MatMult(A,x,y)
VecScale(y,0.5)

I was just curious if there is a function that would do all of these in one shot. It seems like it would save a loop.

MatMultScale(A,x,0.5,y)

Does such a function exist?

Solution

This function (or anything close) does not seems to be in the list of functions operating on Mat. So a brief answer to your question would be...no.

If you often use $y=\frac12 Ax$, a solution would be to scale the matrix once for all, using MatScale(A,0.5);.

Would such a function be useful ? One way to check this is to use the -log_summary option of petsc, to get some profiling information. If your matrix is dense, you will see that the time spent in MatMult() is much larger than the time spent in VecScale(). This question is meaningful only if a sparce matrix is handled, with a few non-null terms per line.

Here is a code to test it, using 2xIdentity as the matrix :

static char help[] = "Tests solving linear system on 0 by 0 matrix.\n\n";

#include <petscksp.h>

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **args)
{
    Vec            x, y;      
    Mat            A;           
    PetscReal      alpha=0.5;  
    PetscErrorCode ierr;
    PetscInt n=42;

    PetscInitialize(&argc,&args,(char*)0,help);
    ierr = PetscOptionsGetInt(NULL,"-n",&n,NULL);CHKERRQ(ierr);

    /* Create the vector*/
    ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
    ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);
    ierr = VecDuplicate(x,&y);CHKERRQ(ierr);

    /*
     Create matrix.  When using MatCreate(), the matrix format can
     be specified at runtime.

     Performance tuning note:  For problems of substantial size,
     preallocation of matrix memory is crucial for attaining good
     performance. See the matrix chapter of the users manual for details.
     */
    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    /*
This matrix is diagonal, two times identity
should have preallocated, shame
     */
    PetscInt i,col;
    PetscScalar value=2.0;
    for (i=0; i<n; i++) {
        col=i;
        ierr   = MatSetValues(A,1,&i,1,&col,&value,INSERT_VALUES);CHKERRQ(ierr);
    }

    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    /*
     let's do this 42 times for nothing :
     */
    for(i=0;i<42;i++){
        ierr = MatMult(A,x,y);CHKERRQ(ierr);
        ierr = VecScale(y,alpha);CHKERRQ(ierr);
    }

    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr); 
    ierr = MatDestroy(&A);CHKERRQ(ierr);

    ierr = PetscFinalize();
    return 0;
}

The makefile :

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
include ${PETSC_DIR}/conf/test

CLINKER=g++

all : ex1

ex1 : main.o chkopts
    ${CLINKER} -w -o main main.o ${PETSC_LIB}
    ${RM} main.o

run :
    mpirun -np 2 main -n 10000000 -log_summary -help -mat_type mpiaij

And here the resulting two lines of -log_summary that could answer your question :

Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecScale              42 1.0 1.0709e+00 1.0 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 50  0  0  0   4 50  0  0  0   392
MatMult               42 1.0 5.7360e+00 1.1 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00 20 50  0  0  0  20 50  0  0  0    73

So the 42 VecScale() operations took 1 second while the 42 MatMult() operations took 5.7 seconds. Suppressing the VecScale() operation would speed up the code by 20%, in the best case. The overhead due to the for loop is even lower than that. I guess that's the reason why this function does not exist.

I apologize for the poor performance of my computer (392Mflops for VecScale()...). I am curious to know what happens on yours !