I am looking for rowwise vector-matrix operations (I think these should be BLAS level 2 routines). For example, subtracting a vector from every row of a matrix, or normalizing a matrix by its row sums. Are there optimized standard routines for this?
Unfortunately, there is no such operation in BLAS. All the available subroutines are listed here: http://www.netlib.org/lapack/lug/node145.html
You can write your own subroutines and call BLAS level 1 routines for building blocks such as norm and axpy. However, the gain in performance is generally modest.
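To make the "write your own" option concrete, here is a minimal sketch in plain C. It assumes a dense row-major matrix of doubles; the function names `subtract_from_rows` and `normalize_rows_by_sum` are made up for illustration and are not part of BLAS. The inner loops are the places where a level 1 call (axpy for the subtraction, scal for the scaling) could be substituted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: subtract the length-`cols` vector v from every row
 * of the row-major rows x cols matrix a, in place.
 * Each row update is equivalent to an axpy with alpha = -1. */
void subtract_from_rows(double *a, size_t rows, size_t cols, const double *v)
{
    assert(a != NULL && v != NULL);
    for (size_t i = 0; i < rows; ++i) {
        double *row = a + i * cols;   /* start of row i (row-major assumed) */
        for (size_t j = 0; j < cols; ++j)
            row[j] -= v[j];
    }
}

/* Hypothetical helper: divide every row of a by the sum of its entries,
 * in place. Rows whose sum is exactly zero are left unchanged.
 * Note that BLAS asum sums absolute values, so it matches this plain sum
 * only when all entries are non-negative. */
void normalize_rows_by_sum(double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    for (size_t i = 0; i < rows; ++i) {
        double *row = a + i * cols;
        double sum = 0.0;
        for (size_t j = 0; j < cols; ++j)
            sum += row[j];
        if (sum == 0.0)
            continue;                 /* avoid division by zero */
        double inv = 1.0 / sum;
        for (size_t j = 0; j < cols; ++j)
            row[j] *= inv;            /* equivalent to a scal on the row */
    }
}
```

Both functions make a single pass (or two) over contiguous memory, which a compiler can usually vectorize on its own, so swapping the inner loops for library calls is unlikely to change much.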
BLAS matters most for matrix-matrix (or matrix-vector) products, where cache management, data locality, and access patterns make a big difference in performance.
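A rough way to see why, as a back-of-the-envelope estimate rather than a measurement: compare the arithmetic done per matrix entry moved through memory. The counts below assume an m x n matrix for the rowwise subtraction (each entry read once and written once) and square n x n matrices for the product, with each entry of the three matrices touched once in the ideal case.

```latex
\[
\underbrace{\frac{mn \ \text{flops}}{\approx 2mn \ \text{entries moved}}}_{\text{rowwise subtraction}}
\approx \frac{1}{2},
\qquad
\underbrace{\frac{2n^{3} \ \text{flops}}{3n^{2} \ \text{entries touched}}}_{\text{matrix-matrix product}}
= \frac{2n}{3}.
\]
```

The rowwise operation does a constant amount of work per entry, so it is limited by memory bandwidth no matter how it is coded. The matrix-matrix product can reuse each entry many times, and that reuse is what a tuned BLAS exploits through blocking.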