Efficient implementation of matrix multiplication on ARM Cortex-A9 (Xilinx SDK)

Is there a simple way, or a library, to implement linear algebra efficiently (at the maximum possible speed) on a dual-core ARM Cortex-A9 using Xilinx SDK?

I am using a Zybo Z7 development board with a dual-core ARM processor, and I want to implement a simple neural network with one convolution layer followed by a dense layer in Xilinx SDK. Specifically, I want to transfer a Python/NumPy-based model to the ARM. I have read some manuals for ARM and the SIMD library, but I don't want to dive that deep.

The easiest approach for me would be a library that performs the multiplication, dot product, convolution, etc. by itself (and fast), like NumPy in Python, so I can avoid writing plain for loops. An example would be nice!

Thanks for your time.

Solution

You can try the Eigen library, which TensorFlow uses, to implement the matrix calculations, or you can even try TensorFlow Lite, which has already been tested on the ARM Cortex-M series of processors.
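Whichever library you pick, it helps to have a small plain reference implementation of the two layers to check the ported results against the NumPy model (with Eigen, the dense step itself is essentially `y = W * x + b`). The sketch below is not Eigen code and is not the library's API; the function names `conv2d_valid` and `dense` are hypothetical. It assumes a single input channel, a single filter, stride 1, "valid" padding, ReLU activations, and row-major `float` arrays as exported from NumPy. Like Keras/TensorFlow `Conv2D`, it computes cross-correlation (the kernel is not flipped); if your NumPy model uses a true convolution (as `np.convolve` does), index the kernel in reverse.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference layers. Assumptions: one input channel, one filter,
// stride 1, "valid" padding, ReLU activation, row-major float data.

// 2-D "valid" cross-correlation followed by ReLU.
// Output shape: (in_h - k_h + 1) x (in_w - k_w + 1), row-major.
std::vector<float> conv2d_valid(const std::vector<float>& in, std::size_t in_h, std::size_t in_w,
                                const std::vector<float>& k, std::size_t k_h, std::size_t k_w,
                                float bias) {
    assert(in.size() == in_h * in_w && k.size() == k_h * k_w);
    assert(k_h >= 1 && k_w >= 1 && in_h >= k_h && in_w >= k_w);
    const std::size_t out_h = in_h - k_h + 1;
    const std::size_t out_w = in_w - k_w + 1;
    std::vector<float> out(out_h * out_w);
    for (std::size_t r = 0; r < out_h; ++r) {
        for (std::size_t c = 0; c < out_w; ++c) {
            float acc = bias;
            // Slide the kernel over the input window whose top-left corner is (r, c).
            for (std::size_t i = 0; i < k_h; ++i)
                for (std::size_t j = 0; j < k_w; ++j)
                    acc += in[(r + i) * in_w + (c + j)] * k[i * k_w + j];
            out[r * out_w + c] = std::max(acc, 0.0f);  // ReLU
        }
    }
    return out;
}

// Dense layer followed by ReLU: y = relu(W x + b), i.e. np.dot(W, x) + b in NumPy.
// W is row-major with shape (n_out, n_in); transpose NumPy weights shaped (n_in, n_out) first.
std::vector<float> dense(const std::vector<float>& W, const std::vector<float>& x,
                         const std::vector<float>& b) {
    const std::size_t n_in = x.size();
    const std::size_t n_out = b.size();
    assert(W.size() == n_in * n_out);
    std::vector<float> y(n_out);
    for (std::size_t o = 0; o < n_out; ++o) {
        // Walking along one row of W reads memory sequentially (cache-friendly).
        const float* row = W.data() + o * n_in;
        float acc = b[o];
        for (std::size_t i = 0; i < n_in; ++i) acc += row[i] * x[i];
        y[o] = std::max(acc, 0.0f);  // ReLU
    }
    return y;
}
```

Because the convolution output is already in row-major (C) order, the same order NumPy's default `flatten()` produces, it can be passed straight into `dense`. Once these outputs match NumPy, swapping the inner loops for Eigen expressions is a contained change that can be checked against the same data.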
