I have a parallel I/O project for a parallel programming class, and I have to implement derived datatypes. I don't clearly understand the difference between darray and subarray. Can a darray be derived from dynamically allocated arrays or not? And what is the main difference?
Subarray lets you describe a single block/slice of a larger multidimensional array. If every MPI task has a single slice/block of a large global array (or if you are communicating chunks of local arrays between tasks), then MPI_Type_create_subarray is the way to go; the syntax is very straightforward. For solving things like PDEs on regular meshes, this distribution is very common: each processor has its own chunk of the global grid, with as many of its grid cells local as possible. In the case of MPI-IO, each MPI task would create a subarray corresponding to its piece of the global array, and use that as its file view to read in or write out its part of the domain to the file containing all of the data.
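As a rough illustration of the single-chunk-each layout, here is a minimal sketch of how a task might compute the subsizes and starts arrays that MPI_Type_create_subarray expects. The helper names (block_range, subarray_args) and the idea of arranging tasks in a 2D process grid are assumptions made for this example, not part of MPI; only the call shown in the comment is the real MPI routine.

```c
/* Hypothetical helper: split n cells among p tasks as evenly as possible.
 * The first (n % p) tasks each get one extra cell. */
void block_range(int n, int p, int rank, int *start, int *count)
{
    int base = n / p;
    int rem  = n % p;
    *count = base + (rank < rem ? 1 : 0);
    *start = rank * base + (rank < rem ? rank : rem);
}

/* Hypothetical helper: fill subsizes[] and starts[] for a 2D global array
 * of extent sizes[], decomposed over a pdims[0] x pdims[1] process grid,
 * for the task at grid position coords[]. */
void subarray_args(const int sizes[2], const int pdims[2],
                   const int coords[2], int subsizes[2], int starts[2])
{
    for (int d = 0; d < 2; d++)
        block_range(sizes[d], pdims[d], coords[d], &starts[d], &subsizes[d]);
}

/* With MPI available, the arrays would then be passed along these lines:
 *
 *   MPI_Datatype filetype;
 *   MPI_Type_create_subarray(2, sizes, subsizes, starts,
 *                            MPI_ORDER_C, MPI_DOUBLE, &filetype);
 *   MPI_Type_commit(&filetype);
 *
 * and filetype could serve as the filetype argument of MPI_File_set_view.
 */
```

For a 10 x 8 global array on a 3 x 2 process grid, the task at grid position (1, 1) would own a 3 x 4 block starting at global index (4, 4).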
MPI_Type_create_darray allows more complex distributed array patterns than single-chunk-each. For distributed linear algebra computations, it might make sense to distribute some matrices row-by-row: say, if there are 5 MPI tasks, task 0 gets rows 0, 5, 10, ... and task 1 gets rows 1, 6, 11, and so on. Other matrices might get distributed by columns; or you could distribute them in blocks of rows, columns, or both. These data distributions are the same ones that were available in the ill-fated HPF (High Performance Fortran), which let you define data-parallel layouts of arrays in this way, on an array-by-array basis.
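To make the row-by-row pattern concrete, here is a small sketch of the index arithmetic behind a cyclic row distribution. The function names are made up for illustration. In MPI_Type_create_darray itself, this layout would be requested for the row dimension with MPI_DISTRIBUTE_CYCLIC and MPI_DISTRIBUTE_DFLT_DARG, and MPI does the bookkeeping for you.

```c
/* Hypothetical helpers describing a cyclic distribution of rows
 * over ntasks tasks (rows and ranks are numbered from 0). */

/* Which task owns a given global row. */
int cyclic_owner(int row, int ntasks)
{
    return row % ntasks;
}

/* How many of the nrows global rows land on the given rank. */
int cyclic_row_count(int nrows, int ntasks, int rank)
{
    return nrows / ntasks + (rank < nrows % ntasks ? 1 : 0);
}

/* Global row number of the k-th local row (k = 0, 1, ...) on a rank. */
int cyclic_global_row(int rank, int ntasks, int k)
{
    return rank + k * ntasks;
}
```

With 5 tasks, cyclic_global_row(1, 5, k) for k = 0, 1, 2 gives rows 1, 6, 11, matching the example above.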
The only way I've ever used MPI_Type_create_darray myself, and indeed the only way I've ever seen it used, is to create an MPI file view of a large matrix to distribute the data in a block-cyclic fashion, so that one can read the file in and then use ScaLAPACK to do parallel linear algebra operations on the distributed matrix.
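For reference, the block-cyclic mapping in one dimension can be sketched as below. The function names and the assumption that the first block lives on process 0 are illustrative choices for this example; they are not ScaLAPACK or MPI routines.

```c
/* Hypothetical helpers for a 1D block-cyclic distribution with block
 * size nb over nprocs processes, assuming block 0 sits on process 0. */

/* Process that owns global index i. */
int block_cyclic_owner(int i, int nb, int nprocs)
{
    return (i / nb) % nprocs;
}

/* Position of global index i within its owner's local storage. */
int block_cyclic_local_index(int i, int nb, int nprocs)
{
    return (i / (nb * nprocs)) * nb + i % nb;
}
```

With nb = 2 and 3 processes, process 0 holds global indices 0, 1, 6, 7, ..., so global index 7 is local index 3 on process 0. A 2D matrix applies the same mapping independently to rows and columns.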