I am trying to access the coordinates at the previous and next indices inside the kernel.
e.g. int idx = blockIdx.x * blockDim.x + threadIdx.x;
Then pos[idx].x, pos[idx].y, pos[idx].z give the current coordinates of a point, but I cannot access the other two. I am trying to calculate the normals of a changing triangle on the GPU using CUDA.
How easily normals can be computed on the GPU depends on the mesh topology.
It is easy to compute normals for a mesh with triangle-list topology: Use one GPU thread per triangle. This results in highly regular reads and writes and will work for any valid configuration of blocks and threads in CUDA. Unfortunately, triangle-list topology isn't very useful (for starters, it will be flat-shaded unless some additional processing is employed).
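A minimal sketch of the one-thread-per-triangle approach, assuming a non-indexed triangle list in which triangle t owns vertices 3t, 3t+1 and 3t+2. The names faceNormals and faceNormalsOnGpu are made up for illustration, and CUDA error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed layout: pos holds 3 * numTris vertices, three consecutive per triangle.
__global__ void faceNormals(const float3* pos, float3* nrm, int numTris)
{
    int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numTris) return;              // guard for the last, partly filled block

    // The "other two" vertices are simply the next two entries of the array.
    float3 a = pos[3 * tri], b = pos[3 * tri + 1], c = pos[3 * tri + 2];
    float3 e1 = make_float3(b.x - a.x, b.y - a.y, b.z - a.z);
    float3 e2 = make_float3(c.x - a.x, c.y - a.y, c.z - a.z);

    // Face normal = normalized cross product of the two edges.
    float3 n = make_float3(e1.y * e2.z - e1.z * e2.y,
                           e1.z * e2.x - e1.x * e2.z,
                           e1.x * e2.y - e1.y * e2.x);
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }

    // Each thread writes only its own three slots, so no synchronization is needed.
    nrm[3 * tri] = n;
    nrm[3 * tri + 1] = n;
    nrm[3 * tri + 2] = n;
}

// Hypothetical host helper: copies the vertices up, runs the kernel, copies normals back.
std::vector<float3> faceNormalsOnGpu(const std::vector<float3>& pos)
{
    int numTris = static_cast<int>(pos.size()) / 3;
    std::vector<float3> nrm(pos.size(), make_float3(0.0f, 0.0f, 0.0f));
    if (numTris == 0) return nrm;

    size_t bytes = 3 * static_cast<size_t>(numTris) * sizeof(float3);
    float3 *dPos = nullptr, *dNrm = nullptr;
    cudaMalloc(&dPos, bytes);
    cudaMalloc(&dNrm, bytes);
    cudaMemcpy(dPos, pos.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (numTris + threads - 1) / threads;
    faceNormals<<<blocks, threads>>>(dPos, dNrm, numTris);

    cudaMemcpy(nrm.data(), dNrm, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dNrm);
    return nrm;
}
```

Because every triangle has its own private copy of its vertices, all three get the same face normal, which is where the flat shading comes from.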
It is [much] harder to compute normals for a mesh with triangle-strip topology (which is commonly used). The problem is that vertices are shared by multiple triangles, so each vertex-normal must be accumulated as a [weighted] average of several triangle-normals. Using one GPU thread per triangle means that the same vert-norm may be updated by multiple GPU threads "simultaneously", so those updates have to be synchronized (for example with atomic operations). Alternatively, using one GPU thread per vertex means that a list of the triangles that reference that vertex is needed, and then those triangles need to be read (pairs of additional verts) so that the vert-norm can be computed... which is difficult, but not impossible.
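One possible way to handle the one-thread-per-triangle case is sketched below. It assumes a single non-indexed strip in which triangle i uses vertices i, i+1 and i+2, with odd triangles wound the other way round. Each thread adds its unnormalized (and therefore area-weighted) face normal to its three vertices with atomicAdd, and a second kernel normalizes the sums. The kernel and helper names are made up for illustration, and error checking is omitted:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Cross product of the edges (b - a) and (c - a); its length is twice the triangle area.
__device__ float3 edgeCross(float3 a, float3 b, float3 c)
{
    float3 e1 = make_float3(b.x - a.x, b.y - a.y, b.z - a.z);
    float3 e2 = make_float3(c.x - a.x, c.y - a.y, c.z - a.z);
    return make_float3(e1.y * e2.z - e1.z * e2.y,
                       e1.z * e2.x - e1.x * e2.z,
                       e1.x * e2.y - e1.y * e2.x);
}

// Pass 1: one thread per triangle; nrm must be zeroed before the launch.
__global__ void accumulateStripNormals(const float3* pos, float3* nrm, int numVerts)
{
    int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numVerts - 2) return;         // a strip of N vertices has N - 2 triangles

    int i0 = tri, i1 = tri + 1, i2 = tri + 2;
    if (tri & 1) { int t = i1; i1 = i2; i2 = t; }   // odd triangles flip winding

    float3 n = edgeCross(pos[i0], pos[i1], pos[i2]);

    // Neighbouring triangles share these vertices, so the adds must be atomic.
    int idx[3] = { i0, i1, i2 };
    for (int k = 0; k < 3; ++k) {
        atomicAdd(&nrm[idx[k]].x, n.x);
        atomicAdd(&nrm[idx[k]].y, n.y);
        atomicAdd(&nrm[idx[k]].z, n.z);
    }
}

// Pass 2: one thread per vertex; scale each accumulated sum to unit length.
__global__ void normalizeNormals(float3* nrm, int numVerts)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVerts) return;
    float3 n = nrm[v];
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) nrm[v] = make_float3(n.x / len, n.y / len, n.z / len);
}

// Hypothetical host helper that runs both passes and returns the vertex normals.
std::vector<float3> stripNormalsOnGpu(const std::vector<float3>& pos)
{
    int numVerts = static_cast<int>(pos.size());
    std::vector<float3> nrm(pos.size(), make_float3(0.0f, 0.0f, 0.0f));
    if (numVerts < 3) return nrm;

    size_t bytes = pos.size() * sizeof(float3);
    float3 *dPos = nullptr, *dNrm = nullptr;
    cudaMalloc(&dPos, bytes);
    cudaMalloc(&dNrm, bytes);
    cudaMemcpy(dPos, pos.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dNrm, 0, bytes);              // all-zero bytes are 0.0f

    int threads = 256;
    accumulateStripNormals<<<(numVerts - 2 + threads - 1) / threads, threads>>>(dPos, dNrm, numVerts);
    normalizeNormals<<<(numVerts + threads - 1) / threads, threads>>>(dNrm, numVerts);

    cudaMemcpy(nrm.data(), dNrm, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dNrm);
    return nrm;
}
```

The atomics keep the result correct but serialize writes to heavily shared vertices, and floating-point sums may differ slightly from run to run because the order of the additions is not fixed. The per-vertex gather described above avoids both issues at the cost of building the vertex-to-triangle lists.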
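Finally, if your model uses indexed vertices, every vertex read goes through an additional [semi-random] look-up in the index buffer, which may cause problems for memory access performance. This problem can be addressed with spatial partitioning, so that vertices referenced together are also stored close together.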