I'm wondering if it's possible to calculate the speed (on the Y-axis) of a person using only the time in milliseconds since the start of the sequence and a series of frames that contain the person moving in the scene. For example:
Instead of using the depth data, you can activate skeleton tracking and track the position of the head joint; it will give you better results. Kinect's skeleton tracking reports joint positions in meters, so you can calculate the distance between the head's position in the current frame and its position in the previous frame, then divide by the elapsed time in seconds (remember to divide milliseconds by 1000). That gives you the speed in meters per second. If you only need the speed along the Y-axis, use the difference in the Y coordinates instead of the full 3D distance.
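Here is a minimal sketch of that calculation, assuming you already have the head joint as an `(x, y, z)` tuple in meters for two frames, together with each frame's timestamp in milliseconds. The function name `head_speed` is hypothetical and no Kinect API is used; reading the joint out of the SDK is left out.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) in meters


def head_speed(prev_pos: Vec3, prev_ms: float,
               cur_pos: Vec3, cur_ms: float) -> Tuple[float, float]:
    """Return (speed, vertical_speed) in m/s between two head positions.

    speed is the full 3D distance travelled divided by the elapsed time;
    vertical_speed is the signed change in Y divided by the elapsed time
    (its sign depends on which way the Y-axis points).
    """
    dt = (cur_ms - prev_ms) / 1000.0  # milliseconds -> seconds
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    speed = math.dist(prev_pos, cur_pos) / dt
    vertical_speed = (cur_pos[1] - prev_pos[1]) / dt
    return speed, vertical_speed


# Head moves 0.3 m sideways and 0.4 m up in 500 ms: roughly (1.0, 0.8) m/s.
print(head_speed((0.0, 1.0, 2.0), 0.0, (0.3, 1.4, 2.0), 500.0))
```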
If it's not possible to activate skeleton tracking, you can find the highest pixel in the image that belongs to the user (Kinect's depth data includes that information; if you don't have depth data, use the highest non-black pixel in your images) and then map it to 3D world space using the Kinect SDK's coordinate mapper. Then calculate the speed as described above.
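The sketch below illustrates the two steps under loudly stated assumptions. It takes a 2D user mask (nonzero where a pixel belongs to the user, for example a player index or a non-black pixel) and scans from the top row down. Because the Kinect SDK's coordinate mapper is not available here, it swaps in a plain pinhole-camera back-projection. The intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical placeholders you would have to supply for your camera, and both function names are made up for illustration.

```python
from typing import List, Optional, Tuple


def highest_user_pixel(mask: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the topmost nonzero pixel, or None if none is set.

    Row 0 is the top of the image, so the first hit scanning rows in
    order is the highest pixel belonging to the user.
    """
    for row, line in enumerate(mask):
        for col, value in enumerate(line):
            if value:
                return row, col
    return None


def pixel_to_camera_space(row: int, col: int, depth_m: float,
                          fx: float, fy: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection (a stand-in for the SDK's coordinate mapper).

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point;
    all four are assumed camera intrinsics, not values from the Kinect SDK.
    """
    x = (col - cx) * depth_m / fx
    y = (cy - row) * depth_m / fy  # image rows grow downward; flip so +Y is up
    return x, y, depth_m


mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],  # top of the user's head
    [0, 1, 1, 1],
]
print(highest_user_pixel(mask))  # (1, 2)
```

Feeding the resulting 3D point for consecutive frames into the same speed calculation as for the head joint gives the per-frame speed.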
This gives you the speed for each frame. To get a steadier overall speed, you can average the last 30 or so per-frame speeds; at 30 frames per second this is the user's average speed over the last second, which should vary more smoothly than a per-frame value.
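A simple way to keep that running average is a fixed-size window that drops the oldest value when a new one arrives. This is a minimal sketch; the class name `SpeedSmoother` is hypothetical, and the default window of 30 assumes roughly 30 frames per second.

```python
from collections import deque


class SpeedSmoother:
    """Moving average over the most recent `window` per-frame speeds."""

    def __init__(self, window: int = 30):
        # deque with maxlen discards the oldest speed once the window is full
        self._speeds = deque(maxlen=window)

    def add(self, speed: float) -> float:
        """Record a per-frame speed and return the current average."""
        self._speeds.append(speed)
        return sum(self._speeds) / len(self._speeds)


smoother = SpeedSmoother(window=3)
for s in (1.0, 2.0, 3.0, 4.0):
    print(smoother.add(s))  # 1.0, 1.5, 2.0, 3.0
```

Until the window fills up, the average is taken over however many frames have been seen so far.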