Tags: c++, kinect, ros, depth, gazebo-simu

Getting 3D world coordinates from (x,y) pixel coordinates


I'm absolutely new to the ROS/Gazebo world; this is probably a simple question, but I cannot find a good answer.

I have a simulated depth camera (Kinect) in a Gazebo scene. After some processing, I obtain a point of interest in the RGB image, in pixel coordinates, and I want to retrieve its 3D coordinates in the world frame. I can't figure out how to do that.

I have tried compensating for the distortion using the coefficients given in the CameraInfo msg. I have also tried using a PointCloud with the PCL library, retrieving the point as cloud.at(x, y).

In both cases the coordinates are not correct (I placed a small sphere at the coordinates the program returns, to check whether they are correct or not).

Any help would be much appreciated. Thank you very much.

EDIT: Starting from the PointCloud, I try to find the coordinates of the point by doing something like:

pcl::PointXYZ point = cloud.at(xInPixel, yInPixel);  // point expressed in the camera frame
point.x = point.x + cameraPos.x;  // naive offset by the camera position,
point.y = point.y + cameraPos.y;  // ignoring the camera's orientation
point.z = point.z - cameraPos.z;

but the x, y, z coordinates I get do not seem correct. The camera has a pitch angle of pi/2, so it points at the ground. I am clearly missing something.


Solution

  • I assume you've seen the Gazebo examples for the Kinect (brief, full). You can get the raw image, the raw depth, and the computed pointcloud as topics (by setting them in the config):

    <imageTopicName>/camera/color/image_raw</imageTopicName>
    <cameraInfoTopicName>/camera/color/camera_info</cameraInfoTopicName>
    <depthImageTopicName>/camera/depth/image_raw</depthImageTopicName>
    <depthImageCameraInfoTopicName>/camera/depth/camera_info</depthImageCameraInfoTopicName>
    <pointCloudTopicName>/camera/depth/points</pointCloudTopicName>
    

    Unless you need to do your own processing on the image_raw for the RGB and depth frames (e.g., running ML over the RGB frame and finding the corresponding depth point via the camera_infos), the pointcloud topic should be sufficient: it's the same as the PCL pointcloud in C++, if you include the right headers. A minimal sketch of reading it follows.
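    For example, here's a rough sketch (not from the original answer) of a node that subscribes to the pointcloud topic and reads the point under a pixel of interest; the topic name and the example pixel (u, v) are placeholders for your own values:

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl_conversions/pcl_conversions.h>

    void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& msg)
    {
      // Convert the ROS message into a PCL cloud.
      pcl::PointCloud<pcl::PointXYZ> cloud;
      pcl::fromROSMsg(*msg, cloud);

      // The Gazebo depth camera publishes an organized cloud, so you can
      // index it by (column, row), i.e. by pixel coordinates.
      const int u = 320, v = 240;  // placeholder pixel of interest
      const pcl::PointXYZ& p = cloud.at(u, v);
      ROS_INFO("Point in camera frame: %f %f %f", p.x, p.y, p.z);
    }

    int main(int argc, char** argv)
    {
      ros::init(argc, argv, "cloud_listener");
      ros::NodeHandle nh;
      ros::Subscriber sub = nh.subscribe("/camera/depth/points", 1, cloudCallback);
      ros::spin();
      return 0;
    }

    Note that the point you read here is still in the camera frame; getting it into the world frame is where tf comes in, below.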

    Edit (in response): There's a magical thing in ROS called tf/tf2. If you look at your pointcloud's msg.header.frame_id, it says something like "camera", indicating the points are expressed in the camera frame. tf, like the messaging system in ROS, runs in the background: it listens for transformations from one frame of reference to another, and lets you transform or query data in any other frame. For example, if the camera is mounted at a rotation relative to your robot, you can specify a static transformation in your launch file. It seems like you're trying to do the transformation yourself; instead, let tf do it for you (see the sketch below). This makes it easy to figure out where points are in the world/map frame, versus the robot/base_link frame, or the actuator/camera/etc. frame.
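    To make that concrete, here's a rough sketch of transforming a single point from the camera frame into the world frame with tf2; the frame names "camera" and "world" and the point values are assumptions, so use whatever your setup actually publishes:

    #include <ros/ros.h>
    #include <geometry_msgs/PointStamped.h>
    #include <tf2_ros/transform_listener.h>
    #include <tf2_geometry_msgs/tf2_geometry_msgs.h>

    int main(int argc, char** argv)
    {
      ros::init(argc, argv, "point_transformer");
      ros::NodeHandle nh;

      tf2_ros::Buffer tfBuffer;
      tf2_ros::TransformListener tfListener(tfBuffer);  // fills the buffer in the background

      geometry_msgs::PointStamped ptCamera;
      ptCamera.header.frame_id = "camera";   // whatever your cloud's header.frame_id says
      ptCamera.header.stamp = ros::Time(0);  // ask for the latest available transform
      ptCamera.point.x = 1.0;                // e.g. copied from cloud.at(u, v)
      ptCamera.point.y = 0.0;
      ptCamera.point.z = 2.0;

      try {
        // tf applies the rotation + translation for you, including the pi/2 pitch.
        geometry_msgs::PointStamped ptWorld =
            tfBuffer.transform(ptCamera, "world", ros::Duration(1.0));
        ROS_INFO("Point in world frame: %f %f %f",
                 ptWorld.point.x, ptWorld.point.y, ptWorld.point.z);
      } catch (const tf2::TransformException& ex) {
        ROS_WARN("%s", ex.what());
      }
      return 0;
    }

    And if the camera is fixed to the robot, the static transform can be published from a launch file like this (the pose values and frame names here are also just illustrative):

    <node pkg="tf2_ros" type="static_transform_publisher" name="camera_tf"
          args="0 0 0.5 0 1.5708 0 base_link camera"/>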

    I would also look at these ROS wiki questions, which demonstrate a few different ways to do this depending on what you want: ex1, ex2, ex3