
What does the training data for Neural Radiance Fields represent?


I'm trying to code a Neural Radiance Field (NeRF) model from scratch. I'm following this Colab notebook: https://colab.research.google.com/drive/1rO8xo0TemN67d4mTpakrKrLp03b9bgCX#scrollTo=us0x_gGPI4tq

In the notebook, they train the Neural Radiance Field model on the classic yellow bulldozer that can be downloaded from UC Berkeley's website:

https://people.eecs.berkeley.edu/~bmild/nerf/tiny_nerf_data.npz

The data consists of 106 100x100 RGB images (shape (106, 100, 100, 3)), the corresponding camera poses (shape (106, 4, 4)), and the camera focal length, which determines the field of view at which the images were taken. My problem is that I'm not exactly sure what the camera pose data represents (the array with shape (106, 4, 4)).

In the Colab notebook, this data is described as "a 6-DoF rigid-body transform (shape: (4, 4)) that transforms a 3D point from the camera frame to the 'world' frame for the current example". I know that rigid-body transformations can be described as 4x4 matrices acting on points in homogeneous coordinates, but I'm not sure what that means in this context. Any explanation would be appreciated.
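To make the "transforms a 3D point from the camera frame to the world frame" description concrete, here is a small sketch using a hypothetical pose matrix of the same form as those in the dataset (the numbers are made up for illustration): the top-left 3x3 block holds the rotation and the last column holds the translation, and multiplying a homogeneous point by the matrix moves it from camera coordinates to world coordinates.

```python
import numpy as np

# A hypothetical camera-to-world pose of the same shape as those in
# tiny_nerf_data.npz: rotation in the top-left 3x3 block, camera
# position in the first three entries of the last column.
pose = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
])

R = pose[:3, :3]  # camera orientation (camera axes expressed in the world frame)
t = pose[:3, 3]   # camera position in world coordinates

# A point expressed in the camera frame, in homogeneous coordinates:
# here, 1 unit in front of the camera (OpenGL convention: -z is forward).
p_cam = np.array([0.0, 0.0, -1.0, 1.0])

# Multiplying by the pose expresses the same point in the world frame.
p_world = pose @ p_cam
print(p_world[:3])  # -> [0.5, 1.0, 1.0]
```

In this identity-rotation example, the transform simply shifts the point by the camera's world position; with a nontrivial rotation, the point would also be rotated by `R` before being translated.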


Solution

  • It means the corresponding camera pose (rotation and translation) in world coordinates for every training picture. Each 4x4 matrix is a camera-to-world transform: the top-left 3x3 block encodes the camera's orientation (rotation), and the first three entries of the last column give the camera's position (translation) in the world frame. Multiplying a homogeneous point expressed in the camera frame by this matrix yields the same point expressed in the world frame. NeRF needs these poses to compute, for each pixel of each training image, the origin and direction of the corresponding camera ray in world space, which is what the model is trained to render along.
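The way NeRF consumes these poses can be sketched as follows. This is a minimal, assumed implementation of the usual `get_rays` helper (not copied from the linked notebook): pixel coordinates are turned into ray directions in the camera frame via the pinhole model, rotated into the world frame by the pose's rotation block, and every ray starts at the camera's world position, i.e. the pose's translation column.

```python
import numpy as np

def get_rays(H, W, focal, c2w):
    """Compute world-space ray origins and directions for every pixel,
    given image size, focal length, and a camera-to-world pose c2w."""
    i, j = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32), indexing="xy")
    # Ray directions in the camera frame (pinhole model; -z is forward,
    # y points up, following the OpenGL convention used by tiny NeRF data).
    dirs = np.stack([(i - W * 0.5) / focal,
                     -(j - H * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate the directions into the world frame.
    rays_d = dirs @ c2w[:3, :3].T
    # Every ray originates at the camera position (translation column).
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d

# Identity pose: camera at the world origin, looking down -z.
rays_o, rays_d = get_rays(100, 100, 138.0, np.eye(4, dtype=np.float32))
print(rays_o.shape, rays_d.shape)  # (100, 100, 3) (100, 100, 3)
```

With the identity pose, all ray origins are at the origin and the center pixel's ray points straight down -z; substituting one of the 106 poses from the dataset instead places the rays at that camera's position and orientation in world space.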