Tags: android · kotlin · opengl-es

OpenGL ES on Android: How to rotate camera in response to touch events?


The Android developer guide on OpenGL ES describes how to apply a camera view transformation based on touch input (see OpenGL ES: Respond to touch events).

This is basically achieved by setting an angle property based on the x- and y-movement of the touch …

override fun onTouchEvent(e: MotionEvent): Boolean {
    /* … */
    renderer.angle += (dx + dy) * TOUCH_SCALE_FACTOR
    /* … */
}

… and applying the rotation according to it:

override fun onDrawFrame(gl: GL10) {
    /* … */
    Matrix.setRotateM(rotationMatrix, 0, angle, 0f, 0f, -1.0f)
    /* … */
}

In this approach, however, the x- and y-coordinates of the touch event are combined into a single angle that determines the rotation around the z-axis.

I tried separating the angles …

override fun onTouchEvent(e: MotionEvent): Boolean {
    /* … */
    renderer.angleX += dx * TOUCH_SCALE_FACTOR
    renderer.angleY += dy * TOUCH_SCALE_FACTOR
    /* … */
}

… to perform distinct rotations:

override fun onDrawFrame(gl: GL10) {
    /* … */
    Matrix.setRotateM(rotationMatrix, 0, angleX, 0f, 1f, 0f)
    Matrix.setRotateM(rotationMatrix, 0, angleY, 1f, 0f, 0f)
    /* … */
}

However, the result does not seem right when dragging along the x-dimension.

How do I get this very basic camera movement right, the kind you see in virtually every other 3D program across different platforms?

(In iOS SceneKit, you could get this (and other gestures) using a single line: sceneView.allowsCameraControl = true.)


Solution

  • The camera can be thought of as moving on the surface of a sphere around the point of origin. Therefore, spherical coordinates are well-suited for describing its motion.

    To define the position of a point in space, the spherical coordinate system uses a combination of the distance r of the point to the origin (the radius of the sphere) and two angles theta and phi. For our use case, the radius will remain fixed. Touch movements in the x-direction will correspond to changes of phi and movements in the y-direction to changes of theta.

    With the axes arranged to match the OpenGL ES convention (the y-axis pointing up), the spherical coordinates (r, theta, phi) translate to regular Cartesian coordinates (x, y, z) as follows:

    x = r * sin(theta) * sin(phi)
    y = r * cos(theta)
    z = r * sin(theta) * cos(phi)
    
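    As a quick standalone sketch of this conversion (the helper name sphericalToCartesian is only illustrative and not part of the guide's code), this could be written as:

    import kotlin.math.cos
    import kotlin.math.sin

    // Converts spherical coordinates (r, theta, phi) to Cartesian (x, y, z),
    // with the y-axis as the "up" axis, as in the OpenGL ES convention above.
    fun sphericalToCartesian(r: Double, theta: Double, phi: Double): DoubleArray {
        val x = r * sin(theta) * sin(phi)
        val y = r * cos(theta)
        val z = r * sin(theta) * cos(phi)
        return doubleArrayOf(x, y, z)
    }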

    The position and orientation of the camera will be set solely using the Matrix.setLookAtM() method. In addition to the output matrix (and an offset inside it), this method takes three input vectors. The eye vector defines the position of the camera in three-dimensional space, and the center vector specifies the point the camera will be looking at (which, in our case, will always be the origin). This leaves one degree of freedom for the camera, namely its rotation around the line connecting the eye and the center. This rotation is determined by the up vector defining the up-direction in the field of view of the camera. Only the direction of this vector has any relevance, not its length.

    In code, the onTouchEvent() method of the GLSurfaceView subclass is again used to capture touch input (the guide's reversal of the drag direction for touches past the midpoint of the view may be kept or removed, as preferred); a fuller sketch of the handler follows the snippet below:

    override fun onTouchEvent(e: MotionEvent): Boolean {
        /* … */
        renderer.phi -= dx * 0.01f
        renderer.theta -= dy * 0.01f
        /* … */
    }
    
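    Filling in the parts elided above, a minimal sketch of the full handler could look as follows (the previousX/previousY bookkeeping and the ACTION_MOVE check follow the pattern of the Android guide linked in the question; the exact scale factor is arbitrary):

    private var previousX = 0f
    private var previousY = 0f

    override fun onTouchEvent(e: MotionEvent): Boolean {
        val x = e.x
        val y = e.y
        if (e.action == MotionEvent.ACTION_MOVE) {
            val dx = x - previousX
            val dy = y - previousY
            // Horizontal movement changes phi, vertical movement changes theta.
            renderer.phi -= dx * 0.01f
            renderer.theta -= dy * 0.01f
            requestRender() // only needed with RENDERMODE_WHEN_DIRTY
        }
        previousX = x
        previousY = y
        return true
    }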

    The GLSurfaceView.Renderer subclass declares the two angles and a fixed radius, and it performs the camera transformation. Initially, the angles are set to theta = 90° and phi = 0°, corresponding to (x, y, z) = (0, 0, r):

    class CubeRenderer : GLSurfaceView.Renderer {
        /* … */
    
        @Volatile
        var theta: Double = PI / 2.0
    
        @Volatile
        var phi: Double = 0.0
    
        override fun onDrawFrame(unused: GL10) {
            GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT)
    
            // Eye position: spherical (RADIUS, theta, phi) converted to Cartesian.
            val x = RADIUS * sin(theta) * sin(phi)
            val y = RADIUS * cos(theta)
            val z = RADIUS * sin(theta) * cos(phi)

            // Up vector: orthogonal to the eye-center line (see the derivation below).
            val ux = -cos(theta) * sin(phi)
            val uy = sin(theta)
            val uz = -cos(theta) * cos(phi)
    
            Matrix.setLookAtM(
                viewMatrix, 0,
                x.toFloat(), y.toFloat(), z.toFloat(),   // eye
                0.0f, 0.0f, 0.0f,                        // center (= origin)
                ux.toFloat(), uy.toFloat(), uz.toFloat() // up
            )
    
            Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
            triangle.draw(vPMatrix)
        }
    
        private companion object {
            const val RADIUS = 5.0
        }
    }
    
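    To wire everything up, the renderer is attached to the GLSurfaceView subclass in the standard way (the class name MyGLSurfaceView is an assumption carried over from the sketch above; this is the usual GLSurfaceView setup rather than anything specific to this solution):

    import android.content.Context
    import android.opengl.GLSurfaceView

    class MyGLSurfaceView(context: Context) : GLSurfaceView(context) {

        private val renderer: CubeRenderer

        init {
            setEGLContextClientVersion(2) // OpenGL ES 2.0 context
            renderer = CubeRenderer()
            setRenderer(renderer)
            // Redraw only when requestRender() is called from onTouchEvent().
            renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY
        }

        /* onTouchEvent() as sketched above */
    }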

    The up vector is set to be orthogonal to the line connecting the eye and center points. Given the spherical coordinates (r, theta, phi) of the eye vector, this can be achieved by setting the up vector to (1, theta - pi/2, phi): its theta angle is 90° less than that of the eye vector, and, since its length does not matter, its radius is simply set to 1 to ease the computation.

    Given that sin(x - pi/2) = -cos(x) and cos(x - pi/2) = sin(x) for any real number x, we thus obtain:

    ux = sin(theta - pi/2) * sin(phi) = -cos(theta) * sin(phi)
    uy = cos(theta - pi/2) = sin(theta)
    uz = sin(theta - pi/2) * cos(phi) = -cos(theta) * cos(phi)
    

    With the eye vector initially set to (r, theta, phi) = (5, pi/2, 0), the up vector will initially be (ux, uy, uz) = (0, 1, 0), that is, pointing toward the positive y-axis.
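
    As a quick sanity check (plain Kotlin, independent of any OpenGL code), one can verify numerically that this up vector always has unit length and is orthogonal to the eye vector:

    import kotlin.math.PI
    import kotlin.math.abs
    import kotlin.math.cos
    import kotlin.math.sin
    import kotlin.math.sqrt

    fun main() {
        val r = 5.0
        for (theta in listOf(0.3, PI / 2.0, 2.1)) {
            for (phi in listOf(0.0, 0.7, 3.0)) {
                // Eye and up vectors, exactly as computed in onDrawFrame() above.
                val eye = doubleArrayOf(
                    r * sin(theta) * sin(phi), r * cos(theta), r * sin(theta) * cos(phi)
                )
                val up = doubleArrayOf(
                    -cos(theta) * sin(phi), sin(theta), -cos(theta) * cos(phi)
                )
                val dot = eye[0] * up[0] + eye[1] * up[1] + eye[2] * up[2]
                val len = sqrt(up[0] * up[0] + up[1] * up[1] + up[2] * up[2])
                check(abs(dot) < 1e-9 && abs(len - 1.0) < 1e-9)
            }
        }
        println("up is orthogonal to eye and has unit length")
    }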