Google said: "The core idea of instant motion tracking is to decouple the camera's translation estimation from its rotation estimation and treat the two as independent optimization problems. First, we determine the 3D translation of the camera solely from the camera's visual signal. To do this, we observe the apparent 2D translation and relative scale of the target region across frames. A simple pinhole camera model relates the translation and scale of a box in the image plane to the final 3D translation of the camera.
The system can determine the 3D translation between two camera positions (C1 and C2) from the translation and change in size (relative scale) of the box in the image plane. However, because the camera model does not assume a focal length for the camera lens, we cannot know the true distance (depth) of the tracked plane."
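The relationship between box motion in the image and camera translation can be illustrated with a minimal sketch. It is not Google's implementation: the `Box` type, the `camera_translation` function, the use of normalized image coordinates (principal point at the origin, unit focal length) and the arbitrary starting depth `depth1` are all assumptions made for illustration, and rotation between the two frames is assumed to be zero or already compensated.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Tracked region in normalized image coordinates (hypothetical type)."""
    cx: float     # box centre, x
    cy: float     # box centre, y
    width: float  # apparent width of the box


def camera_translation(box1, box2, depth1=1.0):
    """Estimate camera translation from C1 to C2, up to an unknown scale.

    depth1 is an assumed depth of the tracked plane at C1. Its true value
    is unknown, so the result is only defined up to this factor.
    """
    if box1.width <= 0 or box2.width <= 0:
        raise ValueError("box widths must be positive")

    # Pinhole model: apparent size is inversely proportional to depth.
    scale = box2.width / box1.width
    depth2 = depth1 / scale

    # Back-project the box centre to a 3D point in each camera frame.
    p1 = (box1.cx * depth1, box1.cy * depth1, depth1)
    p2 = (box2.cx * depth2, box2.cy * depth2, depth2)

    # The target is static, so the camera moved by the difference.
    return tuple(a - b for a, b in zip(p1, p2))


# The box doubles in size: the camera moved halfway towards the plane.
print(camera_translation(Box(0.0, 0.0, 0.2), Box(0.0, 0.0, 0.4)))
```

Calling the function with `depth1=2.0` instead of the default doubles every component of the result, which is the scale ambiguity described above: without a known depth, only the direction and relative magnitude of the translation are recovered.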
To solve this problem, Google added scale estimation to its existing tracker (the one used for Motion Text), along with region tracking outside the camera's field of view. As the camera approaches the tracked surface, the virtual content scales accurately, consistent with how real-world objects are perceived. When the camera pans so that the target region leaves the field of view and then pans back, the virtual object reappears in roughly the same position.
Next, the system obtains the device's 3D rotation (roll, pitch and yaw) from the smartphone's built-in gyroscope. Combining the estimated 3D translation with the 3D rotation lets the system render virtual content correctly in the viewfinder. Because rotation and translation are handled separately, Google's instant motion tracking method requires no calibration and works on any Android device with a gyroscope.
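How a separately estimated rotation and translation combine into one camera pose can be sketched as follows. This is an illustration under stated assumptions rather than the method Google ships: the function names are hypothetical, the orientation is assumed to be already available as roll, pitch and yaw angles in radians (a gyroscope reports angular velocity, so a real system would first integrate or fuse its readings), and the Z-Y-X rotation order is one common convention among several.

```python
import math


def rotation_from_euler(roll, pitch, yaw):
    """3x3 camera-to-world rotation, R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def world_to_camera(point, rotation, translation):
    """Express a world-space point in the camera frame: R^T * (p - t).

    rotation comes from the gyroscope side, translation from the visual
    tracker; the two are estimated independently and combined only here.
    """
    d = [p - t for p, t in zip(point, translation)]
    # Multiply by the transpose of the rotation matrix.
    return tuple(
        sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3)
    )


# Virtual object one unit ahead; camera moved 0.5 units towards it
# with no rotation, so the object is now 0.5 units ahead.
identity = rotation_from_euler(0.0, 0.0, 0.0)
print(world_to_camera((0.0, 0.0, 1.0), identity, (0.0, 0.0, 0.5)))
```

The resulting camera-space point is what a renderer would then project into the viewfinder. Keeping the two inputs separate means neither estimate has to be solved jointly with the other.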