For the plane to "pop out" so much more than anything else in the video, the difference in distance between plane/camera and middle of cloud layer/camera would need to be an order of magnitude higher than the height of the whole cloud layer, from cumulus to cirrus. That doesn't seem realistic to me.
For the sake of argument, I used the disparity map to deform the video along the z axis. It's hard to represent in a 2D image, but as I expected, if you look around in 3D space it doesn't look like a 3D scene at all, for any frame of the video.
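This kind of deformation can be sketched as follows, assuming the disparity map is available as a 2D NumPy array. The names `disparity_to_points` and `z_scale` are made up for illustration, and this is not necessarily how it was done for the video:

```python
import numpy as np

def disparity_to_points(disparity, z_scale=1.0):
    """Turn an (H, W) disparity map into an (H*W, 3) array of x, y, z points.

    x and y are the pixel coordinates; z is the disparity times z_scale
    (z_scale is an arbitrary exaggeration factor, not a calibrated value).
    """
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    zs = z_scale * disparity.astype(float)
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1).astype(float)

# Tiny made-up disparity map, just to show the output layout.
demo = np.array([[0.0, 1.0], [2.0, 3.0]])
points = disparity_to_points(demo, z_scale=10.0)
print(points)
```

Each frame then becomes a point cloud (or a displaced mesh) that can be viewed from any angle in a 3D viewer.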
This wouldn't explain the inconsistency in the disparity of the aircraft though. For the first 40 seconds the disparity is larger, then it's very small until the flash. To explain this, we would have to assume that the animator accidentally keyframed a shift to the plane/clouds overlay around 40 seconds in. But this would be inconsistent with the fact that the clouds tend to have similar disparity throughout the video.
I may have an explanation for this.
I was able to get an estimated 3D trajectory for the plane, and the movement seems to lie mostly in a geometric plane inclined 45° relative to the camera.
If you make the hypothesis that we're actually seeing the underside of the plane, then for the first 35s the plane is going away from the camera, then it turns and stays at the same distance. The difference in distance between the closest and farthest points seems to be roughly 2km.
The clouds are a 2D layer used as a background in the 3D scene, and the plane is a 3D asset with a trajectory that goes away from the camera. The mouse cursor and text are also a 2D layer in the 3D scene, but closer to the camera.
You render this for the "left eye view", then you slightly distort the 2D layer to fake depth data, move the camera a bit, and render the "right eye view".
This gives a pair of videos where:
- there's no real depth to the clouds, but the illusion of it at first glance.
- there is depth data to the plane, with a stronger disparity for the first half than later
- no cloud movement except for a few small distortions
- no parallax nor traces of it being corrected
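The stronger-then-weaker disparity on the plane is what basic stereo geometry predicts for a 3D object that recedes from a two-camera rig. A rough sketch, assuming an idealized pair of parallel pinhole cameras; the focal length, baseline and distances are placeholder values, not measurements from the video:

```python
def disparity_px(focal_px, baseline_m, distance_m):
    """Disparity in pixels of a point at distance_m, for parallel pinhole cameras."""
    return focal_px * baseline_m / distance_m

# Placeholder numbers, chosen only to show the trend.
FOCAL_PX = 1000.0
BASELINE_M = 1.0

near = disparity_px(FOCAL_PX, BASELINE_M, 2000.0)
far = disparity_px(FOCAL_PX, BASELINE_M, 4000.0)  # same object, 2 km farther away
print(near, far)
```

Under this hypothesis the flat cloud layer has no such distance term, so whatever disparity it shows comes only from how the layer was distorted between the two renders.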
For me this explains every weird fact about this video.
Reconstruction of the 3D trajectory:
I matched the 2D position and 3D rotation of a 3D model of the plane to a stitched version of the left eye video for 18 keyframes. Then I used the 3D orientation of the plane to estimate the changes in the third dimension between keyframes, using the fact that a plane moves mostly where it's pointing.
It's not very precise, but it's good enough to get an idea of the 3D trajectory.
There are actually 2 different solutions, depending on whether we are seeing the topside or the underside of the plane. Those two solutions are symmetrical.
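The depth estimation step can be sketched roughly like this, assuming an orthographic view (no perspective), with a 2D screen position and a unit pointing vector for each keyframe. `integrate_depth` and its arguments are hypothetical names, and the keyframe values below are invented, not the 18 keyframes from the video:

```python
import numpy as np

def integrate_depth(xy, headings, flip=False):
    """Estimate relative depth per keyframe from 2D motion and 3D heading.

    xy:       (N, 2) screen positions of the plane.
    headings: (N, 3) unit vectors for where the plane points (z = away from camera).
    flip:     mirror the z component to get the other symmetric solution.
    Returns an (N,) array of depths relative to the first keyframe, in the
    same units as xy.
    """
    xy = np.asarray(xy, dtype=float)
    headings = np.asarray(headings, dtype=float)
    if flip:
        headings = headings * np.array([1.0, 1.0, -1.0])
    z = np.zeros(len(xy))
    for i in range(len(xy) - 1):
        step_xy = xy[i + 1] - xy[i]
        d = 0.5 * (headings[i] + headings[i + 1])  # average heading over the step
        denom = d[0] ** 2 + d[1] ** 2
        if denom < 1e-12:
            z[i + 1] = z[i]  # heading is along the view axis: depth change unknown
            continue
        # Distance travelled along the heading that best explains the 2D step.
        s = (step_xy @ d[:2]) / denom
        z[i + 1] = z[i] + s * d[2]
    return z

# Invented keyframes: moving right on screen while pointing 45° away from the camera.
h = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
demo_xy = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
depth_a = integrate_depth(demo_xy, [h, h, h])
depth_b = integrate_depth(demo_xy, [h, h, h], flip=True)
print(depth_a, depth_b)
```

Flipping the sign of the z component gives the mirrored trajectory, which is one way to picture the topside/underside ambiguity. The result is only relative depth in screen units, so it still needs a scale (for example from the known size of the plane) to be expressed in metres.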