Is this method telling us what is actually in the video, or is it just playing around with effects until you see something you like?
Essentially the latter. There's only so much information in any given image. The best test of such techniques is to take similar images of known origin and run them through the same process. What this appears to be is an out-of-focus white light far in the distance.
So they should take something like this New York at Night 1995 VHS video:
Source: https://www.youtube.com/watch?v=ADAxxx-Ry-4
Then take a similarly shaped light from there:
Then run it through the same "enhancements" and they will get similar results. I don't know the steps they used, but here are some examples:
Up-res 100x
Contrast and color "enhancements"
Find Edges:

Which gives a somewhat similar result to the "Polynomial Texture Mapping", essentially tracing contours to give an illusion of 3D.
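To make the point concrete, here is a minimal sketch of that kind of pipeline in plain Python. It is not the process used in the video (I don't know their steps). The synthetic blob, the fake scan lines, the nearest-neighbour up-res and all the function names are my own assumptions for illustration.

```python
import math

def blurry_light(size=8, sigma=1.5):
    """Synthetic stand-in for a frame grab: one out-of-focus light (a Gaussian blob)."""
    c = (size - 1) / 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def add_scan_lines(img, strength=0.1):
    """Dim every other row slightly, a crude imitation of VHS scan lines."""
    return [[v * (1 - strength) if y % 2 else v for v in row]
            for y, row in enumerate(img)]

def upscale(img, factor):
    """Nearest-neighbour up-res: each pixel becomes a factor x factor block."""
    return [[v for v in row for _ in range(factor)]
            for row in img for _ in range(factor)]

def stretch_contrast(img):
    """Linear contrast stretch to the full 0..1 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    return [[(v - lo) / (hi - lo) for v in row] for row in img]

def find_edges(img):
    """Gradient magnitude from central differences (a simple 'find edges')."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            out[y][x] = math.hypot(gx, gy)
    return out

source = add_scan_lines(blurry_light())            # 8 x 8 "frame grab"
enhanced = stretch_contrast(upscale(source, 10))   # 80 x 80, 100x the pixels
edges = find_edges(enhanced)

# Edge response strictly inside each upscaled block (away from block borders).
interior = [edges[y][x] for y in range(80) for x in range(80)
            if 0 < y % 10 < 9 and 0 < x % 10 < 9]
print("pixels:", len(source) * len(source[0]), "->", len(enhanced) * len(enhanced[0]))
print("max edge inside blocks:", max(interior))
print("max edge overall:", max(max(row) for row in edges))
```

In this sketch every bit of "edge detail" sits on the borders of the original pixels and scan lines, and nothing shows up inside the blocks. A smoother up-res would blur those borders into contours, but it still could not add information that was not in the 8 x 8 source.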
You can also create a 3D model from the brightness depth map in Photoshop, and adjust the lighting, which is kind of what he's doing virtually, again with no benefit. Lighting light.
So keep in mind this is a video frame grab of an office window in New York, but what comes out is a blob with some strange detail. The detail is really just random artifacts of the VHS scan lines and the pixel grid. The "bulging" isn't really there; it's just the shape of the brightness.
The actual "PTM" technique requires multiple source images of the same object lit from different directions.
The actual technique:
http://www.hpl.hp.com/research/ptm/
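For reference, the standard PTM formulation from the HP work models each pixel's brightness as a biquadratic function of the light direction. The notation below is my summary of that formulation, not something taken from the video:

```latex
L(u, v;\, l_u, l_v) = a_0(u,v)\, l_u^2 + a_1(u,v)\, l_v^2 + a_2(u,v)\, l_u l_v
                    + a_3(u,v)\, l_u + a_4(u,v)\, l_v + a_5(u,v)
```

Here $(l_u, l_v)$ is the light direction projected onto the image plane, and $a_0 \dots a_5$ are six coefficients stored per pixel, fitted from the set of differently lit photographs. Six unknowns per pixel means you need at least six photographs with different known light directions to pin them down. A single video frame gives you one brightness value per pixel under one unknown lighting condition, so there is nothing to fit.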
Technique described in the video:
It's useless here. What he seems to be doing is taking the brightness levels of the "enhanced" image and converting them into a 3D object as a depth/height map (the brighter the pixel, the higher it is; also called a bump map). Then he renders multiple images of that 3D object lit from different directions, then converts those to a PTM (which is pointless, as a PTM is a compressed representation of a 3D model).
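Here is a rough sketch of the brightness-as-height step, to show why the "relit" views look three-dimensional no matter what the source was. This is my guess at the general idea, not his actual workflow. The Gaussian blob, the `scale` factor, the simple Lambertian shading and the function names are all assumptions.

```python
import math

def blob(size=9, sigma=2.0):
    """A flat 2D image of a blurry light: brightness only, no real depth."""
    c = (size - 1) / 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def normals_from_height(height, scale=4.0):
    """Treat brightness as height and derive a unit normal per interior pixel."""
    h, w = len(height), len(height[0])
    normals = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dzdx = scale * (height[y][x + 1] - height[y][x - 1]) / 2
            dzdy = scale * (height[y + 1][x] - height[y - 1][x]) / 2
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            normals[(x, y)] = tuple(c / length for c in n)
    return normals

def render(normals, light):
    """Lambertian shading: max(0, n . l), with l pointing from surface to light."""
    length = math.sqrt(sum(c * c for c in light))
    l = tuple(c / length for c in light)
    return {p: max(0.0, sum(a * b for a, b in zip(n, l)))
            for p, n in normals.items()}

normals = normals_from_height(blob())
lit_from_left = render(normals, (-1.0, 0.0, 1.0))
lit_from_right = render(normals, (1.0, 0.0, 1.0))

# Pixels just left and just right of the blob's centre at (4, 4).
for name, img in (("left light", lit_from_left), ("right light", lit_from_right)):
    print(name, round(img[(3, 4)], 3), round(img[(5, 4)], 3))
```

The side of the blob facing the virtual light comes out bright and the far side dark, so it looks like a lit bulge. But that bulge is nothing more than the brightness gradient of the blur fed back through a shading formula. Any blurry light would give the same result.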
He's taking an overly processed 2D image, arbitrarily converting it to 3D, then converting it to a PTM version of that 3D model, then converting that PTM back to 2D. Basically he's applying a bunch of inapplicable transforms to get something that looks like something, but has no relationship to the actual object, which is just a blurry light.