I've conducted a detection via reconstruction error analysis on the frames of this video. the average reconstruction error was 0.4629192650318146, with anomalies detected in 628 frames. Further indicating this video may be CGI or manipulated.
View attachment 72258
Ugh, it seems imaging has moved on not just technologically, but terminologically too, since my day.
I've had a quick read; let me now try to explain it to my grandmother. Please correct me if I'm wrong.
But before that: given there are anomalies detected in 628 frames, and there are only 628 frames, it's not detecting anomalous frames, but frames within which there are anomalies - parts of the frame that don't fit. That would be what we want were we looking for compositing, or localised touching up, say.
However, the above contains spurious precision (16 s.f.) and no indication of accuracy. That's bad science communication, which might be a sign of bad science. Is the average reconstruction error 0.463 with an S.D. of 0.05, or 0.463 with an S.D. of 0.2? Or 0.463 with an S.D. of 0.01? The 0.19, 0.20, 0.18, ... values would look unusual, predictable, or downright weird depending on which of the three cases it is. And as we're attempting to evaluate weirdness, that distinction should surely be *the most important one*.
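To make that concrete (with made-up S.D. values, since we were only given a mean), this is the sort of back-of-envelope check the report should let a reader do:

```python
# Toy arithmetic only: the 0.463 mean is from the post, the candidate S.D.s are guesses.
mean = 0.463
low_value = 0.19  # one of the low-looking per-frame values

for sd in (0.05, 0.2, 0.01):
    sigmas = (mean - low_value) / sd
    print(f"S.D. {sd}: a value of {low_value} sits {sigmas:.1f} standard deviations below the mean")

# S.D. 0.05 -> ~5.5 sigma (very weird), S.D. 0.2 -> ~1.4 sigma (unremarkable),
# S.D. 0.01 -> ~27 sigma (something has gone badly wrong with the model or the data).
```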
But on to what's actually being measured, the "reconstruction error" ... let's hope grandma's ready.
It seems that the picture is being approximated. Data points representing the pixel values (or some property derived from them) are being projected from a higher-dimensional space (could be 3D) onto a lower-dimensional space (could be 2D), such that the least information is lost. So if you imagine this somewhat amorphous solid blob of data points, it's being squashed flat in the direction where it's already flattest, so the points are, on aggregate, moved the least. The "reconstruction error" seems to be a measure of how much the points were moved. Higher numbers mean it was a fatter blob; lower numbers mean the blob was sleeker (quite appropriate for a flying saucer!).
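In code terms, a toy version of that squashing might look like the sketch below. I'm assuming something PCA-like run on image patches; the post doesn't actually say what's being used, so treat the details as illustrative only.

```python
# A minimal sketch, assuming the tool does something PCA-like on image patches
# (an assumption on my part; the original post doesn't say).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for one greyscale frame, carved into 8x8 patches (64-dimensional points).
frame = rng.random((240, 320))
patches = frame.reshape(30, 8, 40, 8).swapaxes(1, 2).reshape(-1, 64)

# Squash each 64-dimensional point onto an 8-dimensional subspace and back again.
pca = PCA(n_components=8).fit(patches)
reconstructed = pca.inverse_transform(pca.transform(patches))

# The "reconstruction error" is how far each point had to move.
per_patch_error = np.mean((patches - reconstructed) ** 2, axis=1)
error_map = per_patch_error.reshape(30, 40)  # one error value per patch, laid out spatially

print(f"mean error {per_patch_error.mean():.3f}, S.D. {per_patch_error.std():.3f}")
```

That error_map at the end is where any "anomalous region" talk would have to come from: patches whose error is badly out of line with the rest of the frame.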
So presumably this analysis has decided that, in general, each frame requires a lot of squashing to approximate, apart from some regions that only require a small amount of squashing, and thus it's being concluded that those regions must have come from a different source than the rest of the image?
If that is the case, then that's all well and good, but - in particular in a context where 16 significant figures have been used to specify an average - there's absolutely no numerical value at all given to the "in general" and the size of "some regions" in the above paragraph. Every image from a noisy source will have 2 anomalous pixels next to each other, 3 even. Does that constitute such a region? If it does, every frame in every video is anomalous, so every video's been manipulated, and this tool is useless. So 3 pixels is out - but what sized region does count? We're simply not told. One thing that seems obvious is that the manipulated region size is different in every frame as the thing moves around, so the confidence levels the model outputs regarding whether an anomaly was found should vary from frame to frame. But no confidence levels are given at all. Why not? Again, this smacks of bamboozling with scary-looking numbers rather than providing useful information - poor scientific communication.
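For what it's worth, here's roughly what per-frame reporting with an honest threshold would look like; the numbers below are fabricated noise, not anything measured from the video:

```python
# Hypothetical per-frame reporting: fake per-patch errors for 628 frames,
# flagged against each frame's own distribution. Purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_frames, patches_per_frame = 628, 1200
errors = rng.normal(0.46, 0.05, size=(n_frames, patches_per_frame))

# Flag patches sitting more than 3 standard deviations from their frame's mean.
z = (errors - errors.mean(axis=1, keepdims=True)) / errors.std(axis=1, keepdims=True)
anomalous_patches = (np.abs(z) > 3).sum(axis=1)

for i in range(3):
    share = 100 * anomalous_patches[i] / patches_per_frame
    print(f"frame {i}: {anomalous_patches[i]} anomalous patches ({share:.2f}% of the frame)")
```

Run on that pure noise, a 3-sigma threshold flags a handful of patches in essentially every frame, so "anomalies detected in 628 of 628 frames" tells us nothing unless we also know the region size and threshold that were used.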
It's like the ghost hunters who walk through the supposedly-haunted house, look at their EMF meters, and exclaim "it's now reading 14.8!". Yeah, I see the number, and I agree it says that, but what is that *really* telling us? How do we distinguish what we have in front of us from something ordinary?