As I've often said, I am no photographer: if I want to photograph something, I just point, click, and hope for the best. Applying this sophisticated technique, I held my finger about 2 inches away from the lens of my phone camera (Samsung Galaxy), while everything else in view was considerably further away. The background was in pretty good focus, while the finger was noticeably out of focus but still easily recognisable as a finger. I can't really judge whether it was more or less out of focus than the object in the video, but I wouldn't call it 'way out of focus'.
In my case the camera lens was probably closer to my finger than the lens of the IR camera was to its 'protective dome' and any smudge on that dome. If so, the focus of the smudge would probably be better (or less bad) than the focus of my finger. When I held my finger about 8 inches from the lens, which seems consistent with the contraption shown at #47 above, the finger was about as sharply focused as anything I ever photograph. Anyone can try the same experiment very quickly for themselves.
I tentatively conclude that the 'out of focus' argument is not conclusive against the 'smudge' theory. Other arguments may be more decisive. If the shape of the object does indeed change during the course of the video (see #38 above), that does seem decisive.