The statistical case is different. When testing whether a pattern is significant, how many tests you run affects the inference itself. Villarroel reports the shadow deficit with p < 0.05, meaning that if it were just random noise, a deficit this large would show up less than 5% of the time. But if you test 20 different null hypotheses, you'd expect one of them to hit p < 0.05 by luck alone. Finding one that produces a similar pattern doesn't tell you her finding is wrong; it tells you that among the many things you tried, one happened to work.
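That multiple-comparisons arithmetic is real and easy to check. A minimal Monte Carlo sketch (plain Python, illustrative numbers only, nothing from the paper):

```python
# Monte Carlo sketch of the multiple-testing problem. Simulate many
# "studies" that each test 20 true null hypotheses, and count how often
# at least one test reaches p < 0.05 in pure noise.
import random

random.seed(42)
trials = 10_000        # simulated studies
tests_per_trial = 20   # null hypotheses tested per study
alpha = 0.05

false_alarms = 0
for _ in range(trials):
    # Under a true null, a p-value is uniform on [0, 1].
    p_values = [random.random() for _ in range(tests_per_trial)]
    if min(p_values) < alpha:
        false_alarms += 1

print(f"at least one p < {alpha}: {false_alarms / trials:.1%} of studies")
# Expected: 1 - 0.95**20 ≈ 64%, i.e. most studies "find" something in noise.
```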
But that's not how this works.
In statistics, you see an effect when you notice something unexpected: Here's this group of 21000 unvaccinated people, and 9 got Covid over 2 months; but in this other group of 21000 people, which we vaccinated, only 1 person did, which is less than we expected based on what happened to the control group. Hence, the vaccine has an effect. Without the comparison, there is no effect. But you have to compare the right things, which is why medical trials like this one are placebo-controlled and blinded (and, yes, ideally pre-registered).
Hypothesis: Vaccinated people are protected against Covid.
Null hypothesis: vaccinated people are no better protected against Covid than unvaccinated people.
The vaccine trial proved the hypothesis likely true, and the null hypothesis likely false.
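For what it's worth, the hypothetical counts above are enough to run the standard test. A sketch of a one-sided Fisher exact test (plain Python; the 9-vs-1 counts are the illustration above, not real trial data):

```python
# One-sided Fisher exact test on the illustrative counts above:
# 9/21000 Covid cases unvaccinated vs. 1/21000 vaccinated.
from math import comb

n_vax, n_unvax = 21_000, 21_000
cases_vax, cases_unvax = 1, 9
total = n_vax + n_unvax
cases = cases_vax + cases_unvax

# Under the null (no vaccine effect), the 10 cases fall into the two
# groups hypergeometrically. One-sided p-value: the probability that
# the vaccinated group catches <= 1 of the 10 cases by chance alone.
def pmf(k):
    return comb(cases, k) * comb(total - cases, n_vax - k) / comb(total, n_vax)

p = sum(pmf(k) for k in range(cases_vax + 1))
print(f"one-sided p = {p:.4f}")  # ≈ 0.011, so we reject the null
```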
For the UFO report, we have a person seeing a light in the sky and incredulously thinking: "that shouldn't be there". They compare their observation with what they think they should see, which is not a bright light.
Hypothesis: there's a UFO emitting light in the sky here
Null hypothesis: there's a mundane object that should be in the sky here, emitting light
The problem here is that the observer not only failed to produce any evidence for what was producing the light (all he saw was the light), he also merely believed the null hypothesis was false, but didn't check it properly.
@flarkey proved the null hypothesis true, and that makes the effect vanish.
That's because we don't challenge the hypothesis. We're not standing in a parking lot in Long Island at midnight looking for lights in the sky, replicating the evidence for the hypothesis. We believe the officer saw a light in the sky. We simply don't believe the null hypothesis is false, because we're sceptical; and we have methods to check it, e.g. examining historical flight-tracking data and building a 3D simulation of the observer's view that includes the flight: it looks exactly like his video. The null hypothesis is likely true.
In Villarroel 2025a, we're dealing with two hypotheses:
External Quote:
We use images from the First Palomar Sky Survey to search for multiple (within a plate exposure) transients that, in addition to being point-like, are aligned along a narrow band. [..] These aligned transients remain difficult to explain with known phenomena, [..] We also find a highly significant (∼22σ) deficit of POSS-I transients within Earth's shadow when compared with the theoretical hemispheric shadow coverage at 42,164 km altitude. The deficit is still present though at reduced significance (∼7.6σ) when a more realistic plate-based coverage is considered.
Alignment effect:
Hypothesis: the transients are aligned because they are on a path
Null hypothesis: the transients are aligned randomly
Villarroel states she cannot reject the null hypothesis for most of her 3-point samples. For the 4-point and 5-point alignments she "doesn't show her work": she cites older work about quasars and claims it applies here in order to reject the null hypothesis. The problem I see is that the data is not uniformly random, and that changes the statistical analysis. The grid pattern we're seeing in the data makes alignments more likely than a uniform random distribution would. But it's not caused by orbital paths, and it seems to correlate with the plate edges, i.e. the cause is on Earth and not in the sky.
This means Villarroel rejected the wrong null hypothesis.
That's similar in principle to saying "this darts player is better than average because his darts are significantly closer to the bullseye than a random uniform covering of the dart board" when instead you should've compared with random dart players.
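To see how much the choice of null matters, here's a small Monte Carlo sketch (plain Python; the grid layout, cell count, jitter, and tolerance are my own crude stand-ins for the plate-defect pattern, not anything from the paper). It counts near-collinear triples among points scattered uniformly versus points concentrated on a coarse grid:

```python
# Count near-collinear triples among N points, (a) scattered uniformly,
# (b) concentrated on a coarse grid. The grid is an assumed stand-in
# for the plate-defect pattern, not data from the paper.
import random
from itertools import combinations

def aligned_triples(points, tol=0.002):
    """Triples whose triangle area is below tol, i.e. nearly collinear."""
    count = 0
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2
        if area < tol:
            count += 1
    return count

def uniform_points(n):
    return [(random.random(), random.random()) for _ in range(n)]

def grid_points(n, cells=5, jitter=0.005):
    """Points snapped to a cells x cells grid with a little jitter."""
    pts = []
    for _ in range(n):
        cx, cy = random.randrange(cells), random.randrange(cells)
        pts.append(((cx + 0.5) / cells + random.gauss(0, jitter),
                    (cy + 0.5) / cells + random.gauss(0, jitter)))
    return pts

random.seed(1)
n, runs = 60, 20
u = sum(aligned_triples(uniform_points(n)) for _ in range(runs)) / runs
g = sum(aligned_triples(grid_points(n)) for _ in range(runs)) / runs
print(f"mean aligned triples: uniform {u:.0f}, gridded {g:.0f}")
# The gridded sample yields several times more "alignments", so testing
# gridded data against a uniform null wildly overstates significance.
```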
The shadow alignment has the same problem:
Hypothesis: there are fewer transients in the 40 Mm shadow because the transients are reflections of sunlight
Null hypothesis: the transients uniformly randomly cover the sky, so the shadow gets its expected share
Villarroel finds 0.32% of transients in the shadow, against an expected 1.15%.
The problem: the null hypothesis doesn't reflect the actual distribution of plate defects.
New null hypothesis: the transients uniformly randomly cover the plates, so the shadow gets its expected share.
Villarroel finds 0.32% of transients in the shadow, versus 0.53% expected. This means that the actual plate coverage pattern deviates from a uniform coverage by more than a factor of 2. Yet table 4 shows only the values for the uniform coverage, that she herself proved false. Plus, with the 80Mm altitude, she finds even less of a deficit! No explanation.
Then she does a great thing:
External Quote:
As a quick check, nevertheless, we also test by masking edge transients (>2° from plate center) to remove all artifacts close to the plate edge. Removing the edge of the plate in the analysis, yields a similar ∼30% deficit in Earth's shadow,
This is motivated by the grid pattern in the data that we've also found (after Hambly & Blair pointed it out), see e.g. https://www.metabunk.org/threads/digitized-sky-survey-poss-1.14385/post-355943 . And it proves that the shadow effect is bogus.
Note that the paper is skimpy on numbers here. The 2° cut removes about half of the plate area (4° vs. 6° diameter, i.e. a 16:36 area ratio), but it removes many more of the plate defects, since the grid pattern is caused by plate defects concentrated near the edges. So if there are orbital objects, the data should still contain about half of those, but maybe only 10% of the plate defects. (I don't really know that number, but it's substantially less than 50%.) Yet the shadow deficit shrank from 39% to 30%! It should have done the opposite.
We now know that the edge half of the plates has a stronger shadow deficit than the center half. This falsifies Villarroel's finding.
Hypothetical example, with numbers instead of algebra:
Assume 106,339 data points. 349 points are found in Earth's shadow, but we expected 564. That's a deficit of 215 points. We can then split our sample into 65,850 sources that are uniformly randomly distributed, and 40,489 sources that are uniform everywhere but never in shadow.
What happens when we cut the edges off? If the edges are mostly plate artifacts, which are shadow-agnostic, we may be left with, say, 10% of the 65,850 = 6,585. The shadow avoiders are supposed not to favor the edges, so we keep 16/36 of 40,489 = 17,995, for a total of 24,580 transients, with 35 (0.53% of 6,585) in the shadow. 35/24,580 = 0.14% versus 0.53%: that's a 73% deficit. The hypothetical deficit has doubled! Yet the real deficit did not.
Instead, we saw a 30% deficit. We'd expect 0.53% of the 24,580 samples (130) to be in shadow if there were no shadow effect. With a 30% deficit, we actually find 91. That means our sample is now split into 17,170 uniformly distributed sources and 7,410 shadow avoiders. The edge cut eliminated 48,680 of 65,850 uniformly distributed sources, and 33,079 of 40,489 shadow avoiders.
This means we have 33,079 shadow avoiders in the edges and only 7,410 in the centers of the plates, although these two areas are roughly equal. And that should not happen. (The only way to get this to work out is to assume that there are more defects near the centers of the plates, and that's clearly not the case.)
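Here's that arithmetic as a short script (plain Python; all counts are the hypothetical numbers above):

```python
# Mixture model from the worked example: "uniform" transients (plate
# defects) plus perfect shadow avoiders, before and after the edge cut.
# All counts are the post's hypothetical numbers, not measured values.
shadow_frac = 0.0053            # plate-based expected shadow fraction
uniform, avoiders = 65_850, 40_489

def deficit(u, a):
    """Shadow deficit of the mixture vs. the uniform expectation."""
    in_shadow = shadow_frac * u            # avoiders contribute nothing
    expected = shadow_frac * (u + a)
    return 1 - in_shadow / expected        # simplifies to a / (u + a)

print(f"full plates:    {deficit(uniform, avoiders):.0%} deficit")  # ~38%

# Edge cut: assume ~10% of the defect-driven uniform component survives,
# while shadow avoiders (not edge-biased) keep 16/36 of the area.
u_cut = 0.10 * uniform                     # 6,585
a_cut = 16 / 36 * avoiders                 # ~17,995
print(f"after edge cut: {deficit(u_cut, a_cut):.0%} deficit")       # ~73%

# But the observed post-cut deficit is only ~30%. Invert the mixture:
total_cut = u_cut + a_cut                  # ~24,580
found = round(0.70 * shadow_frac * total_cut)  # ~91 in shadow (rounded)
u_implied = found / shadow_frac            # ~17,170 uniform sources
a_implied = total_cut - u_implied          # ~7,410 shadow avoiders
print(f"avoiders removed by the cut: {avoiders - a_implied:,.0f}")  # 33,079
```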
This is only possible if the data that Villarroel expects to be uniformly distributed is in fact not uniformly distributed. Her null hypothesis is wrong.
If we can replicate her data, we can hopefully show why that is.
And we won't be doing it by trying "random patterns". As with flight 1168, we'll try to show exactly what data causes the effect.