If you have any questions about my post #380, please ask.
I posed the same question many posts prior. You're right that this is one of the most salient questions to ask, although the answer might be more nuanced than you think. For example, maybe there is a risk of false cross matches. Maybe the cross matches with Gaia, Pan-STARRS and NEOWISE are the most reliable, while the remaining cross matches remain in doubt. Most of the candidates that were ruled out were apparently ruled out based on cross matches not with any of these three, but with other catalogues in the infrared. I guess those would be sources that were visible in the optical in the 50s and now show up only in the infrared (sometimes), if they are real matches.
It's probably a bit more complicated than that, since you'd have to understand the uncertainty in the cross matching. Why would these sources have cross matches in some other catalog in the infrared, but not in Gaia, Pan-STARRS or NEOWISE (which is itself an infrared survey)? Does it make sense for a few hundred thousand optical-to-infrared-only transitions to have occurred in that time frame?
And the 2022 paper states:
The adopted radius is a good compromise to ensure that not many high proper motion were left out while, at the same time, avoiding an unmanageable number of false positives.
https://academic.oup.com/mnras/article/515/1/1380/6607509?login=false
Meaning some of those filtered out were probably not real cross matches.
A lot of the time science is messy, and some things remain uncertain and mysterious.
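On the false-positive point, here's a rough sketch of the kind of chance-alignment estimate you'd want, just to show how the spurious-match rate scales with the match radius and the density of the comparison catalogue. The radius and source density below are numbers I made up for illustration, not the paper's actual values.

```python
import math

# Back-of-the-envelope chance-alignment rate for a positional cross match.
# Both inputs are illustrative guesses, NOT the values used in the 2022 paper.
match_radius_arcsec = 5.0            # assumed cross-match radius
source_density_per_deg2 = 2_000.0    # assumed density of the comparison catalog

radius_deg = match_radius_arcsec / 3600.0
search_area_deg2 = math.pi * radius_deg**2

# Poisson probability of at least one unrelated source landing in the circle.
expected_interlopers = source_density_per_deg2 * search_area_deg2
p_chance_match = 1.0 - math.exp(-expected_interlopers)

print(f"Expected unrelated sources per search circle: {expected_interlopers:.4f}")
print(f"Chance of a spurious match per transient:     {p_chance_match:.2%}")
```

With those made-up inputs you get roughly a 1% spurious-match rate per transient, and over ~100,000 candidates that would already be on the order of a thousand chance matches, so the real answer depends heavily on the actual radius and the density of each catalogue searched.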
To get enough statistical power you need more than n = 5,399, given the relatively small proportion of the sky at GEO that is in Earth's shadow. If the deficit is instead caused by some non-random distribution of true-positive cross matches to astronomical bodies, then maybe that can be tested with evidence. How likely is it? I don't know. But you've got to go further than just asserting it; you have to test your hypothesis.
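To make that concrete, here's a toy version of the shadow test, with the shadow fraction and glint fraction both made up for illustration (I don't know the real values for either dataset).

```python
from scipy.stats import binomtest

# Toy Earth-shadow deficit test. Every input here is an assumption for
# illustration, not a measured value from either dataset.
n_events = 5_399          # try 106_339 to see the effect of sample size
shadow_fraction = 0.05    # assumed fraction of the surveyed GEO band in shadow
glint_fraction = 0.05     # assumed fraction of events that are real GEO glints

# Under the null, events land in shadow at the shadow_fraction rate.
# If glints never appear in shadow, only non-glint events can land there.
expected_null = n_events * shadow_fraction
expected_with_glints = n_events * (1.0 - glint_fraction) * shadow_fraction

# Pretend we observed exactly the with-glints expectation, then ask how
# significant that deficit looks against the uniform null.
observed_in_shadow = round(expected_with_glints)
test = binomtest(observed_in_shadow, n_events, shadow_fraction, alternative="less")

print(f"Null expectation in shadow: {expected_null:.1f}")
print(f"Toy observed in shadow:     {observed_in_shadow}")
print(f"One-sided p-value:          {test.pvalue:.3g}")
```

With these made-up fractions the 5,399-event version comes out nowhere near significant, while swapping in n_events = 106_339 gives a clearly significant deficit. That's just the statistical-power point: the same underlying glint fraction can be invisible in a small sample and obvious in a large one.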
How could it be that my test with only the 5,399 showed just a small, low-significance deficit while the ~100,000 showed a high-significance deficit (assuming I didn't make a mistake)? I don't know exactly. I asked ChatGPT this question several months ago and this is all I got, though of course I don't trust ChatGPT enough to consider it settled.
Me: This was my understanding, but I am trying to understand why the 5,399 transients would have a much smaller percent difference than expected than the larger dataset, when the larger dataset contains mostly objects that are likely (not sure how likely) normal astronomical objects given they have crossmatches (unverified) in other catalogues. Unless, most of them are actually transients. But then this calls into question their previous crossmatching algorithm? It just seems surprising.
ChatGPT: Ah, now I see exactly what you're puzzled about — this is a really subtle point about expectations vs. actual observed shadow deficits, and it hinges on the composition of the dataset and the nature of the events being tested. Let's unpack carefully.
The Earth shadow test works like this:
- Assume a fraction of the candidate events are GEO glints.
- GEO glints cannot appear in the Earth's shadow (geosynchronous satellites are in sunlight except during eclipses).
- If you count events in shadow vs. outside shadow, you expect a deficit of glints in shadow.
So the size of the observed deficit depends on:
- Number of true GEO-like events in your sample.
- Total number of events (the larger the denominator, the smaller a small subset effect looks).
For the 5,399 orphan sample:
- These are very clean: no matches anywhere.
- The expectation might be: if some fraction are GEO glints, their deficit in shadow should show up clearly.
But here's the catch:
- 5,399 is tiny relative to the full POSS-I catalog.
- Only a small fraction of these 5,399 are likely actual glints; the rest could be artifacts or rare astrophysical events.
- With such a small number, Poisson noise dominates, and even if a deficit exists, the percent difference could be modest just by statistics.
For the ~106,000 filtered sample:
- This includes many objects with crossmatches in other catalogs but not Gaia/Pan-STARRS.
- Many of these might be ordinary stars, but because the filter excludes the very complete optical surveys, some could be true unexplained events.
- Even if a smaller fraction of these 106k are actual glints, the large number of events inflates the absolute deficit — and so the percent difference appears larger and statistically stronger.
Think of it like this:
If you have 10 glints out of 100 events, the deficit in shadow might be 2 events → 2%.
If you have 50 glints out of 1,000 events, the deficit is 10 → 1%, but it's more statistically significant.
Because Villarroel uses a broader dataset, the absolute number of glints is larger, so their deficit appears more dramatic, even if the fraction is smaller.
Yes, it seems counterintuitive:
- The 106k set includes sources with matches in some catalogs (IR, etc.), so some might be ordinary stars.
- But the filtering for no Gaia/Pan-STARRS counterpart ensures many are still "optically unexplained" — i.e., the crossmatching algorithm is not invalid; it's just less strict than complete orphan filtering.
- In other words, the 106k set probably contains a mix of ordinary objects and real transients. That mix is enough to show a significant deficit statistically.
This is consistent with:
- The 5,399 being very clean but tiny → weak signal.
- The 106k being broader → stronger, more statistically robust signal.
It only seems surprising if you assume:
- All crossmatched objects in other catalogs are "ordinary" → then the deficit should be washed out.
But Villarroel's key assumption:
- Objects with no Gaia/Pan-STARRS counterpart might still include unexplained transients, even if they appear in some other catalogs.
- So the actual fraction of "GEO-like glints" is higher than you might naively expect.
Essentially, the 106k dataset is less pure but larger, which paradoxically makes the shadow signal stronger.
Summary intuition
| Dataset | Purity of unexplained events | Size | Shadow deficit visibility |
|---|---|---|---|
| 5,399 orphans | Very high | Small | Weak percent difference, noisy |
| 106,339 filtered | Medium | Large | Stronger percent difference, statistically robust |
The counterintuitive part: larger, less pure samples can give stronger statistical signals than tiny, extremely clean samples.
If you want, I can do a quick back-of-the-envelope calculation showing how the expected shadow deficit scales with dataset size and fraction of real glints, which will make this much more quantitative and intuitive. This often clears up why the 5,399 would have a smaller effect.
Do you want me to do that?
...
And then in a follow-up, ChatGPT claimed that only a pretty small percentage of the ~100,000 cross matches need to be false positives (and actually glints) to explain the results. Again, I don't trust ChatGPT. But these are the questions you need to address.
What percent of the ruled-out cross matches need to be false matches (and actual glints) to explain the results? How many of the transients need to be glints in total to expect these results? And what is the expected false-positive rate in the 2022 paper's cross matching, counting only the tests other than checking those three catalogues?
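For whatever it's worth, the first two questions reduce to one line of algebra once you fix a model: if glints never appear in shadow and everything else is uniform, the observed in-shadow count pins down the implied glint fraction. Here's a sketch with placeholder counts; the real in-shadow count and shadow fraction would have to come from the papers.

```python
# Model: glints never appear in Earth's shadow, everything else is uniform, so
#   E[in_shadow] = N * (1 - g) * f_shadow   =>   g = 1 - observed / (N * f_shadow)
# All three inputs below are placeholders, not numbers from either paper.
n_total = 106_339            # total candidates in the larger sample
f_shadow = 0.05              # assumed fraction of the footprint in Earth's shadow
observed_in_shadow = 4_800   # placeholder observed in-shadow count

implied_glint_fraction = 1.0 - observed_in_shadow / (n_total * f_shadow)
implied_glint_count = implied_glint_fraction * n_total

print(f"Implied glint fraction: {implied_glint_fraction:.1%}")
print(f"Implied glint count:    {implied_glint_count:,.0f}")
```

Plugging the real counts into something like this would answer the first two questions directly, and the third loops back to the chance-alignment rate sketched earlier.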