Issues with Replicating the Palomar Transients Studies

@tuna is correct about the practical distinction here, and it's not just pedantry; it's a real implementation issue.
@John J. - You're right that conceptually, the null hypothesis is "no relationship." But to actually calculate p-values in practice, that's not specific enough. You need a model that specifies expected distributions.

Example: If H0 is "transients don't correlate with shadow," you still need to know - what distribution DO they follow? Uniform random? Following plate coverage? Each assumption gives different expected values and different p-values. That's what Tuna means by "not directly useful as a statistical null hypothesis" - it's conceptually correct but computationally insufficient.

This is the practical vs. philosophical distinction, and both matter.
 
Example: If H0 is "transients don't correlate with shadow," you still need to know - what distribution DO they follow?

You might be investigating, or might find, a given distribution, but that has nothing to do with the null hypothesis.
If the experimental hypothesis is, "there is a relationship between frequency of transients and the Earth's shadow", the null hypothesis is simply
"there isn't a relationship between frequency of transients and the Earth's shadow".

You said,
To test this, you need a null hypothesis that specifies: What the transients are if NOT orbital glints, and what pattern that alternative would produce.
That is not correct; a proposed explanation for transients not being orbital glints would not be a null hypothesis; it would be a different experimental hypothesis, which should be testable. Null hypotheses don't explain why a relationship doesn't exist or provide alternative explanations; they just don't.
Null hypotheses don't explain or predict anything except that the relationship proposed in the experimental hypothesis won't be found.

This is the practical vs. philosophical distinction
I disagree :). The status and role of the null hypothesis might have its roots in philosopher Karl Popper's thoughts on falsification, but there isn't any requirement to attribute explanatory power, or alternative hypotheses, to the null hypothesis in real-world experimental/ observational studies, and to do so would be a mistake. (In fairness to Villarroel et al., AFAIK they haven't said that a null hypothesis should provide an alternative explanation).
I can't think of any relevant published papers where the null hypothesis specifies what an alternative explanation might be if the experimental hypothesis is not supported. (If the experimental hypothesis is not supported, authors might raise possible alternative hypotheses in a "discussion" or equivalent section; this doesn't affect the null hypothesis).
 
The logical null hypothesis about glints may be that glints are not causing the transients.

The statistical null hypothesis is more precisely something like, these two samples come from populations with equal variances and shapes of their respective distributions, depending on what test you use. You have to model the null distribution to do the test, and the validity of the test (whether the test supports the logical hypothesis) depends on your assumptions (or your theory) about the distributions. For example, they may have assumed defects should be statistically independent from Earth's shadow, etc. These assumptions depend on what the transients may be if not glints.

Regardless, it's in the interpretation stage where you make the connection between the statistical null hypothesis test, and the logical hypothesis or theory. For example, you may interpret the results as showing that a significant proportion of the transients are likely not defects. And claim that as evidence in support of the hypothesis that they are glints.
 
Good points!
I can't think of any relevant published papers where the null hypothesis specifies what an alternative explanation might be if the experimental hypothesis is not supported.
In addition, the null hypothesis may be false, and the as yet unsupported hypothesis may be true, especially if the effect was actually observed.

The fact that there's not enough evidence for something doesn't mean it's false.
 
For testing, we need a model for the null hypothesis, otherwise there's no way to compute the p-value.
P(H0) is high (e.g., >0.99) based on priors, because plate artifacts and other sources are common while pre-1950 orbital objects are not. P(H1) is low because there is no prior evidence for such populations.
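To make that concrete, here is a minimal sketch (my own illustration, not the paper's actual test; the shadow area fraction is a made-up placeholder) of why you can't get a p-value until the null is turned into a model. The assumed model here is "transients are artifacts scattered uniformly over the surveyed area", so the expected count in the shadow is just the shadow's area fraction times the sample size:
Code:
# Assumed null model: transients are plate artifacts, uniform over the surveyed area.
# The area fraction below is a placeholder; the counts echo figures quoted in this thread.
from scipy.stats import binomtest

n_transients = 106_339          # total sample used for the area-based analysis
n_in_shadow = 349               # observed count inside the shadow footprint
shadow_area_fraction = 0.004    # hypothetical fraction of surveyed area in shadow

expected = n_transients * shadow_area_fraction
# Under this H0 the shadow count is Binomial(n, area fraction); test for a deficit,
# since glints would avoid the shadow while uniform artifacts would not.
result = binomtest(n_in_shadow, n_transients, shadow_area_fraction, alternative="less")
print(f"expected under H0: {expected:.0f}, observed: {n_in_shadow}, p = {result.pvalue:.3g}")
Swap the uniform assumption for "artifacts follow plate coverage" or "artifacts follow limb vignetting" and the expected count changes, and so does the p-value. That's the sense in which "no relationship" on its own isn't computable.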
 
@tuna and @beku-mant, your points are valid, but (I feel that) in the context in which we're discussing null hypotheses, you're overlooking that the statement
you need a null hypothesis that specifies: What the transients are if NOT orbital glints...
is simply wrong.
There is no requirement for the null hypothesis to provide a further alternative (experimental*) hypothesis (which is what specifying "what the transients are if not orbital glints" would be), and to state that there is such a requirement is a misunderstanding of what the null hypothesis is.

It's not a huge deal, I make mistakes on Metabunk regularly, but it is a mistake.


* "Alternative hypothesis" is the widely-used term for the hypothesis proposed and tested by researchers, as opposed to the null hypothesis. I've used "experimental hypothesis" to mean the same thing, attempting to avoid phrases like "alternative alternative hypothesis", but I appreciate not all testable alternative hypotheses are experimental hypotheses in the literal sense.
 
@beku-mant, by your "disagree" are you indicating that you think a null hypothesis should include e.g. "What the transients are if not orbital glints"?

I'm hoping you're disagreeing with "I make mistakes on Metabunk regularly"? (Giving you an open goal there) :)
 
@beku-mant, by your "disagree" are you indicating that you think a null hypothesis should include e.g. "What the transients are if not orbital glints"?
They must make assumptions about the statistical properties they would have if not glints. Otherwise, it is not testable. You're conflating philosophical notions with scientific hypothesis testing.
 
So are you saying, the null hypothesis should include an explanation of what the photographic features are if they are not orbital glints?
 
They must make assumptions about the statistical properties they would have if not glints. Otherwise, it is not testable. You're conflating philosophical notions with scientific hypothesis testing.
Yes, but the assumptions don't need to be that specific.
If there is no way to segment the data set such that "defects" become distinguishable from glints, without having the shadow effect in both sets, then that proves the null hypothesis, too.

Villarroel 2025 does not point out which transients are glints and which aren't. (Solano 2022 removes many of the transients from consideration!) The paper doesn't even say how many of the transients are glints. It does not give a distribution for the glints. But it makes observations:
* data set of 106339 for area-based analysis
* number of transients in 42Mm shadow is 349
* number of transients in 80Mm shadow is 79 (table 4)
* data set of 107875 for plate-based analysis
* 349/107,875 transients in 42Mm shadow
* 76/107,875 transients in 80Mm shadow [does not explain why the number shrunk]
* using a 50 minute window for the shadow, 387 transients at 42Mm
* ditto, 80 transients at 80 Mm (both using 197875 data set)
* number of transients in 42Mm shadow if restricted to 2° radius is 30% less than expected
* "overdensity" observed for low altitudes ("most of the field lies in shadow")

I have convinced myself that there is no distribution of glints that explains these observations with any sort of confidence, and that's why Villarroel does not give one.

It's really a distribution guessing game to see which distribution fits best, and Villarroel hasn't even entered.
 
* "overdensity" observed for low altitudes ("most of the field lies in shadow")
External Quote:
When the shadow covers a large portion of the plate (e.g., >50%), this assumption breaks down, and even a random distribution of artifacts will naturally yield an overdensity in the shadowed region. In such cases, the test becomes less sensitive to systematic avoidance, making small shadow coverages (e.g., <5%–10%) more reliable.
They did run the analysis for lower altitudes, but didn't put it in the paper!
 
* "Alternative hypothesis" is the widely-used term for the hypothesis proposed and tested by researchers, as opposed to the null hypothesis. I've used "experimental hypothesis" to mean the same thing, attempting to avoid phrases like "alternative alternative hypothesis", but I appreciate not all testable alternative hypotheses are experimental hypotheses in the literal sense.
"Alternative vs. Null" is Fisher's framing. Pearson-Neyman objected to Fisher's approach, and if you're flinging around p-values with that name, you're probably more aligned to the Pearson approach. Their rivalry was legendary, and it's easy to hypothesise that their use of "H0 vs. H1" (occasionally seen as "H1 vs. H2") was simply a deliberate snubbing of Fisher's terminology. However, you don't need to pick a side - just mix and match at will, everyone else does (emphasis mine):
External Quote:
Modern hypothesis testing is an inconsistent hybrid of the Fisher vs Neyman/Pearson formulation, methods and terminology developed in the early 20th century.
-- https://en.wikipedia.org/wiki/Statistical_hypothesis_test#Modern_origins_and_early_controversy

Being Bayesian, I'm sitting in my comfy chair with my fresh popcorn, as to me it's just an internal frequentist spat.
 
How does one determine the altitude of a speck on a photographic plate when one cannot even be sure if it is a flaw on the plate? That is not meant facetiously, I don't see how it can be done (but don't think "I don't see how it can be done" automatically means it can't be done!)
 
How does one determine the altitude of a speck on a photographic plate when one cannot even be sure if it is a flaw on the plate? That is not meant facetiously, I don't see how it can be done (but don't think "I don't see how it can be done" automatically means it can't be done!)
It doesn't work individually, but it works stochastically.

Imagine Earth's shadow like a hole in the sky. Train your analytic lens on that hole, make it really small, and you'll see no satellite flares. Now slowly open your analytic lens, and at a specific diameter d0 you'll start seeing flares, more and more as you keep opening. d0 corresponds to the apparent size of Earth's shadow at a specific altitude, and that gets smaller with respect to the cosmic background as it gets further away, due to perspective.

If there are enough samples, the curve of flares vs diameter will also reveal lower orbitals.
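Here's a rough geometric sketch of that perspective effect (my own numbers, treating the shadow as a cylinder one Earth radius wide, ignoring the umbral taper and penumbra, and assuming the 42 Mm / 80 Mm figures are geocentric distances):
Code:
import math

R_EARTH_KM = 6371.0

def shadow_angular_radius_deg(geocentric_distance_km: float) -> float:
    """Angular radius of the (cylinder-approximated) shadow for objects at this distance."""
    return math.degrees(math.asin(R_EARTH_KM / geocentric_distance_km))

# 42,000 km and 80,000 km roughly match the 42 Mm and 80 Mm figures discussed above;
# the larger distances are just for comparison.
for d_km in (42_000, 80_000, 200_000, 384_000):
    print(f"{d_km:>7} km: shadow radius ~ {shadow_angular_radius_deg(d_km):.1f} deg")
So flares from objects at a given distance only start appearing once the aperture is opened past that distance's shadow radius; distant populations show up first and lower orbits only at wider apertures, which is what the flares-vs-diameter curve would trace.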

The problem is that there's no way to narrow this data set down to "just the flares", and anyway Villarroel didn't do this curve, probably because it raises questions she can't answer.
 
The problem is that there's no way to narrow this data set down to "just the flares", and anyway Villarroel didn't do this curve, probably because it raises questions she can't answer.
Thanks. So if I can summarize/condense, it could be done by the method you outline, but wasn't?
 
Thanks. So if I can summarize/condense, it could be done by the method you outline, but wasn't?
It's what I'd do, and there are signs that Villarroel did do it, but did not publish it in the paper—she remarks on keeping the shadow area "<5%-10%" to stay clear of the "overdensity" at ">50%" (which is very much not "natural"), and how else would she know this?

But if she did do it, and if it supported her hypothesis, surely she would have included it in the paper?

Note that the outcome depends on the sample size and quality. The 5399 sample may be too small, and too contaminated, to produce a meaningful result.
 
It's what I'd do, and there are signs that Villarroel did do it, but did not publish it in the paper—she remarks on keeping the shadow area "<5%-10%" to stay clear of the "overdensity" at ">50%" (which is very much not "natural"), and how else would she know this?
This is a classic example of a "researcher's degree of freedom".
 
Note that the outcome depends on the sample size and quality. The 5399 sample may be too small, and too contaminated, to produce a meaningful result.

Which I think is the fundamental question. Was the downselect from 100K+ to these 5399 based on factors that can be replicated by someone else starting from the same 100K+, or was it simply a random selection performed to reduce the number of UFOs "discovered" down to a more believable number?

Which requires knowing where the 100K+ are on what plates and letting someone else perform the "is it a defect" analysis on each of them.

I think the focus on identifying some statistical test that would 'prove' they are UFOs is misguided, until the original is-it-or-isn't-it has been shown to have some validity.
 
As I understand it, the 5399 remnants are just the few star-like anomalies which cannot be associated with known stars, even if we assume a moderate amount of proper motion. If they are all caused by emulsion flaws, they are irrelevant to the question of NHI satellites. Even Villarroel can't rule out the possibility that they are all emulsion flaws, and that should remain the null hypothesis.
 
The idea that 'proper motion' can explain the bulk of the 100K anomalies is itself questionable. Only the closest 8000 stars in our catalogs are within 100 light years, and beyond 100 light years proper motion becomes increasingly negligible. The idea that 100K stars have all moved up to 5 arcseconds is another factor that can be tested.
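As a rough illustration of the scaling involved (my own back-of-the-envelope numbers, not values from either paper): proper motion falls off as one over distance, so for an assumed tangential velocity you can compare the shift accumulated over the roughly 70 years between POSS-I and the modern catalogues with the 5-arcsecond matching radius:
Code:
# Back-of-the-envelope sketch; the velocity and epoch baseline are assumptions.
def proper_motion_arcsec_per_yr(v_tangential_km_s: float, distance_pc: float) -> float:
    """mu ["/yr] = v_t [km/s] / (4.74 * d [pc]), the standard conversion."""
    return v_tangential_km_s / (4.74 * distance_pc)

baseline_yr = 70     # assumed epoch gap: POSS-I (early 1950s) to recent surveys
v_t_km_s = 30.0      # assumed typical tangential velocity for a nearby disc star

for d_ly in (100, 300, 1000, 3000):
    d_pc = d_ly / 3.26
    mu = proper_motion_arcsec_per_yr(v_t_km_s, d_pc)
    shift = mu * baseline_yr
    flag = "above" if shift > 5 else "below"
    print(f"{d_ly:>5} ly: {mu*1000:5.0f} mas/yr, ~{shift:4.1f} arcsec over "
          f"{baseline_yr} yr ({flag} the 5 arcsec radius)")
How quickly the accumulated shift drops below the matching radius depends entirely on the assumed velocity, but the 1/distance falloff itself is what makes this testable.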
 
Was the downselect from 100K+ to these 5399 based on factors that can be replicated by someone else starting from the same 100K+, or was it simply a random selection performed to reduce the number of UFOs "discovered" down to a more believable number?
Read Solano 2022, it's a good paper that makes a lot of sense.


The idea that 100K stars have all moved up to 5 arcseconds is another factor that can be tested.
Solano 2022 uses several star catalogs to pare down its 298165 sources. Villarroel only uses NEOWISE. These 100k stars haven't moved; they're just not in the 3 catalogs that the Villarroel 2025 papers have applied.
 
Solano 2022 uses several star catalogs to pare down its 298165 sources. Villarroel only uses NEOWISE. These 100k stars haven't moved; they're just not in the 3 catalogs that the Villarroel 2025 papers have applied.
So you're claiming with certainty that out of the ~300,000 transient candidates, the vast majority are actually stars? Even those that accumulated near the edges of the plates?

What do you suppose caused them to flip from detectable to undetectable to detectable, and is it plausible that we observe that happen >100,000 times in this time frame?

Hambly & Blair used approximately the same search for cross-matches as a way to define them as likely defects right? So they think the vast majority are defects? Why are they so confident that the lack of a match within 5 arcseconds in only Gaia stellar or PS1 is a solid indicator it's not a star?
 
So you're claiming with certainty that out of the ~300,000 transient candidates, the vast majority are actually stars? Even those that accumulated near the edges of the plates?
no, I don't claim that
Solano 2022 claimed it
read it
What do you suppose caused them to flip from detectable to undetectable to detectable, and is it plausible that we observe that happen >100,000 times in this time frame?
why do you suppose they do new star catalogs, if the old ones already have all the stars in them

there is no "flip"
please stop with the flipping

Hambly & Blair used approximately the same search for cross-matches as a way to define them as likely defects right?
no, they didn't. please quote them so we can see what you are referring to.
 
if it was a light source, it was not strong enough to show up in images produced by several much more sensitive instruments at different times

Many stars vary significantly in magnitude over time, and many of the transients were (I think) at the threshold of detection (mag. 21-22).

I fear that with a large number of biased non-experts trying to interpret nuanced, uncertain, scientific results, there is a major risk of interpretation bias.
Considering what interpretation bias is (it's not anything to do with type 2 error, and generally applies to social situations), it's perhaps unlikely to be a major risk in this context.

Whether Metabunk members who hold different views to your own are biased might be arguable; I think many posters here try to find mundane explanations for extraordinary claims, which might be a bias of sorts. That could be seen as a bias toward favouring likely explanations over less likely ones, which isn't necessarily a bad thing. It doesn't mean our conclusions are always correct, of course.
 
Whether Metabunk members who hold different views to your own are biased might be arguable; I think many posters here try to find mundane explanations for extraordinary claims, which might be a bias of sorts. That could be seen as a bias toward favouring likely explanations over less likely ones, which isn't necessarily a bad thing. It doesn't mean our conclusions are always correct, of course.
One person selectively interprets something in a way that happens to fit their bias and the bias of the in group of the community. Some people reinforce that interpretation with likes, spin, and rationalization. This is textbook group-think.

Misinformation is spreading from here into the wild because of this, and the least that can be done is to stop it after the mistake is pointed out, if not also acknowledge the mistakes.
 
Fair point, I accept that my understanding of the term "interpretation bias" was too narrow and so my criticism of your use of that phrase was wrong.
 
They imaged the same part of the sky, and in the '50s, if it was a light source, it was strong enough to show up in the POSS-I images. Then they imaged the same part of the sky again, and if it was a light source, it was not strong enough to show up in images produced by several much more sensitive instruments at different times, yet at other times it was.
Which is the case with weak light sources.
You say they are all stars.
I have never said that, because I have read Solano 2022.
I'm just doubting your interpretation, especially since it massively contradicts Hambly's.
They use a different approach.
Their ML classifiers are inexact in a different way from Solano's approach of removing from consideration any source within 5 arcseconds of a known object. Their numbers strongly suggest that their "true bad" data set contains stars and galaxies, because the star and galaxy detectors detect them in there (23%-27%).
Note also that they did a "magnitude cut" at 19.5, excluding objects from consideration that the POSS-I E plates may have captured.
I'd probably agree that the stronger light sources should show up in every catalog.
 
I fear that with a large number of biased non-experts trying to interpret nuanced, uncertain, scientific results, there is a major risk of interpretation bias.

https://en.wikipedia.org/wiki/Interpretive_bias
"Interpretive bias" is not a well-established concept. (I'd say it applies to Villarroel more than anything else.)

You could even say that the null hypothesis represents confirmation bias for the status quo that the data needs to overcome.

And lastly, Villarroel presents her result as certain; see the abstract of the transient alignment paper quoting 3.9 sigma, 7.6 sigma, and even 22 sigma significance.

I would agree with you that these results are in fact rather uncertain, but then I really don't understand what your position is.

As in the letters you linked, if you claim interpretive bias, you ought to be able to show where what we say contradicts the data. What I'm finding is that you keep misquoting me, which means that you should probably examine how your own bias induces you to interpret me wrongly.
 
What do you suppose caused them to flip from detectable to undetectable to detectable, and is it plausible that we observe that happen >100,000 times in this time frame?
The issue you have here is independent of the nature of the transient data.
Article:
Objects not present either in Gaia EDR3 or Pan-STARRS but present in other astronomical surveys. To check this possibility, we looked for counterparts in the catalogues available from the CDS Upload X-match utility implemented in TOPCAT (Taylor 2005) and from IRSA (Neowise, PTF). These searches were complemented with a search in the catalogues available from VOSA (Bayo et al. 2008). Sources with counterparts at less than 5 arcsec in any of the queried photometric catalogues were removed from our list of candidates to vanishing objects.

These searches significantly reduced the number of candidates (from 298 165 to 9 395).

Please accept the idea that some established astronomical surveys have objects in them that other surveys do not. Villarroel and Solano certainly do.
 
It seems almost certain that a fraction of those candidates were removed by this process in error, and were in fact emulsion flaws. The size of that fraction is unknown.
 
It seems almost certain that a fraction of those candidates were removed by this process in error, and were in fact emulsion flaws. The size of that fraction is unknown.
Oh, definitely.
The way I understand the 5-arcsecond criterion to be applied is that any astronomical entry in the catalog can cause any number of transients to be removed from the data set. Ideally, analysis would establish a 1:1 correspondence and identify what is on the plates.

It's justified because it means that the remaining transients are almost certainly not astronomical objects. Solano 2022 is the attempt to remove all data points that could be astronomical objects, and then see what remains. Since objects in orbit should also appear where there are no astronomical objects in the background, they'd still be able to find some.
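For anyone wanting to see the shape of that screening step, here is a minimal sketch (hypothetical coordinates, not Solano's actual pipeline or catalogues): a candidate is dropped if any catalogue source sits within 5 arcsec, so one catalogue entry can knock out several candidates and nothing is ever identified one-to-one:
Code:
import astropy.units as u
from astropy.coordinates import SkyCoord

# Hypothetical candidate and catalogue positions, in degrees.
candidates = SkyCoord(ra=[150.001, 150.020, 210.500] * u.deg,
                      dec=[2.001, 2.020, -5.300] * u.deg)
catalogue = SkyCoord(ra=[150.0015, 210.499] * u.deg,
                     dec=[2.0012, -5.3005] * u.deg)

# Nearest catalogue neighbour for each candidate; keep only those with no match within 5".
idx, sep2d, _ = candidates.match_to_catalog_sky(catalogue)
keep = sep2d > 5 * u.arcsec
print("surviving candidates:", candidates[keep].to_string("decimal"))
The real screening repeats this against every catalogue queried, which is why the survivor count falls so sharply (from 298 165 to 9 395 in the passage quoted above).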
 
Some of these dim 'disappearing' stars could be brown dwarfs at various distances out to a couple of hundred light years. These objects might have fairly respectable proper motions, even in excess of the 80 milliarcseconds/yr cut-off that Solano suggests. There are so many arbitrary constraints on this data set that it seems almost meaningless.
 
One person selectively interprets something in a way that happens to fit their bias and the bias of the in group of the community. Some people reinforce that interpretation with likes, spin, and rationalization. This is textbook group-think.

Nope, groupthink puts desire to conform to the fore. We're not doing that. We of course do have extremely similar priors, but we're also perfectly happy to stick our heads out with any insights that we might think are smarter than others'.
 
Oh, definitely.
The way I understand the 5-arcsecond criterion to be applied is that any astronomical entry in the catalog can cause any number of transients to be removed from the data set. Ideally, analysis would establish a 1:1 correspondence and identify what is on the plates.

It's justified because it means that the remaining transients are almost certainly not astronomical objects. Solano 2022 is the attempt to remove all data points that could be astronomical objects, and then see what remains. Since objects in orbit should also appear where there are no astronomical objects in the background, they'd still be able to find some.
You're saying most of them are stars, Solano 2022 is saying most of them could be stars, and Hambly is saying practically none of them are stars. This is a wide range of interpretations. As a non-expert I can only say that there seems to be a lot of uncertainty.
 
no, they didn't. please quote them so we can see what you are referring to.

Here is the quote you requested. It is a quote from N. C. Hambly and A. Blair 2024, On the nature of apparent transient sources on the National Geographic Society–Palomar Observatory Sky Survey glass copy plates, Section 2.34 (page 3).
The quote adds important context about the possible a priori interpretations of what the transient candidates might be, and how a lack of cross-matches in certain catalogues may inform that interpretation.
We defined a plate catalogue entry as likely spurious if there was no associated Gaia stellar or PS1 galaxy entry of any kind within 5 arcsec of its measured position. In this way we ensured that if there was any chance of a plate catalogue entry being real it would not be included in the spurious detection training data.

https://arxiv.org/pdf/2402.00497
 
Nope, groupthink puts desire to conform to the fore. We're not doing that. We of course do have extremely similar priors, but we're also perfectly happy to stick our heads out with any insights that we might think are smarter than others'.
So stick your head out, is interpretative bias an established concept and relevant to science? I tried backing up my claim that it is several times, but the moderator keeps deleting it. Links about interpretive bias seem to be impossible to post.

Maybe you know the trick since you succeeded in posting a link about statistical hypothesis testing with no problem.

https://www.metabunk.org/threads/is...-palomar-transients-studies.14534/post-356369
 
So stick your head out, is interpretative bias an established concept and relevant to science? I tried backing up my claim that it is several times, but the moderator keeps deleting it. Links about interpretive bias seem to be impossible to post.

Maybe you know the trick since you succeeded in posting a link about statistical hypothesis testing with no problem.

https://www.metabunk.org/threads/is...-palomar-transients-studies.14534/post-356369
The Posting Guidelines are clear. Follow them.
 
I tried backing up my claim that it is several times, but the moderator keeps deleting it. Links about interpretive bias seem to be impossible to post.

I saw one of the deleted posts, I think (I can't see it to check) you didn't describe what was in the link. Many of us have done the same.
It isn't anything specific to interpretation bias or because of any viewpoint or argument that you might be advancing/ questioning.

The Link Policy on this forum states
Links

The reader should not have to click on a link in order to understand what the post is about. When you link to something to back up something you are discussing then:
  • Describe what is in the link, and why it is relevant to the thread topic.
...so we shouldn't post something like "Here is my evidence [posts link]", but again many of us have done that.

Edited to add: Good grief @Mendel, which bit do you disagree with?
 
I saw one of the deleted posts, I think (I can't see it to check) you didn't describe what was in the link. Many of us have done the same.
It isn't anything specific to interpretation bias or because of any viewpoint or argument that you might be advancing/ questioning.

The Link Policy on this forum states

...so we shouldn't post something like "Here is my evidence [posts link]", but again many of us have done that.
Fair enough, but in my second attempt I described what was in the link, why I posted it, and included an image from the link in the post, which I said illustrated the concept. I will accept that I perhaps didn't follow the rules perfectly, however.

In this case, at first, the existence of multiple research articles about interpretive bias in science was itself the evidence to back up the claim that it is an established concept relevant to science. Perhaps this edge case exposes a bug in the link policy?

Anyways, in this case I find it unreasonably difficult to back up the claim here. But it is not difficult to research interpretive bias in science on your own. The fact we are even arguing about it is silly to me in the first place.

If everyone here is confident they aren't at risk of interpretive bias, fine. I've already warned of the risk of interpretive bias before, concerning this very issue about what the cross-matching implies. But still, for weeks, the interpretation that almost all of the candidates are actually stars has been promoted as something that discredits the 2025 papers, and that has spread and gained traction elsewhere too. It's tempting to interpret the data this way if you are looking for a reason to claim their work is junk, but it's actually not that simple.
 