Digitized Sky Survey POSS-1

If you plot the larger list of transients from their webpage (171,753), you can see just how patchy the catalog is. As you said, you can clearly see the plate edges.
That's a great observation!

You are referring to the list of neoWISE matches from http://svocats.cab.inta-csic.es/vanish/ . "As you said" refers to @Eburacum 's message at https://www.metabunk.org/threads/digitized-sky-survey-poss-1.14385/post-355511 .
Also as an aside you can see they have incorrectly plotted the survey in normal coordinates, and not Galactic.
I'm not sure I fully understand. If these were galactic coordinates, then the Milky Way would align with the equator, like they show in figure 9 of Solano 2022? I don't think they claim that this data set is in galactic coordinates?
In fact, the coverage maps show it's not.
 
Might this be (at least in part) due to overlapping plates?
That should only give a 2:1 density difference. @ThickTarget's plot shows way more than that.

I wanted to show you a plot from https://adsabs.harvard.edu/full/1956AJ.....61..399G that shows how much displacement there is near the plate edges, but the server is currently down. [see below] No analysis algorithm can afford to be edge-agnostic.

That's why it's important to look at the distributions of all the data sets, of everything that gets kept or thrown away, plotted as heat maps.
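Something like this is what I mean (a rough sketch, not the exact script I used; the file and column names are placeholders for a CSV exported from the SVO archive):

import pandas as pd
import matplotlib.pyplot as plt

# Load the catalogue positions; "ra"/"dec" columns in degrees are assumed.
cat = pd.read_csv("vanish_neowise.csv")

# A 2-D histogram of source positions: plate edges and coverage gaps
# show up as sharp steps in the density.
plt.figure(figsize=(12, 6))
plt.hist2d(cat["ra"], cat["dec"], bins=[720, 360], cmap="viridis")
plt.colorbar(label="sources per bin")
plt.xlabel("RA [deg]")
plt.ylabel("Dec [deg]")
plt.show()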

Since this neoWISE set is more than half of the 298165 candidates, and they're filtered against a grid-agnostic data set, the whole candidate set must look like this. I bet the replication effort will show exactly that.

(Obviously, when you have data points arranged loosely on a grid, it's much easier to find a group of points that form a line.)
 
Since this neoWISE set is more than half of the 298165 candidates, and they're filtered against a grid-agnostic data set, the whole candidate set must look like this. I bet the replication effort will show exactly that.
To elaborate on that:
1) The 171753 data set is the intersection of the neoWISE IR star data set and the 298165 transient candidate set.
2) Since the result shows a grid structure, this grid structure must have been present in one of the two source data sets.
3) There is no grid structure in the neoWISE data set.
4) Therefore, the grid structure must already be present in the 298165 transient data set.
 
Might this be (at least in part) due to overlapping plates?
If there is more edge overlap in the high-declination part of the survey, that'll be a factor in explaining the 'shadow deficit', because it would create more defects per unit area. The same happens even without overlap if each plate covers a smaller area of sky, since there are then more plate edges per unit area. Could someone who's looked at the plate catalog in detail please comment on that?
 
You can also see the huge gouge taken out by the Galactic Plane, because they remove anything near a Gaia star.
That is the opposite of what I expected, since I expected more anomalies near the Galactic plane, rather than fewer. I'm wrong again!

But removing anomalies from the already crowded Galactic plane is something that would happen automatically, simply because there are more stars there which could be misinterpreted as anomalies.
Also as an aside you can see they have incorrectly plotted the survey in normal coordinates, and not Galactic.
There are three main coordinate systems: equatorial, Galactic, and ecliptic. I presume that by 'normal' you mean equatorial. Plotting in ecliptic coordinates might have made it easier to see whether there is any detectable paucity in the ecliptic plane, since the Earth's shadow follows that line.
 
Knowing if the anomalies are more frequent near the edges of each individual plate would be useful. Each step along the way, from manufacturing the glass plates to exposure and processing to final storage and digitizing, might involve processes that are more likely to affect the edges in some way. Then there is the question of where the UFOs are on the plates with respect to the edges. Knowing these might be useful, or might not be, but we won't know if we don't check.
 
Each step along the way, from manufacturing the glass plates to exposure and processing to final storage and digitizing, might involve processes that are more likely to affect the edges in some way.
Gollnow & Hagemann, "Displacements of photographic emulsions and a method of processing to minimize this effect", Astronomical Journal, November 1956
[attached image: plot from Gollnow & Hagemann 1956 showing emulsion displacement near the plate edge]

The processing of the exposed plate displaces features near the edge by 40 micrometers. I think that's only 0.1 arcseconds of sky, but it shows that there's something going on at the edge.
And we don't know if it's worse for 103a-E, which was used for the data we're looking at.
 
And we don't know if it's worse for 103a-E, which was used for the data we're looking at.

That claim was made in the Medium article, but as I noted in the other thread, I can't really find evidence for it. Yet. But it does appear the red emulsions produced far more "transients".
 
Unfortunately the setup isn't reproducible via a Dockerfile, at least not yet. The first setup was a bash script that downloaded one 30x30 arcmin FITS file and then attempted to perform analysis with SExtractor and STILTS/TOPCAT against Pan-STARRS, USNO-B1.0 and whatnot. I had a lot of problems getting things to run correctly on OSX. Dockerizing that mess won't help anyone.
I'm now trying to create a better version using Python, which should be more OS-agnostic.
Update: I've created a github repository for an attempt to reproduce software of this paper https://academic.oup.com/mnras/article/515/1/1380/6607509?login=false (Discovering vanishing objects in POSS I red images using the Virtual Observatory, 2022) - this paper contains a description of the software and various settings. Description is not exact, but I haven't seen any software published by Villarroel et al. anywhere yet, so this is perhaps the best starting point.

Github: https://github.com/jannefi/vasco/ This is version v0.06.8. If I follow my current plan, there will be quite a few minor releases, and the 1.0 release will happen in 2-3 months. But because I can't allocate more time to this project, and other things tend to happen, it might take 6 months or more to complete 1.0.

But I wanted to share it in case anyone is interested in this approach. The main language is Python. The documentation is not very good yet, but this runs, and it is able to get about 1000 image samples (or just one if you want) and perform basic analysis.

Edit: paper link fixed
 
Great work. It's impressive you managed to get PSFEx working, it is tricky as it's not well documented. I have some suggestions:

For the final analysis, I would caution against using the Skyview service to download the data, because the images it returns have been resampled. Resampling is essentially rebinning the image on to a different grid. This is why you can vary the pixel scale. For these sources it may make no difference, but it may slightly affect the profiles and sizes of objects. Maybe there is a way to get the originals, but I haven't found it. The STScI and ESO services do not resample.

You're using PanStarrs to find bright stars, to make the spike mask. This will fail to reproduce what they did because bright stars are totally saturated in a CCD survey like PanStarrs, and they are just not in the catalog at all. The paper uses USNO-B1.0; it is best to follow that.

I would also suggest doing the cross-matching with Gaia and PanStarrs with Stilts (or Topcat for testing and generating the Stilts command). These steps can mostly be done in a few commands. They used these tools in the paper, it's also more robust in that it doesn't have to be debugged. There may be subtle differences in the Stilts cross-matching compared to astroquery. I know you've already coded a lot of it, but it would have to be tested. There is a small bug in your distance calculations in xmatch.py:

res['sep'] = ((res['ra'] - ra_deg).abs() + (res['dec'] - dec_deg).abs())

Which is the Cartesian distance. For angles, there is a cos(dec) factor in the RA component. I would always use astropy.coordinates for such things, or Stilts. It's difficult to write all of this stuff flawlessly. Which is why offloading much of it to Stilts would make it more robust. Perhaps if you want to stick with astroquery, we could compare the output at some stage. I would recommend trying to keep the core routines simple and transparent, so that people can read through it line by line.
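For reference, something like this is what I would use instead (only a sketch, reusing the column names from the snippet above; not tested against your repo):

import astropy.units as u
from astropy.coordinates import SkyCoord

# Target position and the candidate table, same names as in the snippet above.
target = SkyCoord(ra=ra_deg * u.deg, dec=dec_deg * u.deg)
cands = SkyCoord(ra=res['ra'].values * u.deg, dec=res['dec'].values * u.deg)

# Great-circle separation in arcseconds; this includes the cos(dec) factor
# and behaves correctly across RA = 0/360 and near the poles.
res['sep'] = target.separation(cands).arcsec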
 
There is a small bug in your distance calculations in xmatch.py:

res['sep'] = ((res['ra'] - ra_deg).abs() + (res['dec'] - dec_deg).abs())

Which is the Cartesian distance. For angles, there is a cos(dec) factor in the RA component. I would always use astropy.coordinates for such things, or Stilts.
It's worse than that: it's an L^1 metric, and basis dependent (the distance will depend on which direction has been agreed to be "up"), rather than a more typical L^2 metric (distances remain constant under arbitrary rotations).
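A quick way to see the basis dependence (illustrative coordinates only):

import astropy.units as u
from astropy.coordinates import SkyCoord

a = SkyCoord(ra=10.0 * u.deg, dec=0.0 * u.deg)
b = SkyCoord(ra=10.1 * u.deg, dec=0.1 * u.deg)

# |dRA| + |dDec| in equatorial vs. Galactic coordinates gives different numbers,
# because the L^1 sum depends on the chosen axes.
l1_eq = abs(a.ra.deg - b.ra.deg) + abs(a.dec.deg - b.dec.deg)
ag, bg = a.galactic, b.galactic
l1_gal = abs(ag.l.deg - bg.l.deg) + abs(ag.b.deg - bg.b.deg)

# The true angular separation is the same whichever frame you compute it in.
print(l1_eq, l1_gal, a.separation(b).deg)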
 
Great work. It's impressive you managed to get PSFEx working, it is tricky as it's not well documented. I have some suggestions:

For the final analysis, I would caution against using the Skyview service to download the data, because the images it returns have been resampled. Resampling is essentially rebinning the image on to a different grid. This is why you can vary the pixel scale. For these sources it may make no difference, but it may slightly affect the profiles and sizes of objects. Maybe there is a way to get the originals, but I haven't found it. The STScI and ESO services do not resample.

You're using PanStarrs to find bright stars, to make the spike mask. This will fail to reproduce what they did because bright stars are totally saturated in a CCD survey like PanStarrs, and they are just not in the catalog at all. The paper uses USNO-B1.0; it is best to follow that.
Many thanks for your comments and suggestions!

I'm trying to find time to work on this project, but currently I'm too busy with other things.

I found something that might be related. The MNRAS 2022 paper talks about using Aladin's scripting capabilities. Similar work was done already in 2008-2011 by a different group using POSS-I and POSS-II images. I found this PDF document link from the Spanish Virtual Observatory web site: https://svo.cab.inta-csic.es/docs/files/svo/Public/ProAm/Tutorial_oagcpms_english[1].pdf

External Quote:

Summarizing, our method consist in set a blinking between the images from POSS1 and POSS2 surveys and crossing them with the catalogues USNOB1 and NOMAD1. From them we obtain the CPM data of the detected pairs. We use as well UCAC3 catalogue if it includes the stars. Additionally we make the relative astrometry of the new pairs, measuring the separation and the position angle on the images.
...
I'm unsure if the whole 2022 pipeline was based on Aladin's scripting capabilities, or if Aladin was just used for tessellation & data retrieval and manual inspection. Probably the latter.
 
I'm still working on trying to reproduce their catalog. I have got pretty good settings for SExtractor and PSFEx, which seem to recover most of the objects in their catalog.

But I thought I'd share some info I found while investigating the NEOWISE matched catalog they published. I was originally confused because I cross-matched my catalogs to various WISE catalogs (UnWISE, ALLWISE, WISE) but very few sources matched to these catalogs. And even if I matched their NEOWISE catalog to these tables, most of the NEOWISE sources are not in them. So I had a look in the IRSA and eventually worked out that the catalog they had used for cross-matching is the NEOWISE-R Single Exposure (L1b) Source Table. The flags and objects are identical. But this is totally bizarre. The single exposure table makes a catalog of every single 7.7 second NEOWISE exposure and concatenates them together. It does not check if a source is detected in multiple exposures, or even in both filters. It is filled with duplicates and junk. The catalogs I looked at originally are catalogs made from the combined WISE and NEOWISE data, stacking the images to get much better depth and precision. The single-exposure catalog has about 200 billion sources in it; the combined catalogs only have about 2 billion. Indeed the explanatory supplement has lots of warnings:

Entries in the Single-exposure Source Database include detections of real astrophysical and solar system objects, as well as spurious detections of low SNR noise excursions, transient events such as hot pixels, charged particle strikes and satellite streaks, and image artifacts and light from bright sources including the moon. Tips for avoiding unreliable detections are given in II.3, but they have not been filtered out of the Database. Therefore, the Database must be used with caution. Users are strongly encouraged to read the Cautionary Notes before using the Database.

https://irsa.ipac.caltech.edu/data/WISE/docs/release/NEOWISE/expsup/sec2_1a.html

Only a small fraction of the sources are confirmed in the deeper combined catalogs. The problem with this catalog is that it has so many sources that the probability of any position having a match within 5 arc seconds becomes quite high. I cross-matched some random positions and found 80 to 90% of them had a match within 5 arc seconds. And that is after removing the later data which is not part of their catalog. Some have discussed here that having so many matches shows these objects are real, but having a match to this NEOWISE catalog does not demonstrate that. Some are, most are not.
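To get an intuition for why the chance-match fraction is so high, a back-of-envelope Poisson estimate is enough (my own sketch, not from the paper): with a catalogue surface density n, the probability of at least one spurious match within radius r is 1 - exp(-n·π·r²).

import math

def chance_match_prob(density_per_sq_deg, radius_arcsec):
    # Probability that a random position has at least one catalogue source within the radius.
    r_deg = radius_arcsec / 3600.0
    return 1.0 - math.exp(-density_per_sq_deg * math.pi * r_deg ** 2)

# Illustrative densities only; the effective density of the NEOWISE-R L1b table
# depends on how many epochs and filters are kept.
for n in (1e4, 1e5, 1e6):
    print(f"{n:.0e} sources/deg^2 -> P(match within 5 arcsec) = {chance_match_prob(n, 5.0):.2f}")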

It seems like they didn't read this, because there is no logical reason to use this table. The papers don't specify which catalog they used or justify the use of this table. I suspect they just looked at every table they could find, without much thought. It also means the sample used in the shadow analysis will be affected by the systematics of the NEOWISE catalog, as they claim to have removed matches. I'm not suggesting this must have caused the shadow, but I suspect they did not include the effect of cross-matching in their simulations.

I'm adding this here as it's important to use the right catalog to test replication efforts, since we only have a subset of the candidates. It also seems like they used an earlier version, which stops at MJD 59198. These catalogs are painfully large, so I am only working with them on IRSA.
 
Great work. It's impressive you managed to get PSFEx working, it is tricky as it's not well documented. I have some suggestions:

For the final analysis, I would caution against using the Skyview service to download the data, because the images it returns have been resampled. Resampling is essentially rebinning the image on to a different grid. This is why you can vary the pixel scale. For these sources it may make no difference, but it may slightly affect the profiles and sizes of objects. Maybe there is a way to get the originals, but I haven't found it. The STScI and ESO services do not resample.

You're using PanStarrs to find bright stars, to make the spike mask. This will fail to reproduce what they did because bright stars are totally saturated in a CCD survey like PanStarrs, and they are just not in the catalog at all. The paper uses USNO-B1.0; it is best to follow that.

I would also suggest doing the cross-matching with Gaia and PanStarrs with Stilts (or Topcat for testing and generating the Stilts command). These steps can mostly be done in a few commands. They used these tools in the paper, it's also more robust in that it doesn't have to be debugged. There may be subtle differences in the Stilts cross-matching compared to astroquery. I know you've already coded a lot of it, but it would have to be tested. There is a small bug in your distance calculations in xmatch.py:

res['sep'] = ((res['ra'] - ra_deg).abs() + (res['dec'] - dec_deg).abs())

Which is the Cartesian distance. For angles, there is a cos(dec) factor in the RA component. I would always use astropy.coordinates for such things, or Stilts. It's difficult to write all of this stuff flawlessly. Which is why offloading much of it to Stilts would make it more robust. Perhaps if you want to stick with astroquery, we could compare the output at some stage. I would recommend trying to keep the core routines simple and transparent, so that people can read through it line by line.
I hope your suggestions are now in place: https://github.com/jannefi/vasco (current v.0.06.9)
STILTS is now a required component; it's used quite a bit.

Using the USNO-B1.0 catalog turned out to be tricky: the endpoint I'm calling can be slow. Depending on who-knows-what and the actual parameters, one call can take a few seconds or a few minutes. The endpoint is:
catalogs.mast.stsci.edu/api/v0.1/panstarrs//dr2/mean.csv. I couldn't find any alternatives.

It took a lot of time to configure and figure out the PSFEx and SExtractor parameters. The current parameters work, and I hope they are close to what was used in the 2022 MNRAS paper.

This photometry project turned out to be much more complex than I thought. Many astronomers probably would have implemented this already :p Let's see if I actually manage to finalise this some day. Hopefully Villarroel et al. publish their software soon, because implementing this kind of software from scratch can be frustrating without any prior knowledge and experience.

Edit: I noticed a weird issue. Some downloaded plates are from the SERC-EJ survey, some from POSSI-E. I'm trying to figure out how that is possible and fix it.
Edit 2: made a quick fix to enforce POSSI-E plates; others are discarded. Examples updated with working coordinates.
 
I hope your suggestions are now in place: https://github.com/jannefi/vasco (current v.0.06.9)
STILTS is now a required component; it's used quite a bit.
Update: I added Docker support. Now anyone can try to run e.g. the run-random.py script without installing anything except Docker. A bit more info here: https://github.com/jannefi/vasco/blob/main/DOCKER_READ.md

I'm struggling with stilts cdsskymatch, especially with the larger catalog "II/389/ps1_dr2" (which is Pan-STARRS DR2 via VizieR). The VizieR service seems to randomly decide that I'm asking for too much data or asking too often, and returns "service not avail" or similar. I added pauses between the calls, but that does not seem to matter. I have no idea how to fix this. I don't know if the same catalog is available via some other tool or website.
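For reference, the pause/retry logic I've tried looks roughly like this (a simplified sketch; the real stilts arguments in the repo differ, and the column names here are assumptions):

import subprocess
import time

def cdsskymatch_with_retry(in_path, out_path, max_tries=5):
    cmd = [
        "stilts", "cdsskymatch",
        "cdstable=II/389/ps1_dr2",   # Pan-STARRS DR2 via VizieR
        f"in={in_path}",
        "ra=RA", "dec=DEC",          # assumed column names in the tile catalogue
        "radius=5",                  # arcsec
        f"out={out_path}",
    ]
    for attempt in range(1, max_tries + 1):
        try:
            subprocess.run(cmd, check=True)
            return
        except subprocess.CalledProcessError:
            if attempt == max_tries:
                raise
            wait = 2 ** attempt      # back off: 2, 4, 8, ... seconds
            print(f"cdsskymatch failed (attempt {attempt}), retrying in {wait}s")
            time.sleep(wait)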

I approached Villarroel for help. She was not involved with the technical track, but promised to try to find someone who could help a bit. She asked me to try to correct some misunderstandings about the latest papers posted in Metabunk. I promised to try.

I have made some comments about the papers that were biased - or at least not based on scientifically valid data. I didn't know how to read scientific publications properly before this. I asked a few scientists for support. One of them, luckily, is an astrophysicist who also knows about photometry. However, in this case photometry is difficult because of the old glass plates. Not many scientists work with them anymore.

The science part of the papers is likely valid. The hypothetical part about UFOs - which tends to be the topic discussed in the news and other media - is just that, a hypothesis. The papers don't claim the UFOs are scientifically proven. I don't think the UFO hypothesis will hold, but that doesn't mean the scientific results or datasets are not valid. All data, methods and datasets seem to be valid. I won't say much about the Earth's-shadow or nuclear-test parts for now, because I haven't looked into the data or methods yet. I'm concentrating on reproducing the results of the 2022 paper first, and it's quite difficult. Looking at the first results of a small number of tests, I think the results will be similar, but not the same.

All this would be easier if the team who built the software pipeline, perhaps before 2022, would publish the source code. It's very easy to get bad results if you make mistakes or use the wrong data, like selecting the wrong sky survey catalog for comparison, or using the wrong POSS-I image source. It's easy to get this wrong. It's also hard to verify test results against their catalog/dataset, because getting a 1:1 match is probably not possible.
 
Update: I added Docker support. Now anyone can try to run e.g. the run-random.py script without installing anything except Docker. A bit more info here: https://github.com/jannefi/vasco/blob/main/DOCKER_READ.md
Looking at the results of today's short test run, I seem to have some problem with cross-matching. I introduced new functionality that attempts to produce a list of transient candidates based on an x-match of two external catalogs (within a 5 arcsec radius). There are two candidate csv files per 60x60 arcmin tile. But the code has not found any transient candidates after some 60 tiles. That is a clear error indicator. Just like the previous version, which produced hundreds of transient candidates per tile... like I said, this is not easy.

I need to revisit the latest changes when time permits. The POSS-I images and data from external catalogs should be correct, but of course I need to check those, too - especially the external catalogs which are now handled via stilts cdsskymatch.

I have already implemented an alternative approach where the data from the external catalogs are downloaded as csv files before cross-matching, but there was a major challenge with one central source: it was extremely slow. One http/CGI call could take 2-10 minutes to complete, and of course it timed out every now and then. Villarroel's team used stilts cdsskymatch in 2022. It's better to use the same methods when possible.

I forgot to mention something possibly important: using full POSS-I plate scans with the current software is probably not going to work. I know some of you started downloading the full POSS-I red plates early on, just like I did. But they are probably too large for the scripts and other 3rd-party software. A bigger machine with lots of memory might help, but not necessarily.

Edit/update: I think I found the cross-matching bug(s) and fixed them. Today's test runs have gone quite well. Even the external catalogs have responded to stilts cdsskymatch queries. This doesn't mean this version is final and fully in line with the MNRAS 2022 paper, but it can provide useful data for future work.
 
I forgot to mention something possibly important: using full POSS-I plate scans with the current software is probably not going to work. I know some of you started downloading the full POSS-I red plates early on, just like I did. But they are probably too large for the scripts and other 3rd-party software. A bigger machine with lots of memory might help, but not necessarily.
I downloaded them all, I don't have any issue loading them into my FITS parser
https://www.metabunk.org/sitrec/tools/fits.html
BUT, I have 64GB. I think it would still JUST work with 16GB. It takes like 30 seconds to load one image.

I think if you have a local catalog and some clever programming, you could probably get it down to a few minutes of processing to detect transients/artifacts. Not something I have time for right now.
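For what it's worth, astropy can memory-map the file instead of loading the whole array, and then you can process it in sub-tiles; a rough sketch (file name and tile size are placeholders):

import numpy as np
from astropy.io import fits

# memmap=True keeps the pixel array on disk until a slice is actually read.
with fits.open("poss1_red_plate.fits", memmap=True) as hdul:
    data = hdul[0].data
    ny, nx = data.shape
    step = 2048
    for y in range(0, ny, step):
        for x in range(0, nx, step):
            cutout = np.asarray(data[y:y + step, x:x + step])  # only this slice is loaded
            # ... run detection / artifact checks on the cutout here ...
            print(y, x, float(cutout.mean()))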
 
The science part of the papers is likely valid. ...All data, methods and datasets seem to be valid.

If the transients identified by Villarroel are indeed "true" transients, which is questioned by photographic astronomers Hambly and Blair
("On the nature of apparent transient sources on the National Geographic Society–Palomar Observatory Sky Survey glass copy plates",
N.C. Hambly, A. Blair, RAS Techniques and Instruments 3 (1), 2024 https://academic.oup.com/rasti/article/3/1/73/7601398?login=false).

And if there has been no methodological error or unintended bias in counting transients within and without Earth's shadow.

The papers don't claim the UFOs are scientifically proven.
Villarroel made it clear on NewsNation, in an interview with Ross Coulthart, that she believes her research demonstrates that there were thousands of extraterrestrial technological artefacts near the Earth in the 1950s, and that she thinks they're still there:
External Quote:

...if these are what I think they are, I mean if these are artificial objects which the signatures are pointing towards, I have no idea what they could have done or if they are there still; I would assume they are there still.
NewsNation interview, 21 October 2025 (viewable in post #368), approx. 7 mins 29 secs into the video

Coulthart's views on UFOs are widely shared by UFO enthusiasts but probably not by most astronomers or other relevant scientists, and Villarroel must have known that.
This is an extraordinary claim, on camera, of the type that people on Metabunk might be interested in.

Ditto, the claimed correlation with nuclear tests. This was inspired by the Ufology trope of aliens being interested in Earth's nuclear technology (there is nothing wrong per se in researching that correlation). This was in "Some Transients in the Palomar Observatory Sky Survey (POSS-I) May Be Associated with Above-Ground Nuclear Testing and Reports of Unidentified Anomalous Phenomena", Stephen Bruehl, Beatriz Villarroel 2025, Scientific Reports 15 https://www.nature.com/articles/s41598-025-21620-3
External Quote:
Results revealed significant (p = .008) associations between nuclear testing and observed transients, with transients 45% more likely on dates within +/- 1 day of nuclear testing.
This is a claimed research finding.

Villarroel's claims are extraordinary, and if correct would be profoundly significant.
They would be amongst the most important scientific discoveries ever made. Finding just one alien artefact in GEO would constitute proof of extraterrestrial life and, more than that, that technologically advanced alien civilisations have been monitoring Earth from arbitrarily close range in recent decades.
Villarroel has publicly stated she believes such objects existed in their thousands, and that they're probably still there. She should not be surprised that such claims get attention, and that many of her peers are not yet convinced.
 
If the transients identified by Villarroel are indeed "true" transients, which is questioned by photographic astronomers Hambly and Blair
("On the nature of apparent transient sources on the National Geographic Society–Palomar Observatory Sky Survey glass copy plates",
N.C. Hambly, A. Blair, RAS Techniques and Instruments 3 (1), 2024 https://academic.oup.com/rasti/article/3/1/73/7601398?login=false).

And if there has been no methodological error or unintended bias in counting transients within and without Earth's shadow.

Villarroel made it clear on NewsNation, in an interview with Ross Coulthart, that she believes her research demonstrates that there were thousands of extraterrestrial technological artefacts near the Earth in the 1950s, and that she thinks they're still there:

Coulthart's views on UFOs are widely shared by UFO enthusiasts but probably not by most astronomers or other relevant scientists, and Villarroel must have known that.
This is an extraordinary claim, on camera, of the type that people on Metabunk might be interested in.

Ditto, the claimed correlation with nuclear tests. This was inspired by the Ufology trope of aliens being interested in Earth's nuclear technology (there is nothing wrong per se in researching that correlation). This was in "Some Transients in the Palomar Observatory Sky Survey (POSS-I) May Be Associated with Above-Ground Nuclear Testing and Reports of Unidentified Anomalous Phenomena", Stephen Bruehl, Beatriz Villarroel 2025, Scientific Reports 15 https://www.nature.com/articles/s41598-025-21620-3
External Quote:
Results revealed significant (p = .008) associations between nuclear testing and observed transients, with transients 45% more likely on dates within +/- 1 day of nuclear testing.
This is a claimed research finding.

Villarroel's claims are extraordinary, and if correct would be profoundly significant.
They would be amongst the most important scientific discoveries ever made. Finding just one alien artefact in GEO would constitute proof of extraterrestrial life and, more than that, that technologically advanced alien civilisations have been monitoring Earth from arbitrarily close range in recent decades.
Villarroel has publicly stated she believes such objects existed in their thousands, and that they're probably still there. She should not be surprised that such claims get attention, and that many of her peers are not yet convinced.
Mostly agreed. Many of the skeptical comments, including some of mine, stem from the extraordinary hypothesis and the claims made in public. Those of course deserve criticism.

What I'm trying to say is that there have been comments on Metabunk and in social media that "attack" the science part of the papers, usually without evidence, or due to misunderstanding/misreading the papers. For example:
- Neither paper claims that "ET UFOs" are proven. This is a hypothesis. The authors state that the origin (of the transients) remains uncertain and alternative explanations cannot be ruled out.
- The MNRAS paper establishes correlation, not causation. It doesn't assert that the transients are ET spacecraft.
- Earth's shadow: the analysis does not eliminate plate defects. It argues against an "all defects" explanation by showing a statistically significant deficit of events in shadow. But there are transients detected in the umbra. A mixture of causes, including defects, remains possible.
What would be needed to criticise the science part: 1) independent replication; 2) targeted microscopy of the "candidate" plates, and perhaps a random control set from a large catalog to quantify a defect base rate; 3) cross-survey checks using the same criteria; 4) perhaps even pre-registered re-analysis (this is something that other scientists could do if they feel it is needed).
Before 1-3 are done, the technosignature/UAP angle remains a hypothesis (not a conclusion).

The authors also know all this, and state some of it clearly: "Our findings do not definitively indicate what transients are nor do they necessarily imply causal associations" (SciRep). "Without an inspection with a microscope one still cannot exclude plate defects" (PASP).

This is why I chose to reproduce (or at least try...) starting from the 2022 paper, which contains a description of the pipeline. We have to start somewhere before making claims about the science of the papers. If I or someone else gets good results, we have a better basis for scientific discussions. Discussing the hypotheses presented in the papers is a different topic, and it would make sense to keep them separate from the science part. I believe most findings could be explained through microscopy, like Hambly & Blair have suggested, but that's something I cannot do. Meanwhile I will keep trying with software, but that will take time and this attempt might fail. I hope there will be some professional attempts to reproduce, too, but these things don't happen overnight. It might take a few years from the start.
 
What I'm trying to say is that there have been comments on Metabunk and in social media that "attack" the science part of the papers, usually without evidence, or due to misunderstanding/misreading the papers. For example:
- Neither paper claims that "ET UFOs" are proven. This is a hypothesis. The authors state that the origin (of the transients) remains uncertain and alternative explanations cannot be ruled out.
- The MNRAS paper establishes correlation, not causation. It doesn't assert that the transients are ET spacecraft.
- Earth's shadow: the analysis does not eliminate plate defects. It argues against an "all defects" explanation by showing a statistically significant deficit of events in shadow. But there are transients detected in the umbra. A mixture of causes, including defects, remains possible.
What would be needed to criticise the science part: 1) independent replication; 2) targeted microscopy of the "candidate" plates, and perhaps a random control set from a large catalog to quantify a defect base rate; 3) cross-survey checks using the same criteria; 4) perhaps even pre-registered re-analysis (this is something that other scientists could do if they feel it is needed).
Before 1-3 are done, the technosignature/UAP angle remains a hypothesis (not a conclusion).
Well, none of this addresses my criticism: the paper has failed to convince me that the optical transients existed outside the telescope/camera/reproduction system. I don't have issues with Villarroel claiming what they could be, I just don't think they exist. That's because the statistics needed to convince me of it isn't in the paper, but some of the statements in the paper indicate that the statistical assumptions it's founded on are actually incorrect. (The fact that she cites the older Solano paper, but discards its results unexplained, does not help.)

What would be needed to critique the statistics better is for Villarroel to publish her data sets. We only need to re-engineer her methods because she has not done that. I can't really see a good reason why she hasn't.
 
I downloaded them all, I don't have any issue loading them into my FITS parser
https://www.metabunk.org/sitrec/tools/fits.html
BUT, I have 64GB. I think it would still JUST work with 16GB. It takes like 30 seconds to load one image.

I think if you have a local catalog and some clever programming, you could probably get it down to a few minutes of processing to detect transients/artifacts. Not something I have time for right now.
Yep, the Sitrec use case is probably different if there's no need to analyze each plate with software like SExtractor and PSFEx, or cross-match plate data with other sky survey catalogs.
 
Well, none of this addresses my criticism: the paper has failed to convince me that the optical transients existed outside the telescope/camera/reproduction system. I don't have issues with Villarroel claiming what they could be, I just don't think they exist. That's because the statistics needed to convince me of it isn't in the paper, but some of the statements in the paper indicate that the statistical assumptions it's founded on are actually incorrect. (The fact that she cites the older Solano paper, but discards its results unexplained, does not help.)

What would be needed to critique the statistics better is for Villarroel to publish her data sets. We only need to re-engineer her methods because she has not done that. I can't really see a good reason why she hasn't.
Agreed: they should release the datasets and the software pipeline. They used the MNRAS 2022 dataset (the one I'm trying to reproduce).
Perhaps they could do that based on "reasonable request".

Here's the list with over 171K candidates: http://svocats.cab.inta-csic.es/vanish-neowise/index.php?action=search
Here's 410 candidates: http://svocats.cab.inta-csic.es/vanish-onlyir/index.php?action=search
And 5399 candidates: http://svocats.cab.inta-csic.es/vanish-possi/index.php?action=search
(via https://academic.oup.com/mnras/article/515/1/1380/6607509 MNRAS 2022)
 
They used the MNRAS 2022 dataset
No, Villarroel uses neither of the sets you indicated. I believe she uses the unpublished output of the Solano 2022 "candidate selection" step minus the NeoWISE dataset (171K, published), which already has a visible grid structure that the stars don't have. Her "lines" are not in the 5399 dataset; none of the data points in them are, as far as I could tell.
 
No, Villarroel uses neither of the sets you indicated. I believe she uses the unpublished output of the Solano 2022 "candidate selection" step minus the NeoWISE dataset (171K, published), which already has a visible grid structure that the stars don't have. Her "lines" are not in the 5399 dataset; none of the data points in them are, as far as I could tell.
Let's consider Solano/MNRAS 2022 as the "master catalog" with 298165 POSS‑I‑only detections. After matching those with modern catalogs, we have 5399 "unidentified transients", and 172163 IR‑only sources (NEOWISE).

PASP 2025 uses the 2022 "master catalog" and further filters it down to 106,339 items (e.g. no counterparts in Gaia/Pan‑STARRS/NEOWISE). External quotes below. I couldn't find other datasets. Did I miss something?

From PASP 2025:
We use the transient candidates from Solano et al. (2022), but with the additional requirement that they have no counterparts within 5″ in Gaia, Pan‑STARRS and NeoWise. Furthermore, we restrict our analysis to objects in the northern hemisphere (decl. > 0°). This yields a sample of 106,339 transients, which we use for our study.

Solano/MNRAS 2022:
In this paper, we report a search for vanishing sources in POSS I red images using virtual observatory (VO) archives, tools, and services. The search, conducted in the framework of the VASCO project, aims at finding POSS I (red) sources not present in recent catalogues like Pan-STARRS DR2 (limiting magnitude r = 21.4) or Gaia EDR3 (limiting magnitude G = 21). We found 298,165 sources visible only in POSS I plates, out of which 288,770 had a cross-match within 5 arcsec in other archives (mainly in the infrared), 189 were classified as asteroids, 35 as variable objects, 3592 as artefacts from the comparison to a second digitization (Supercosmos), and 180 as high proper motion objects without information on proper motion in Gaia EDR3. The remaining unidentified transients (5399) as well as the 172,163 sources not detected in the optical but identified in the infrared regime are available from a VO compliant archive…"
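As a sanity check, the numbers in that abstract are self-consistent:

# All figures are from the quoted abstract.
total           = 298_165   # POSS-I-only sources
with_crossmatch = 288_770   # matched within 5 arcsec in other archives (mostly IR)
asteroids       = 189
variables       = 35
artefacts       = 3_592     # flagged via the Supercosmos comparison
high_pm         = 180
unidentified    = 5_399

print(total - with_crossmatch)                                      # 9395 left after the cross-match
print(asteroids + variables + artefacts + high_pm + unidentified)   # also 9395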
 
PASP 2025 uses the 2022 "master catalog" and further filters it down to 106,339 items (e.g. no counterparts in Gaia/Pan‑STARRS/NEOWISE). External quotes below. I couldn't find other datasets. Did I miss something?
that is approximately correct, but it's not a data set that MNRAS 2022 arrived at or published.
 
that is approximately correct, but it's not a data set that MNRAS 2022 arrived at or published.
Solano 2022 MNRAS (https://academic.oup.com/mnras/article/515/1/1380/6607509): POSS-I "master catalog" 298,165; 5,399 unidentified; 172,163 IR-only including 171,753 NEOWISE. Archive: http://svocats.cab.inta-csic.es/vanish/ According to @ThickTarget, the "NEOWISE" is actually NEOWISE-R Single Exposure (L1b) Source Table.

Solano/MNRAS 2022:
External Quote:
"In this paper, we report a search for vanishing sources in POSS I red images using virtual observatory (VO) archives, tools, and services. The search, conducted in the framework of the VASCO project, aims at finding POSS I (red) sources not present in recent catalogues like Pan‑STARRS DR2 (limiting magnitude r = 21.4) or Gaia EDR3 (limiting magnitude G = 21). We found 298,165 sources visible only in POSS I plates, out of which 288,770 had a cross‑match within 5 arcsec in other archives (mainly in the infrared), 189 were classified as asteroids, 35 as variable objects, 3592 as artefacts from the comparison to a second digitization (Supercosmos), and 180 as high proper motion objects without information on proper motion in Gaia EDR3. The remaining unidentified transients (5399) as well as the 172,163 sources not detected in the optical but identified in the infrared regime are available from a VO compliant archive…"
Note: they did not publish the ~298K dataset via the archive webpage, but that's not needed. This number is basically all possible stellar objects found in all POSS-I red plates. You need to run the data through the pipeline anyway because most of the data is already covered by other catalogs.

This process flow picture is from the MNRAS 2022 paper. I've been using it as a reference for the software implementation. The most important figures from a repro perspective are 298165 and 5399. The deduction round numbers may vary depending on the sky catalog used (or the source of the sky catalog).
[attached: process flow chart from the MNRAS 2022 paper]
 
Note: they did not publish the ~298K dataset via the archive webpage, but that's not needed. This number is basically all possible stellar objects found in all POSS-I red plates.
That's a mischaracterisation.
The 298165 objects are transient candidates, i.e. dots on POSSI-E plates that don't look like obvious artifacts and that are not stellar objects in Gaia or Pan-STARRS. The set still contains both astronomical objects (stellar and not) and artifacts.

If we had this set, we could derive Villarroel's set by subtracting the published neoWISE matches, but we don't.
We cannot characterize this set statistically, either, because we do not have it.


This process flow picture is from the MNRAS 2022 paper. I've been using it as a reference for the software implementation. The most important figures from a repro perspective are 298165 and 5399. The deduction round numbers may vary depending on the sky catalog used (or the source of the sky catalog).
[attached: process flow chart from the MNRAS 2022 paper]
You wrote, "They used the MNRAS 2022 dataset", but there is no point in this flow chart where you can identify either of the two Villarroel 2025 data sets (unpublished).
I think (but it's unclear) that she used the 298165 data set (unpublished) and matched it with neoWISE, but didn't apply any other steps, because she would've been unable to draw interesting conclusions from the 5399 data set (which still shows the plate grid pattern, btw).
 
That's a mischaracterisation.
The 298165 objects are transient candidates, i.e. dots on POSSI-E plates that don't look like obvious artifacts and that are not stellar objects in Gaia or Pan-STARRS. The set still contains both astronomical objects (stellar and not) and artifacts.

If we had this set, we could derive Villarroel's set by subtracting the published neoWISE matches, but we don't.
We cannot characterize this set statistically, either, because we do not have it.



You wrote, "They used the MNRAS 2022 dataset", but there is no point in this flow chart where you can identify either of the two Villarroel 2025 data sets (unpublished).
I think (but it's unclear) that she used the 298165 data set (unpublished) and matched it with neoWISE, but didn't apply any other steps, because she would've been unable to draw interesting conclusions from the 5399 data set (which still shows the plate grid pattern, btw).
Correct, I made a mistake claiming the 298K is all dots (or possible stellar objects) in all POSS-I red images. There are millions of dots in those images. 298K is only those without matches in the two modern optical catalogues at 5″.

But AFAIK it's quite normal not to publish large datasets when the number of sources is very large and the derived smaller datasets are the ones used in the analysis. The full 298K dataset could probably be useful in some use cases, and I think it could be a "reasonable request" toward the authors.

I don't need it in the repro work: the data is extracted from the POSS-I plate scans with the software. It's a lot of data, which then goes through the pipeline for further deductions, cross-matches, etc.
 
She asked me to try to correct some misunderstandings about the latest papers posted in Metabunk
I appreciate all the work you, Mick, Mendel and others are doing here to reproduce these papers. I saw a tweet where Dr Villarroel asked if you could bring corrections to Metabunk. Are the corrections/clarifications of whichever misunderstandings public on Twitter somewhere?
 
I appreciate all the work you, Mick, Mendel and others are doing here to reproduce these papers. I saw a tweet where Dr Villarroel asked if you could bring corrections to Metabunk. Are the corrections/clarifications of whichever misunderstandings public on Twitter somewhere?
Thank you. Like I posted earlier, I asked Villarroel for technical help. She promised to try to find a person who might be able to help, but there has been no progress. She did not specify the "factual misunderstandings still circulating on Metabunk". So I decided to post some generic clarifications: https://www.metabunk.org/threads/digitized-sky-survey-poss-1.14385/post-358825 - hoping I would get some technical help in return.
 
Huh. I've finally reached first-light (or perhaps first-dim-light :p) for the VASCO pipeline reproduction (MNRAS 2022). Using the published NEOWISE-only VASCO catalogue (171,753 sources) and a small optical pilot footprint (<300 tiles, 30′×30′ each), the scripts could find 69 secure matches within 2.0″. The end-to-end workflow (SExtractor -> CDS xmatch -> tile merge -> NEOWISE <-> optical compare) is working, and the numbers/plots look sane.
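The final compare step is essentially a nearest-neighbour match; stripped down it looks something like this (file and column names are placeholders, not the actual repo files):

import pandas as pd
import astropy.units as u
from astropy.coordinates import SkyCoord

neowise = pd.read_csv("vasco_neowise_catalogue.csv")     # the published 171,753 sources
optical = pd.read_csv("merged_tile_detections.csv")      # detections from my pilot tiles

c_neo = SkyCoord(neowise["ra"].values * u.deg, neowise["dec"].values * u.deg)
c_opt = SkyCoord(optical["ra"].values * u.deg, optical["dec"].values * u.deg)

# Nearest optical detection for every NEOWISE source, kept if closer than 2 arcsec.
idx, sep2d, _ = c_neo.match_to_catalog_sky(c_opt)
secure = sep2d < 2.0 * u.arcsec
print(f"{secure.sum()} secure matches within 2.0 arcsec")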

I should be able to reproduce the core findings from the VASCO MNRAS work after enough data goes through the pipeline, and then continue from there. I'm running out of disk space fast, so I ordered a 2TB SSD dedicated to this project. I've had a lot of technical challenges but now the software is stable enough. It's not user-friendly, but for the main part, you can do the work with one or two scripts running in a container.

See the first-light info here: https://github.com/jannefi/vasco/tree/main/first-light :cool:

A lot of things have changed during the past weeks, please see: https://github.com/jannefi/vasco

I haven't received any help from anyone - except earlier from the two "uninvolved" and very busy scientists I posted about. So this milestone also goes to show this kind of project is doable without prior knowledge of photometry and with just a generic understanding of astronomy. I have had a telescope for decades, and that got me interested in astronomy in the first place.

All questions, comments, etc. are welcome. Meanwhile I'll continue collecting POSS-I plates and running them through the software pipeline. The sky to cover is pretty large: the software now downloads 30x30 arcmin tiles, so that means about 120K tiles. Getting decent coverage like 25% still takes time and lots of disk space.
 
I'd be willing to lend some processing power. Any way to provision parts of the sky so we don't process the same areas?
Great, good to hear! Provisioning is not possible with the current script: it selects random 30x30 arcmin tiles from the northern sky. I'm not even planning to "tessellate" the sky in any particular order. But it doesn't matter much: we can compare collected tiles at some point. The software keeps a record of practically everything.

As a new bonus, it finds the matching dss1red plate and stores it next to the plate metadata. Full plates are here: https://irsa.ipac.caltech.edu/data/DSS/images/dss1red/ - unfortunately you must download all the red plates if you want this bonus. But it enables visual checking of e.g. plate edges, which wasn't easy with the data sources the authors have published in various papers.

Note: please check every now and then if I've updated the code repository. I've recently made some important changes, because data sizes are becoming too large (yesterday I had 22 million sources found from about 3000 tiles and my laptop started to choke). I had to convert the large CSV files to Parquet format. With the Parquet format, the pipeline should run fairly OK with hundreds of millions of sources. If not, I must consider a database approach, but that's technically challenging (for my laptop).
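The conversion itself is simple; roughly (a sketch, with placeholder file and column names):

import pandas as pd

def csv_to_parquet(csv_path, parquet_path):
    df = pd.read_csv(csv_path)
    # Parquet is compressed, preserves dtypes, and can be read back column by column.
    df.to_parquet(parquet_path, engine="pyarrow", compression="snappy", index=False)

# Later, reading back only the columns needed for cross-matching keeps memory use low:
# subset = pd.read_parquet("tile_0001.parquet", columns=["ra", "dec", "mag"])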

Also note: CPU consumption should not be a big problem, this software runs quite fast on a decent PC/laptop. Most of the time it just waits for some network service to respond. It takes a fair amount of disk space depending on how many tiles you download and process.

Edit (additional info added): you can also download tiles non-randomly by using coordinates. It's also possible to download tiles bigger than 30x30 arcmin, but that can lead to problems when the pipeline does cross-checking against two large catalogs online using stilts cdsskymatch. The manual says they can return up to 1M rows, but in reality, that is not true. With 30x30 tiles, you get back hundreds or up to 10,000 rows. Sometimes the services don't like that either, and stop accepting requests for a while. This "denial of service" seems to depend on how many others are running queries at the same time, and not so much on the amount of data requested.

I'm amazed that these services keep working 24/7 - and quite well. For example, the other catalog, Pan-STARRS DR2, contains 1.8 billion sources. Any database query against that amount of data is quite a heavy task.
 
Not sure if this paper has been discussed earlier: Exploring nine simultaneously occurring transients on April 12th 1950 (Villarroel et al. 2021). https://www.nature.com/articles/s41598-021-92162-7

Quote from the Abstract:
External Quote:
Nine point sources appeared within half an hour on a region within 10 arcmin of a red-sensitive photographic plate taken in April 1950 as part of the historic Palomar Sky Survey. All nine sources are absent on both previous and later photographic images, and absent in modern surveys with CCD detectors which go several magnitudes deeper.
Sounds familiar. The good news is that my software found most of these independently. Five out of nine transient candidates remained after pipeline steps 2-6 were completed. The stricter criteria used in the MNRAS 2022 paper dropped a few of them.

But none of them seem to be present in the derived catalog (2022): http://svocats.cab.inta-csic.es/vanish-neowise/ (List of vanishing objects seen in NEOWISE but not in the optical / infrared - 171,753 sources).

It looks like these nine point sources vanished fully in MNRAS 2022. I want to find out exactly where in the pipeline that happened. I haven't seen any publication mentioning these nine sources after 2021. They never published the large catalogue of MNRAS 2022 (298,165 candidates). I'm interested in whether these nine are in that dataset. I assume and hope five of them are, like in my pipeline.

Edit: interesting comment posted by Howard E Bond (July 28, 2021):
External Quote:

I'm afraid the "transients" reported in this paper are simply defects in the photographic emulsion of the 1950 plate. The field where these objects are located is near the corner of the photographic plate, and there is obvious damage to the emulsion in the form of white areas (missing emulsion), scratches, small dust particles, etc. The emulsion appears to have been touched or otherwise damaged, either before loading into the plateholder, or during development after the exposure, or conceivably before or during the scanning of the plate at STScI. The only slight puzzle is how the damage to the emulsion made some of these artifacts resemble stellar images (of which there are numerous examples in the corner of the plate), but there is no doubt that they are artifacts and not real celestial objects.

The first picture zooms in on the location of the 6 "transients" listed in Table 1, circled in red. Note the significant emulsion damage including white spots where the emulsion has been entirely removed.
Image link 1: https://uploads.disquscdn.com/image...bd22eaae7248b130c4b5e833ee1a2892340ea3e97.jpg
Image link 2: https://uploads.disquscdn.com/image...f2353017fb5a4effaf35ab91d7319f1b70b093a3b.png
 
Has the other scan been checked?
Hambly and Blair inspected two glass copy plates of POSSI survey fields E0070 and E0086 visually.
See https://academic.oup.com/rasti/article/3/1/73/7601398?login=false Article "On the nature of apparent transient sources on the National Geographic Society–Palomar Observatory Sky Survey glass copy plates", February 2024.

Quote from the Abstract
External Quote:
We find that the putative transients are likely to be spurious artefacts of the photographic emulsion. We suggest a possible cause of the appearance of these images as resulting from the copying procedure employed to disseminate glass copy survey atlas sets in the era before large-scale digitization programmes.
They talk about "nine point-like detections", so their glass copy plates are the same ones Villarroel et al. used in 2021. DSS plate IDs: XE325 (red) and XE424 (blue). Note that the UK glass copy plates have a different plate ID scheme - their plate IDs are E0070 and E0068.

External Quote:
We confirmed the presence of nine point-like detections on E0070 at the positions reported, albeit appearing (to the eye aided by microscope) as generally rather more concentrated and circular than typical stellar images of the same estimated brightness. We noted, also as originally reported, that they are absent in the overlapping plate E0086 (exposed 6 d later) but furthermore that there are similar detections present on E0086 that are not present on E0070. At the same time, we noted also the presence of various blemishes on the POSSI copy plates and we present some examples
Hambly and Blair only checked two plates visually. They and a few other scientists have suggested that Villarroel et al. should inspect the original POSS-I glass plates with a microscope. Villarroel et al. have noted that they don't have access to the original glass plates.
People on this forum have tried to find out where the original plates are stored, but couldn't find a clear answer.
 
I've managed to download and process over 4,000 tiles (30x30 arcmin each). The software pipeline made 37.9 million detections (or possible sources). But I will need at least about 25,000 tiles. That means about 20% of POSS-I coverage. The dedicated 2TB disk might not be enough, after all :eek:

I'm running the software on a laptop (MacBook Air). It gets a bit warm during the process so I decided to run the sw only when I'm around and awake.

 
I've managed to download and process over 4,000 tiles (30x30 arcmin each). The software pipeline made 37.9 million detections (or possible sources). But I will need at least about 25,000 tiles. That means about 20% of POSS-I coverage. The dedicated 2TB disk might not be enough, after all :eek:

I'm running the software on a laptop (MacBook Air). It gets a bit warm during the process so I decided to run the sw only when I'm around and awake.

Can you save each image set to a RAM disk, process it, then delete it once done or do you need a large dataset stored to work on it?

Linux has the /tmp drive I've always used. Not sure if Mac has something similar.
 
Can you save each image set to a RAM disk, process it, then delete it once done or do you need a large dataset stored to work on it?

Linux has the /tmp drive I've always used. Not sure if Mac has something similar.
Mac is a Unix-like system: /tmp is there :) But unfortunately I can't delete anything for now. All the data is needed, including the FITS files for visual inspection. Yesterday I found a major data issue and today I fixed it. That would not have been possible without all the collected data. That's why I must keep everything, and also implement the final reduction round against NeoWISE last, so that I won't miss anything important.
 
I've managed to download and process over 4,000 tiles (30x30 arcmin each). The software pipeline made 37.9 million detections (or possible sources). But I will need at least about 25,000 tiles. That means about 20% of POSS-I coverage. The dedicated 2TB disk might not be enough, after all :eek:

I'm running the software on a laptop (MacBook Air). It gets a bit warm during the process so I decided to run the sw only when I'm around and awake.
Something in Tuesday's statistics didn't look right. So I did a "self-audit" and found a systematic astrometric data quality issue. All collected data and any data that has gone through pipeline steps 1-6 looked fine, and appears to be in line with MNRAS 2022. But there was a clear astrometric accuracy issue, or "astrometry drift". I had to implement a new post-pipeline script that fits a per-tile polynomial plate solution to Gaia matches. It then writes corrected coordinates into each tile's final_catalog_wcsfix.csv. Downstream scripts had to be fixed to take this change into account. The statistics look good now. Only a handful of plates currently show similar drift, but that's probably due to plate-edge and similar anomalies.
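The idea of the correction, stripped down (a sketch of the approach, not the actual script; column names are assumptions):

import numpy as np

def fit_tile_correction(matches, degree=2):
    # Fit the dRA/dDec residuals of Gaia-matched stars as 2-D polynomials in (ra, dec).
    x, y = matches["ra"].values, matches["dec"].values
    terms = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    A = np.vstack(terms).T
    coef_ra, *_ = np.linalg.lstsq(A, matches["gaia_ra"].values - x, rcond=None)
    coef_dec, *_ = np.linalg.lstsq(A, matches["gaia_dec"].values - y, rcond=None)
    return coef_ra, coef_dec

def apply_tile_correction(catalog, coef_ra, coef_dec, degree=2):
    # Evaluate the fitted polynomials for every source in the tile and shift its coordinates.
    x, y = catalog["ra"].values, catalog["dec"].values
    terms = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    A = np.vstack(terms).T
    out = catalog.copy()
    out["ra_fix"] = x + A @ coef_ra
    out["dec_fix"] = y + A @ coef_dec
    return out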

Latest version is available: https://github.com/jannefi/vasco (with corrected stats in README.md).

Bonus: while waiting for test runs to complete, I documented the whole pipeline: https://github.com/jannefi/vasco/blob/main/WORKFLOW.md It covers all important scripts, parameters, environment variables, files and folders. And some common errors and caveats.
It's challenging that cdsskymatch or other VizieR calls sometimes fail due to RA/DEC values being outside PS1 or Gaia coverage. Those tiles have to be excluded from downstream scripts even if they are valid POSS-I tiles. There are not many of them yet, so it shouldn't have an impact on the final outcome, but I'll keep monitoring the situation.
 