Digitized Sky Survey POSS-1

I mean, according to Solano et al. (2022), they did identify interesting phenomena. The last sentence of the abstract is the bit about the lack of vanishing stars, I think.

External Quote:
In this paper, we report a search for vanishing sources in POSS I red images using virtual observatory (VO) archives, tools, and services. The search, conducted in the framework of the VASCO project, aims at finding POSS I (red) sources not present in recent catalogues like Pan-STARRS DR2 (limiting magnitude r = 21.4) or Gaia EDR3 (limiting magnitude G = 21). We found 298 165 sources visible only in POSS I plates, out of which 288 770 had a cross-match within 5 arcsec in other archives (mainly in the infrared), 189 were classified as asteroids, 35 as variable objects, 3592 as artefacts from the comparison to a second digitization (Supercosmos), and 180 as high proper motion objects without information on proper motion in Gaia EDR3. The remaining unidentified transients (5399) as well as the 172 163 sources not detected in the optical but identified in the infrared regime are available from a VO compliant archive and can be of interest in searches for strong M-dwarf flares, high-redshift supernovae, asteroids, or other categories of unidentified red transients. No point sources were detected by both POSS-I and POSS-II before vanishing, setting the rate of failed supernovae in the Milky Way during 70 yr to less than one in one billion.
Agreed. The MNRAS 2022 project is interesting, and that's why I started my own project in the first place. Too bad I had to do this: https://github.com/jannefi/vasco/releases/tag/v.0.9.5-legacy-mnras-repro - I archived the repro work as a "historical" release. But I can continue the work with changed goals: https://github.com/jannefi/vasco/blob/main/README.md (published about 30 minutes ago).

There are some well-known projects in this field, like the Digital Access to a Sky Century @ Harvard (DASCH) project.
Link: https://dasch.cfa.harvard.edu/

External Quote:
DASCH was the project to digitize the Harvard College Observatory's Astronomical Photographic Glass Plate Collection for scientific applications. This enormous — multi-decade — undertaking was completed in 2024. Its legacy is DASCH Data Release 7, an extraordinary dataset that enables scientific study of the entire night sky on 100-year timescales.
Research has involved investigating transient-like phenomena on early plates that could be caused either by rare astrophysical events or by instrumental noise, e.g. "Extreme Transients Discovered with DASCH: New BH-LMXBs or new Classical Novae?" (Grindlay, 2019): https://scixplorer.org/abs/2019AAS...23311206G/abstract

This is a very interesting research area. There are challenges related to old glass plate images and producing good digital datasets from them, but that doesn't mean they cannot be studied. One "ghost image" is discussed here: https://dasch.cfa.harvard.edu/pipeline-overview/

External Quote:
The image at the left (from plate I31090) shows the center of M44 taken in 1903 and illustrates a special analysis problem presented by old photometric techniques. Note that the triangle on the right has a ghost image on the left generated by a "Pickering Wedge". This wedge was placed on the telescope objective in an attempt to extend the dynamic range of film. When the primary star image is saturated, a researcher can measure the ghost images and add a known wedge magnitude offset.
 
Good choice! Plate awareness facilitates more meaningful statistical analyses.
 
About the Minnesota Automated Plate Scanner (MAPS) Catalog discussed by Watters et al.: it's a catalogue derived from the first epoch of the Palomar Sky Survey (POSS-I), containing over 89 million matched stellar objects (stars and galaxies). There's a problem: it has been retired from VizieR, and I couldn't find it in any other online catalogue search service either.

But the data can be found on this website: https://aps.umn.edu/ Although it covers almost 90 million objects, the dataset isn't huge, around 13 gigabytes. I decided to make a local mirror and perhaps run some tests with it. Here's a simple download script: https://github.com/jannefi/vasco/blob/main/scripts/maps_mirror.sh - it uses aria2c for the downloads.

The data is in a proprietary format. Luckily there is a good enough description of it on their website. They mention C and Perl source code examples, but those are not on their website - and never have been (according to archive.org). I use Python and Parquet a lot, so here's my Python converter: https://github.com/jannefi/vasco/blob/main/scripts/maps_decode_to_parquet.py There's an additional script for adding ICRS coordinates based on the original FK4 B1950 positions: https://github.com/jannefi/vasco/blob/main/scripts/maps_b1950_to_icrs.py
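For anyone who wants to do the coordinate step without my scripts, it's standard astropy. A minimal sketch, assuming the decoded Parquet has ra_b1950/dec_b1950 columns in degrees (the real column names in my files may differ):

```python
import pandas as pd
import astropy.units as u
from astropy.coordinates import SkyCoord

# Hypothetical file/column names; adjust to the actual decoder output.
df = pd.read_parquet("maps_tile.parquet")

# MAPS positions are FK4, equinox B1950; transform them to ICRS.
fk4 = SkyCoord(ra=df["ra_b1950"].to_numpy() * u.deg,
               dec=df["dec_b1950"].to_numpy() * u.deg,
               frame="fk4", equinox="B1950")
icrs = fk4.icrs

df["ra_icrs"] = icrs.ra.deg
df["dec_icrs"] = icrs.dec.deg
df.to_parquet("maps_tile_icrs.parquet")
```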

All this MAPS stuff, and the fact that I'm facing big design and code changes, made me think about the STScI coordinates: does the FITS downloader service return good enough coordinates for 30x30 arcmin tiles? It actually does, for the purposes of the source extractor. Based on the FITS headers, they declare ICRS with EQUINOX 2000, not B1950 or similar. However, the tile WCS is a linear TAN approximation anchored at CRPIX/CRVAL. It's a standard, but it's an approximation over a finite field of view. AMDX/AMDY keywords are also present, indicating there is a richer, non-linear astrometric model for the tile geometry that may provide better accuracy than the linear WCS alone.
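Checking this yourself is easy with astropy; a minimal sketch for inspecting what a downloaded tile declares (the file name is just an example; the AMDX/AMDY keywords follow the DSS plate-solution convention):

```python
from astropy.io import fits

with fits.open("tile.fits") as hdul:   # hypothetical file name
    hdr = hdul[0].header

# Projection and reference frame declared by the tile.
print(hdr.get("CTYPE1"), hdr.get("CTYPE2"))            # e.g. RA---TAN / DEC--TAN
print(hdr.get("RADESYS") or hdr.get("RADECSYS"), hdr.get("EQUINOX"))
print("CRVAL:", hdr.get("CRVAL1"), hdr.get("CRVAL2"))

# DSS-style non-linear plate-model coefficients, if present.
amd = [k for k in hdr if k.startswith(("AMDX", "AMDY"))]
print(f"{len(amd)} AMDX/AMDY coefficients present" if amd
      else "no plate-model coefficients; linear WCS only")
```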

But I can't test how accurate the STScI tile coordinates are. This is why I've been using "wcs fixed" coordinates in the software, but only in downstream scripts, not in the pipeline.

The scripts don't actually "fix" anything; I just call it "wcs fixing". In reality it's a Gaia-tied polynomial correction of catalogue RA/Dec, which is a very good coordinate correction method provided you get enough Gaia matches. It's not time-dependent, which makes it even more suitable for this project.
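This is not the actual script, just a minimal sketch of the general idea, assuming NumPy arrays ra/dec (my detections) and gra/gdec (Gaia), all in degrees, and ignoring RA wrap-around near 0°/360° for simplicity:

```python
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

det = SkyCoord(ra * u.deg, dec * u.deg)
gaia = SkyCoord(gra * u.deg, gdec * u.deg)

# Crossmatch and keep only plausible Gaia counterparts.
idx, sep, _ = det.match_to_catalog_sky(gaia)
ok = sep < 3 * u.arcsec

# Offsets in degrees; RA scaled by cos(dec) to get true angular offsets.
cosd = np.cos(np.radians(det.dec.deg))
dra = (gaia.ra.deg[idx] - det.ra.deg) * cosd
ddec = gaia.dec.deg[idx] - det.dec.deg

def fit2d(x, y, z, deg=2):
    """Least-squares fit of a 2-D polynomial (total degree <= deg)."""
    terms = [(i, j) for i in range(deg + 1) for j in range(deg + 1 - i)]
    A = np.column_stack([x**i * y**j for i, j in terms])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return terms, coef

def eval2d(terms, coef, x, y):
    return sum(c * x**i * y**j for (i, j), c in zip(terms, coef))

tx, cx = fit2d(det.ra.deg[ok], det.dec.deg[ok], dra[ok])
ty, cy = fit2d(det.ra.deg[ok], det.dec.deg[ok], ddec[ok])

# Apply the fitted correction to every detection, matched or not.
ra_fixed = det.ra.deg + eval2d(tx, cx, det.ra.deg, det.dec.deg) / cosd
dec_fixed = det.dec.deg + eval2d(ty, cy, det.ra.deg, det.dec.deg)
```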

Because I'm now free to explore, I ran some tests with "wcs fixed" vs. original coordinates against the latest Gaia and PS1 datasets. The difference was huge. I couldn't believe the numbers, so I made a test set consisting of 60K rows of data from 200 randomly selected tiles. The confusion matrix results agreed, and today I ran additional false-match tests. Details here: https://github.com/jannefi/vasco/wiki/Confusion‐table-summary
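For anyone wanting to replicate the test idea, here's a minimal sketch of how such a confusion table can be built; `orig`, `fixed` (two coordinate versions of the same sources) and `gaia` are assumed to be pre-built SkyCoord arrays:

```python
import numpy as np
import astropy.units as u

def has_match(coords, catalog, radius=2 * u.arcsec):
    """True where the nearest catalogue neighbour is within the radius."""
    _, sep, _ = coords.match_to_catalog_sky(catalog)
    return sep < radius

m_orig = has_match(orig, gaia)
m_fixed = has_match(fixed, gaia)

# 2x2 confusion table: rows = original (match / no match),
#                      cols = "wcs fixed" (match / no match).
table = np.array([
    [np.sum(m_orig & m_fixed),  np.sum(m_orig & ~m_fixed)],
    [np.sum(~m_orig & m_fixed), np.sum(~m_orig & ~m_fixed)],
])
print(table)
```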

In the future, I will use canonical coordinates only. Together with newer optical catalogues like PS1/Gaia, the match rate will increase dramatically even at a 2 arcsec radius (the MNRAS 2022 defined radius is 5 arcsec).

I also decided to store the test files in case anyone wants to re-test or verify. The decision is to use a 2 arcsec search radius with canonical / Gaia-tied coordinates. How to implement this in the pipeline... I don't know yet. It seems there's always something that needs to be implemented and tested.
 
Update: I'm closing the current Vasco project (https://github.com/jannefi/vasco/) because I've already started implementing a new one, "Vasco60" (the name comes from the new tile size: 60x60 arcmin).

My last "Vasco30" stats after a full run. Tiles: 11,774. Potential sources after source extractor pass 2: 18.38 million (this includes some duplicates). Reductions (Gaia and PS1 within 5 arcsec, the MNRAS 2022 specified filters, tiles near plate edges, and duplicate removal): R = 71,624. Post-pipeline reductions:
- Stage 0: final PS1 veto (a bug in the pipeline broke the early PS1 veto) and removal of tiles in the Southern Hemisphere. R = 14.5K
- Stage 1: SkyBoT. The reduction was a few hundred, but the whole run took 36 hours...
- Stage 2: SuperCOSMOS. Again a minor reduction.
- Stage 3: PTF, around 700 matches. R = roughly 12K
- Stage 4: VSX, a minor reduction again, about 100.

Final R = 11,848

I did experimental reduction rounds with both USNO-B and the locally mirrored MAPS catalogue. This step would drop about 4K rows, but I have no clue if I should even report it :p

It's clear that the main reductions come from the two optical catalogues: PS1 and Gaia. The rest of the steps are needed, but their impact on R is small.

A few days ago I was desperately trying to compare my R to the MNRAS 2022 R (5,399). I've now understood that these two numbers cannot, and should not, be compared. The MNRAS 2022 R is full of unknowns, meaning it is not fully clear how exactly they ended up with that dataset. This was one of the topics discussed by Watters et al., and also by Villarroel et al. in their response. I will simply stop comparing. My small dataset does have around 170 tiles with matching coordinates in the MNRAS 2022 R (with some duplicates), in total a little over 100 matching coordinates. And many more in the bigger dataset they published.

I took some time and checked a few coordinates in the MNRAS R set that did not end up in my R. I wanted to understand why they were not in my dataset, since I had a corresponding tile. I separately downloaded a 60x60 arcmin tile with center coordinates RA = 130.013°, Dec = +33.081° and started going through each pipeline step. All 7 candidate coordinates can be found in that tile. In the raw source extractor pass-2 catalogue for this tile, 6/7 R points have a detection within 5″; one was most likely "cut" by pass 1. It was a bit surprising to see what cut the remaining six: the quality gates defined in MNRAS 2022. Most of them failed several gates. So how come they are in the MNRAS 2022 R? I don't know. I wrote up a full description of the test steps and a quality-gate failure matrix and sent it to some of my "paper contacts", but crucially one of them has not replied to any of my e-mails. The lead author has apparently been too busy with other affairs. Note: this was a very small dataset, so all of this is speculative - except the reason why those six candidates didn't pass my pipeline.
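The per-candidate detection check itself is simple; a minimal sketch with hypothetical file and column names, where cand_ra/cand_dec hold the seven MNRAS candidate positions in degrees:

```python
import pandas as pd
import astropy.units as u
from astropy.coordinates import SkyCoord

cat = pd.read_parquet("tile_pass2.parquet")   # hypothetical pass-2 catalogue
det = SkyCoord(cat["ra"].to_numpy() * u.deg, cat["dec"].to_numpy() * u.deg)
cand = SkyCoord(cand_ra * u.deg, cand_dec * u.deg)

# Nearest pass-2 detection for each candidate position.
idx, sep, _ = cand.match_to_catalog_sky(det)
for i, s in enumerate(sep):
    status = "detection" if s < 5 * u.arcsec else "no detection"
    print(f"candidate {i}: {status} within 5 arcsec "
          f"(nearest source at {s.to(u.arcsec):.2f})")
```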

Like I said, I have started implementing Vasco60. The repository exists, but it's currently private. The code is about 95% complete; I will make it public once it's ready for production. There will be some other big changes besides the 60x60 arcmin tiles. Each tile will get a 30 arcmin circular "radius cut" like in MNRAS 2022. No more random downloads either: a "smart" tessellation plan will be generated as a JSON file, and the pipeline will follow that plan. All stages, and especially the reductions, will be reported separately per tile, per full dataset, and by each downstream step. No "main Parquets", no infrared catalogues, mainly lots of CSV files with lots of numbers. The Southern Hemisphere will be excluded at the tessellation plan phase. Probably all tiles that are not in PS1 coverage will be excluded, too. I might add the Earth's shadow statistics, but I'm a bit hesitant: perhaps that topic has been sufficiently covered already.
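The Vasco60 schema isn't public yet, so every field below is a guess; just a minimal sketch of what one tessellation-plan entry could look like:

```python
import json

# Hypothetical plan entry; the real Vasco60 field names may differ.
plan = [
    {
        "tile_id": "T000001",
        "ra_center_deg": 130.013,
        "dec_center_deg": 33.081,
        "size_arcmin": 60,
        "radius_cut_arcmin": 30,   # circular cut inside the square tile
        "in_ps1_coverage": True,   # non-PS1 and southern tiles dropped at plan time
    },
]

with open("tessellation_plan.json", "w") as f:
    json.dump(plan, f, indent=2)
```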

I will continue running the new Vasco60 pipeline until I run out of disk space.
 