Digitized Sky Survey POSS-1

boguesuser

I'm splitting off into a new thread since I'll keep pulling the Transients in the Palomar Observatory Sky Survey thread off topic if I continue there.

As a summary, I'm on a mission to find the raw plates from the Digitized Sky Survey, which is (I think, anyway) the digitized version of the Palomar Observatory Sky Survey. So far I've found a set of 102 CDs held in about 22 libraries worldwide, and I've also just found what appears to be a repository of some of the plates (under "Browseable Directories").

I'm going to try to scrape the plates from Caltech's website and see if I can find the plates that match the targets in the paper.

If anyone has suggestions for specific analysis to run, let me know and I'll see if I can implement it. I'll also try to make a GitHub repository at some point too.
 
This tool makes it much easier to match plates
Candidate coordinates:
  1. RA: 21:02:52.28 DEC: +48:34:18.90
  2. RA: 03:05:42.48 DEC: +07:58:29.60
  3. RA: 03:08:27.13 DEC: +34:40:46.01
  4. RA: 21:24:39.71 DEC: +68:31:30.04
  5. RA: 19:16:45.76 DEC: +51:28:52.40
The relevant plates are as follows:

Candidate 1:
  • POSS-1: Xx186
  • POSS-2: Xx235
Candidate 2:
  • POSS-1: Xx531
  • POSS-2: Xx760 & Xx688
Candidate 3:
  • POSS-1: Xx246
  • POSS-2: Xx357
Candidate 4:
  • POSS-1: Xx074 & Xx075
  • POSS-2: Xx075
Candidate 5:
  • POSS-1: Xx141 & Xx183
  • POSS-2: Xx231 & Xx232
 
A list of vanishing sources for POSS I is here. It wasn't accessible a few minutes ago, but now it is again. I downloaded the list as a CSV (attached) just in case.

http://svocats.cab.inta-csic.es/vanish-possi/index.php?action=search



ESO has a batch tool you can use to download the images efficiently.

https://archive.eso.org/cms/tools-documentation/the-eso-st-ecf-digitized-sky-survey-application.html

This is the tool used for detecting sources (a rough Python equivalent is sketched at the end of this post).

https://sextractor.readthedocs.io/en/latest/Introduction.html

They describe their methodology here.

https://arxiv.org/abs/2206.00907

At least this might be helpful for getting started.
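
SExtractor itself is a standalone C program, but if you end up doing the detection from Python, the sep package reimplements its core algorithm. A minimal sketch, not the paper's actual setup; the file name and the 1.5σ threshold are just placeholders:

Python:
import numpy as np
import sep
from astropy.io import fits

# Load a plate (or cutout) and cast to native-endian float32, which sep requires.
data = fits.getdata("example.fits").astype(np.float32)

# Estimate and subtract the spatially varying background.
bkg = sep.Background(data)
data_sub = data - bkg.back()

# Detect sources above 1.5x the global background RMS.
objects = sep.extract(data_sub, 1.5, err=bkg.globalrms)
print(len(objects), "sources detected")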
 


A list of vanishing sources for POSS I is here. It wasn't accessible a few minutes ago, but now it is again. I downloaded the list as a CSV (attached) just in case.
That gives ra/dec (the position in the celestial sphere, like a star position), but not the date/time. So it's not really useful for the shadow determination.
 
That gives ra/dec (the position in the celestial sphere, like a star position), but not the date/time. So it's not really useful for the shadow determination.
If you download the FITS image, it has a header which gives the date of observation. You can use fv on Linux to inspect it: sudo apt-get install ftools-fv

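If you'd rather skip the GUI, astropy can read the same header directly; a minimal sketch (the file name is a placeholder, and UT only appears in some of the headers):

Python:
from astropy.io import fits

# Print the observation date/time keywords from a DSS FITS header.
with fits.open("example.fits") as hdul:
    header = hdul[0].header
    print(header.get("DATE-OBS"))  # e.g. '1952-08-21T10:35:00' or '21/08/52'
    print(header.get("UT"))        # observation time, present in some headers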
 
I did stumble on that one. I'd much rather use the raw plates however.

That looks like it will be a fairly useful tool. I'll have to see if I can figure out how to get it working.

At least this might be helpful for getting started.
That CSV will be incredibly helpful.


That gives ra/dec (the position in the celestial sphere, like a star position), but not the date/time.
You can use the plate finder I posted earlier to find the time the plates were exposed.

There is also a repository of metadata and such here.
 
I did stumble on that one. I'd much rather use the raw plates however.

I guess maybe there are two versions: the original STScI digitizations that were packaged on CDs.

https://en.wikipedia.org/wiki/Digitized_Sky_Survey

And the same digitizations but processed and re-calibrated in support of Hubble Space Telescope operations.

The Catalogs and Surveys Branch of the Space Telescope Science Institute has digitized the photographic Sky Survey plates from the Palomar and UK Schmidt telescopes to produce the "Digitized Sky Survey". These images were then processed and calibrated to produce the Guide Star Catalogs in support of HST operations.

https://gsss.stsci.edu/Catalogs/Catalogs.htm

The data used by VASCO is the latter. Maybe it is worth digging deeper into the way the images were processed to produce the GSC?

The original version of this catalog was created to support the identification and use of Guide Stars for the pointing of the HST. It is based on the Palomar Quick-V survey in the northern hemisphere and the SERC-J survey in the south. This catalog contains objects in the magnitude range 7-16 and the classification was biased to prevent the use of a non-stellar object as a guide star.

https://gsss.stsci.edu/Catalogs/GSC/GSC1/GSC1.htm
 
If you download the FITS image, it has a header which gives the date of observation. You can use fv on Linux to inspect it: sudo apt-get install ftools-fv
Which FITS image? For the first transient listed here, how do I get the date/time?



I was looking at the FITS files here:
https://irsa.ipac.caltech.edu/data/DSS/images/dss1red/

I downloaded the first one, and it does have
Code:
DATE-OBS= '1952-08-21T10:35:00' / Observation: Date/Time

Which is a very different format from what you'd posted.


Ultimately, I'm looking to get a CSV of ra, dec, and date/time (of the appearance of the transient), so I can plug it into sitrec and do a numerical verification of their shadow calculations.
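
For what it's worth, both DATE-OBS styles can be normalized into an astropy Time for that CSV: the ISO string in the IRSA full-plate headers, and a DD/MM/YY form with a separate UT keyword (the form that shows up in the extracted cutouts later in the thread). A sketch, assuming only those two formats turn up:

Python:
from astropy.time import Time

def obs_time(date_obs, ut=None):
    # ISO style, e.g. '1952-08-21T10:35:00' (time already included)
    if "T" in date_obs:
        return Time(date_obs, format="isot", scale="utc")
    # 'DD/MM/YY' style with the time in a separate UT keyword (POSS-I, so 19xx)
    d, m, y = date_obs.split("/")
    return Time(f"19{y}-{m}-{d}T{ut}", format="isot", scale="utc")

# obs_time('1952-08-21T10:35:00')
# obs_time('21/08/52', ut='10:35:00')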
 
I'm not sure. I'm somewhat confused about dates. Is there only one date per patch of the sky? And does the date depend on which patch you get, so you just extract an image from ESO based on coords and look at the header to see the date?

For the first transient in the list, extracting it and looking at the header in fv gives this. I notice that when getting the DSS1 image using the coordinates, you see a source in the center that you don't see at the same coordinates in DSS2. So I assume you just have two times per frame, DSS1 and DSS2.

[attached screenshot of the header in fv]
 
Update on the "Earth's shadow as a filter" theory: Villarroel et al. submitted a proof-of-concept paper on 4 Aug 2025: https://academic.oup.com/mnras/advance-article/doi/10.1093/mnras/staf1158/8221885 (titled "A Cost-Effective Search for Extraterrestrial Probes in the Solar System").

They used a modern and fairly large dataset from the Zwicky Transient Facility (ZTF), examining more than 200,000 images and specifically focusing on those captured within Earth's shadow. Via phys.org: https://phys.org/news/2025-08-scientists-earth-shadow-alien-probes.html
 
I was able to get the batch tool working. Long story short, I had to install tcsh, and you need to convert the coordinates from decimal degrees to hh mm ss. I made the batch input file like this.

Python:
import csv
from astropy.coordinates import Angle
import astropy.units as u

def main():
    input_csv_file   = "vanish_possi_1755228340.csv"
    batch_input_file = "dss_batch_input.txt"
    image_size_x     = 1.924
    image_size_y     = 1.547
 
    batch_lines = []
 
    with open(input_csv_file, mode='r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)
        for i, row in enumerate(reader):

            ra_decimal  = float(row[0]) * u.degree
            dec_decimal = float(row[1]) * u.degree
      
            # Convert the RA and Dec to sexagesimal strings
            ra_sexagesimal  = Angle(ra_decimal).to_string(  unit=u.hour, sep=' ', precision=2     )
            dec_sexagesimal = Angle(dec_decimal).to_string( sep=' ', precision=2, alwayssign=True )

            image_name = f"POSSI_VS_{str(i+1).zfill(7)}_"
      
            line = f"{image_name} {ra_sexagesimal} {dec_sexagesimal} {image_size_x} {image_size_y}"
            batch_lines.append(line)

    # Write the formatted lines to the batch input file
    with open(batch_input_file, "w") as f:
        f.write("\n".join(batch_lines))

if __name__ == "__main__":
    main()

Then, after creating the batch file, run:

Code:
dss1 -i dss_batch_input.txt

Then you'll have all of the FITS files, and you can create a CSV file with the needed info from the headers like this.

Python:
import csv
import os
import glob
from astropy.io import fits

def get_metadata():
    output_csv_file = "extracted_data.csv"
    metadata_list = []
 
    fits_files = glob.glob('*.fits')
 
    for fits_file in fits_files:
        with fits.open(fits_file) as hdul:
            header = hdul[0].header
            metadata = {
                "file_name" : os.path.basename(fits_file),
                "DATE-OBS"  : header["DATE-OBS"],
                "UT"        : header["UT"],
                "SITELAT"   : header["SITELAT"],
                "SITELONG"  : header["SITELONG"],
                "OBJCTRA"  : header["OBJCTRA"],
                "OBJCTDEC" : header["OBJCTDEC"],
                "EQUINOX"   : header["EQUINOX"],
               "EPOCH" : header["EPOCH"]
            }
            metadata_list.append(metadata)
     
    with open(output_csv_file, 'w', newline='') as csvfile:
        fieldnames = list(metadata_list[0].keys())
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(metadata_list)

if __name__ == "__main__":
    get_metadata()
 
He's getting the snippets of the plates where the transients have been reported.

I'm targeting the full plate so I can do some fun data stuff. Each blue plate is about 1 GB while each red is 325 MB.
Thanks for explaining. I am similarly hoping for the full plate. I am an actuary/data scientist so I would love to recreate their analysis first using their questionable methodology, and then do it again but with a methodology I would consider "correct". I'll need the full plates as well.
 
Sorry, just saw this thread. @beku-mant does your code successfully get the data that @boguesuser is trying to get in post #1?

If so, can you upload the data somewhere convenient?

I was able to get FITS images for the 5399 sources listed in the link below. I only extracted small images of the sources, and in total it's only about 150 MB, but it contains all of the info needed to check if they are in Earth's shadow. Some extracted fields are in the CSV file.

http://svocats.cab.inta-csic.es/vanish-possi/index.php?action=search

I am a little concerned about making a mistake, though, so don't take the data as correct without further validation.
 


Thanks for explaining. I am similarly hoping for the full plate. I am an actuary/data scientist so I would love to recreate their analysis first using their questionable methodology, and then do it again but with a methodology I would consider "correct". I'll need the full plates as well.
This is the most complete plate database I've found online.

I have some AI-generated Rust code to download them automatically. I'll add it here when I get a chance.

I am an actuary/data scientist
I have a feeling you'll have a better shot at this than I will lol.
 
Thanks for explaining. I am similarly hoping for the full plate. I am an actuary/data scientist so I would love to recreate their analysis first using their questionable methodology, and then do it again but with a methodology I would consider "correct". I'll need the full plates as well.
Here is the Rust code. It's pretty rough, but it works.

main.rs
Rust:
mod downloader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Starting file download and processing example...");

    // TODO: Move the main folder and base url here
    // TODO: Download an image to /tmp, extract the blob map, remove from disk
        // See TODO file for more detail

    // --- Download Blue Images ---
    for i in 1..=871 { // Assuming 871 is the max plate number for blue; adjust if needed
        let image_id = format!("XO{:03}", i);
        let download_url = format!("https://irsa.ipac.caltech.edu/data/DSS/images/dss1blue/dss1blue_{}.fits", image_id);
        let download_dest = format!("poss_1_raw/poss_blue_raw/dss1blue_{}.fits", image_id);

        match downloader::download_file(&download_url, &download_dest).await {
            Ok(_) => {
                println!("Blue image download process completed (downloaded or already exists) for {}", download_dest);
            }
            Err(e) => eprintln!("Error during blue image download process for {}: {}", download_url, e),
        }
        println!(""); // Add blank line after each blue image download attempt
    }

    // --- Download Red Images ---
    for i in 1..=871 { // Assuming 871 is the max for red as well, adjust if needed
        let image_id = format!("XE{:03}", i);
        let download_url = format!("https://irsa.ipac.caltech.edu/data/DSS/images/dss1red/dss1red_{}.fits", image_id);
        let download_dest = format!("poss_1_raw/poss_red_raw/dss1red_{}.fits", image_id);

        match downloader::download_file(&download_url, &download_dest).await {
            Ok(_) => {
                println!("Red image download process completed (downloaded or already exists) for {}", download_dest);
            }
            Err(e) => eprintln!("Error during red image download process for {}: {}", download_url, e),
        }
        println!(""); // Add blank line after each red image download attempt
    }

    println!("\nExample finished.");
    Ok(())
}

downloader.rs
Rust:
use std::path::Path;
use tokio::io::AsyncWriteExt;
use reqwest::StatusCode;
use indicatif::{ProgressBar, ProgressStyle};
use futures_util::stream::StreamExt;
// src/downloader.rs
// Uses reqwest for HTTP downloads, tokio for the async runtime,
// and indicatif for a per-file progress bar.
/// Downloads a file from a given URL and saves it to a specified path.
///
/// # Arguments
///
/// * `url` - The URL of the file to download.
/// * `dest_path` - The local path where the file should be saved.
///
/// # Returns
///
/// A `Result` indicating success or failure.
pub async fn download_file(url: &str, dest_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let path = Path::new(dest_path);
    if let Some(parent) = path.parent() {
        tokio::fs::create_dir_all(parent).await?;
    }
    // Check if the destination file already exists using tokio::fs
    match tokio::fs::metadata(dest_path).await {
        Ok(metadata) => {
            if metadata.is_file() {
                println!("File already exists at {}. Skipping download.", dest_path);
                return Ok(());
            }
        }
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
            // File does not exist, proceed with download
        }
        Err(e) => {
            // Other error checking metadata
            eprintln!("Error checking file metadata for {}: {}", dest_path, e);
            return Err(Box::new(e));
        }
    }
    let response = reqwest::get(url).await?;
    if response.status() == StatusCode::NOT_FOUND {
        return Err(format!("File not found at URL: {}", url).into());
    }
    let total_size = response.content_length().unwrap_or(0);
    let pb = ProgressBar::new(total_size);
    pb.set_style(ProgressStyle::default_bar()
        .template("{spinner:.green} [{elapsed_precise}] [{bar:40.cyan/blue}] {bytes}/{total_bytes} ({eta}) {msg}")
        .unwrap()
        .progress_chars("#>-"));
    pb.set_message(format!("Downloading {}", url));
    let mut file = tokio::fs::File::create(dest_path).await?;
    let mut stream = response.bytes_stream();
    while let Some(chunk_result) = stream.next().await {
        let chunk = chunk_result?;
        file.write_all(&chunk).await?;
        pb.inc(chunk.len() as u64);
    }
    pb.finish_with_message(format!("Downloaded {}", url));
    Ok(())
}

Cargo.toml
Code:
[package]
name = "poss_transient_detection"
version = "0.1.0"
edition = "2024"

[dependencies]
futures-util = "0.3.31"
indicatif = "0.18.0"
reqwest = { version = "0.12.23", features = ["stream"] }
tokio = { version = "1.47.1", features = ["full"] }

File structure
/base_directory
├── Cargo.toml
└── src
    ├── main.rs
    └── downloader.rs
 

From the 5399 sources, I get 39 in shadow according to this code.

Python:
import csv
import os
import glob
from astropy.io import fits
from astropy.time import Time
from earthshadow import get_shadow_center, get_shadow_radius, dist_from_shadow_center
import astropy.units as u
from astropy.coordinates import SkyCoord

def fits_time( date_obs, ut ) :
    date_split = date_obs.split( "/" )
    return f"19{date_split[2]}-{date_split[1]}-{date_split[0]}T{ut}"

def in_shadow( ra_value, dec_value, time ) :

    center = get_shadow_center( time, obs='Palomar', orbit='GEO')
    radius = get_shadow_radius(orbit='GEO')
    dist   = dist_from_shadow_center( ra_value, dec_value, time=time, obs='Palomar', orbit='GEO')
    return dist < radius - 2*u.deg

def get_metadata():
    output_csv_file = "extracted_data.csv"
    metadata_list = []
    
    fits_files = glob.glob('fits_files/*.fits')
    
    for fits_file in fits_files:

        with fits.open(fits_file) as hdul:

            header = hdul[0].header

            c = SkyCoord( header["OBJCTRA"], header["OBJCTDEC"], unit=(u.hourangle, u.deg), frame='icrs')
            
            ra_deg  = c.ra.degree
            dec_deg = c.dec.degree

            metadata = {
                "file_name" : os.path.basename(fits_file),
                "DATE-OBS"  : header["DATE-OBS"],
                "UT"        : header["UT"],
                "SITELAT"   : header["SITELAT"],
                "SITELONG"  : header["SITELONG"],
                "OBJCTRA"   : header["OBJCTRA"],
                "OBJCTDEC"  : header["OBJCTDEC"],
                "EQUINOX"   : header["EQUINOX"],
                "EPOCH"     : header["EPOCH"],
                "EXPOSURE"  : header["EXPOSURE"],
                "SHADOW"   : in_shadow( 
                    ra_deg, 
                    dec_deg, 
                    time=Time( fits_time( header["DATE-OBS"], header["UT"] ), format='fits') )[0]
            }

            metadata_list.append( metadata )

    with open(output_csv_file, 'w', newline='') as csvfile:
        fieldnames = list(metadata_list[0].keys())
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(metadata_list)

if __name__ == "__main__":
    get_metadata()

I am not sure how to figure out how many would be expected once you account for the location bias in the POSS-I data, the fraction of the sky in the shadow, etc.
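
One rough way to get an expected count that respects the plate coverage, similar in spirit to simulating points per plate, is to scatter test points around each source at its observation time and ask what fraction land in the shadow. A sketch reusing fits_time() and in_shadow() from the script above; the ~6.4° plate footprint and 180 points per source are my assumptions, and treating each cutout as a stand-in for its parent plate will over-weight plates with several candidates:

Python:
import glob
import numpy as np
import astropy.units as u
from astropy.io import fits
from astropy.time import Time
from astropy.coordinates import SkyCoord
# reuses fits_time() and in_shadow() from the script above

def expected_shadow_fraction(fits_dir="fits_files", points_per_source=180, half_size_deg=3.2):
    n_in, n_total = 0, 0
    for f in glob.glob(f"{fits_dir}/*.fits"):
        header = fits.getheader(f)
        c = SkyCoord(header["OBJCTRA"], header["OBJCTDEC"], unit=(u.hourangle, u.deg))
        t = Time(fits_time(header["DATE-OBS"], header["UT"]), format="fits")
        # scatter points over an assumed plate-sized box (small-angle approximation,
        # good enough for a sketch away from the poles)
        dra  = np.random.uniform(-half_size_deg, half_size_deg, points_per_source)
        ddec = np.random.uniform(-half_size_deg, half_size_deg, points_per_source)
        ras  = (c.ra.degree + dra / np.cos(np.deg2rad(c.dec.degree))) % 360.0
        decs = np.clip(c.dec.degree + ddec, -90.0, 90.0)
        for ra_i, dec_i in zip(ras, decs):
            if in_shadow(ra_i, dec_i, time=t)[0]:
                n_in += 1
        n_total += points_per_source
    return n_in / n_total

# expected = expected_shadow_fraction()
# print(expected * 5399, "expected in shadow vs. 39 observed")

It's slow (one earthshadow call per point) and ignores exact plate boundaries, but it should give a ballpark expected fraction tied to the actual plate epochs rather than a whole-sky percentage.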
 
Since we don't know the height of these 'objects', we don't know how big the shadow is at that height, so the 'in shadow' determination seems meaningless. Are you assuming they are in LEO?
 
Since we don't know the height of these 'objects', we don't know how big the shadow is at that height, so the 'in shadow' determination seems meaningless. Are you assuming they are in LEO?
This code assumes they are in GEO, which is what they do in the paper I think (or rather tests whether the sources would be in shadow if they were in GEO). I used this code, which HoaxEye found referenced in her new paper.

https://github.com/guynir42/earthshadow
 
This code assumes they are in GEO, which is what they do in the paper I think (or rather tests whether the sources would be in shadow if they were in GEO).
Ahh, I see. As noted elsewhere, the shadowed area at GEO is incredibly small, and even smaller if the satellites are assumed to be in equatorial orbit as well.
 
Are you on GitHub? If not, I highly recommend signing up and creating a repo for your code there. That will make collaboration easier.
I have technically used GitHub before. I'm not terribly familiar with it, however.
I do plan to set up a repo once I get my image processing started.
 
As noted elsewhere, the shadowed area at GEO is incredibly small, and even smaller if the satellites are assumed to be in equatorial orbit as well.
It's not incredibly small relative to LEO in terms of area. It's small as a percentage of the total area of the sphere at that distance.

This graph should show how the area shrinks in absolute terms (orange dashed) vs. relative to the hemisphere area (blue). It's from ChatGPT, but it seems correct.

[attached plot]


At GEO it's 1% of the hemisphere, 0.5% of the total sky.
From the 5399 sources, I get 39 in shadow according to this code.
or 0.7%, so a surplus rather than a deficit.

But they say:

External Quote:

To independently verify the number of transients located within Earth's shadow, we implemented a custom code (using ChatGPT-assisted scripting) that follows a similar principles to EarthShadow. After validating its performance on a subset of candidates from Villarroel et al. (2025), we applied it to the full sample. The resulting counts — 374 transients at 42,164 km and 57 at 80,000 km — are in good agreement with the results obtained using EarthShadow, supporting the robustness of our shadow deficit measurement.

To estimate the statistical significance of the difference in transient detection rates within Earth's umbra at different altitudes, we compute Poisson uncertainties for the observed and expected fractions. At 42,164 km altitude, we expect N = 1223 transients in shadow out of 106,339 total, corresponding to an expected fraction of f_exp = 0.0115 ± 0.00033. However, we observe only N = 349 transients in shadow, yielding f_obs = 0.00328 ± 0.00018. The difference between these fractions is highly significant, with a significance level of 21.9σ, computed by combining the Poisson uncertainties
1223/106339 is 1.15%
349/106339 is 0.33%

Oh, they use geocentric radius (a sphere defined by the radius from the center of the Earth), not altitude (a sphere defined by its altitude above a nominally spherical Earth). So 42,164 km is GEO.
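
A quick numerical check of those percentages, using the simple geometric shadow (angular radius arcsin(R_Earth / r), which is what earthshadow's get_shadow_radius returns, and slightly wider than the true umbral cone):

Python:
import numpy as np

R_EARTH = 6371.0    # km
R_GEO   = 42164.0   # km, geocentric radius of GEO

theta = np.degrees(np.arcsin(R_EARTH / R_GEO))   # shadow angular radius
cap_fraction = 1 - np.cos(np.radians(theta))     # spherical cap as a fraction of a hemisphere

print(round(theta, 2))                   # ~8.69 deg
print(round(cap_fraction * 100, 2))      # ~1.15 % of a hemisphere
print(round(cap_fraction / 2 * 100, 2))  # ~0.57 % of the full sky

So the 1.15% (and their f_exp = 0.0115) only makes sense as a fraction of a hemisphere; as a fraction of the whole sky the same geometry gives roughly 0.57%.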
 
I'm also not entirely clear why they use percentages of a hemisphere and not percentages of the full sky.
 
I'm also not entirely clear why they use percentages of a hemisphere and not percentages of the full sky.

They say both:
External Quote:

At 42,164 km altitude, we expect N = 1223 transients in shadow out of 106,339 total, corresponding to an expected fraction of f_exp = 0.0115 ± 0.00033.
But then a few lines later:
External Quote:

Out of the 114,300 simulated points (180 points per plate), 610 were found to lie within Earth's shadow, implying that approximately 0.53% of the survey area should be shadowed at GSO.
The second one seems to correctly use the percentage of the sky (derived numerically), while the first (with the super high sigma) uses the percentage of a hemisphere (derived analytically). Is this an error?
 
ChatGPT had some problems with EarthShadow, one being that it thinks they model the shadow as a cylinder, not a cone. I asked it what the difference would be and it came back with 1.150% of a hemisphere, which is exactly what they have. So that, at least, is incorrect.

This is the function that earthshadow uses.

Python:
def get_shadow_radius(orbit=None, geocentric_orbit=True, geocentric_angle=True):
    """
    Get the angle of the radius of Earth's shadow,
    where it intercepts the sky at an orbital radius,
    as seen by an observer at the center of the Earth.

    This is the geometric shadow, and does not include
    partial shadowing by the atmosphere.
    The atmosphere allows bending of some light to reach
    an angle of about 1-2 degrees into the geometric shadow.

    When inputting the orbit as a float (assume km) or a Quantity
    it is assumed this value includes the radius of the Earth.
    To give the orbit height above the Earth's surface,
    specify geocentric=False.

    Parameters
    ----------
    orbit: float or astropy.units.Quantity
        The orbital radius of the satellite.
        This is measured from the center of the Earth
        (e.g., LEO would be 200 + 6371 = 6571 km).
        If given as float assume kilometers.
        Defaults to 42164 km (geosynchronous orbit).
    geocentric_orbit: bool
        If True, assume the orbit is given as the
        distance from the center of the Earth (default).
        If False, assume the orbit is given as the
        distance above the Earth's surface.

    Returns
    -------
    angle: astropy.units.Quantity
        The angle of the Earth shadow.
    """
    orbit = interpret_orbit(orbit, geocentric_orbit=geocentric_orbit)

    if orbit < EARTH_RADIUS:
        raise ValueError(
            f"Orbit radius {orbit} is below Earth radius {EARTH_RADIUS}. "
            "If you intend to give the orbit height above the Earth surface, "
            "set geocentric_orbit=False."
        )

    angle = np.arcsin(EARTH_RADIUS / orbit).to(u.deg)

    if not geocentric_angle:
        angle = geocentric_to_topocentric_angle(angle, orbit=orbit)

    return angle

https://github.com/guynir42/earthshadow/blob/main/src/earthshadow.py

I'm honestly not sure how to appropriately evaluate the data at this point yet.
 
The second one seems to correctly use the percentage of the sky (derived numerically), while the first (with the super high sigma) uses the percentage of a hemisphere (derived analytically). Is this an error?
I think the second one is more correct. The expected number would not be a simple percentage of the whole sky or a hemisphere, unless your images were an even sampling of the entire (or half) sky at all times. What we have is a number of plates that cover the sky at different times. So the expected fraction in shadow can only be calculated with reference to those plates, which is what they do. But it's not clear exactly what is being correlated, or how well.
 