"Data Scraping" is extracting data from a web page, either from the HTML of the web page itself, or from the data sources used by that web page.
The New York Times election result pages have easily readable data sources, including time-stamped state-of-the-race data, and so have been used by independent analysts to look at the trends as the results came in.
One claim is that votes are being "swapped". The claim rests on data like this:
Code:
{
    "eevp": 42,
    "eevp_source": "edison",
    "timestamp": "2020-11-04T04:07:43Z",
    "vote_shares": {
        "bidenj": 0.42,
        "trumpd": 0.566
    },
    "votes": 2984468
},
{
    "eevp": 42,
    "eevp_source": "edison",
    "timestamp": "2020-11-04T04:08:51Z",
    "vote_shares": {
        "bidenj": 0.426,
        "trumpd": 0.56
    },
    "votes": 2984522
},
These are two data points, at 04:07:43Z and 04:08:51Z. Each gives the candidates' share of the vote to three decimal places, plus the exact total vote count. This was used to calculate how many votes each candidate had at both points, and then the change between them.
Biden:
2984468 * 0.42 = 1253477
2984522 * 0.426 = 1271406
1271406 - 1253477 = 17929
Trump
2984468 * 0.566 = 1689209
2984522 * 0.56 = 1671332
1671332 - 1689209 = -17877
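As a quick check, here is the same arithmetic in Python (the vote totals and shares are copied from the two records above; the rounding to whole votes is mine):
Code:
# Recompute the per-candidate totals and the change between the two
# timestamped records shown above (values copied from the JSON).
points = [
    {"votes": 2984468, "bidenj": 0.42, "trumpd": 0.566},  # 04:07:43Z
    {"votes": 2984522, "bidenj": 0.426, "trumpd": 0.56},  # 04:08:51Z
]

for cand in ("bidenj", "trumpd"):
    before = round(points[0]["votes"] * points[0][cand])
    after = round(points[1]["votes"] * points[1][cand])
    print(cand, before, after, after - before)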
So assuming the data is accurate, at this one point in time Trump's vote went down by nearly 18K and Biden's went up by nearly 18K, essentially taking 18K votes away from Trump and giving them to Biden.
The anonymous person in the link above wrote some code to find all the times Trump's calculated total went down while Biden's went up, and to add together all those declines. The result was a loss of 220,833 votes for Trump (all numbers here are from the data I downloaded today).
The immediate problem here is that it's ignoring losses for Biden when Trump gained votes. If we calculate those, we have a loss of 25,712 for Biden, not really changing things.
But what if we add up ANY decline? After all, a decrease in your votes is an anomaly; it should not really matter whether it happens simultaneously with a rise in your opponent's votes. If we add up ALL the declines, we have: Trump Lost = -408547 and Biden Lost = -637917. So downward adjustments hurt Biden more than Trump.
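That tally is essentially what the ff1() function in the script further below does. As a minimal sketch of the logic, assuming one of the race files downloaded with the script further below (here Race-Pres-pennsylvania.json):
Code:
import json

# Load the time series from one of the downloaded race files.
with open("Race-Pres-pennsylvania.json", encoding="utf8") as f:
    series = json.load(f)["data"]["races"][0]["timeseries"]

# Sum every decline in each candidate's calculated vote count,
# regardless of what the other candidate's total did.
trump_lost = biden_lost = 0
for prev, curr in zip(series, series[1:]):
    d_trump = curr["votes"] * curr["vote_shares"]["trumpd"] - prev["votes"] * prev["vote_shares"]["trumpd"]
    d_biden = curr["votes"] * curr["vote_shares"]["bidenj"] - prev["votes"] * prev["vote_shares"]["bidenj"]
    if d_trump < 0:
        trump_lost += int(d_trump)
    if d_biden < 0:
        biden_lost += int(d_biden)

print("Trump Lost =", trump_lost, "Biden Lost =", biden_lost)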
But I said earlier "assuming the data is accurate" - clearly, if vote totals go down then something is going wrong. Let's take a step back and look at the bigger picture. If we extract all the calculated vote totals and graph them, it looks like this:
Blue is Biden: he starts out rapidly increasing to about 450K, then there's a sudden jump to around 700K followed by a more gradual rise, then a correction down, then another sharp spike and correction that is mirrored by a similar but smaller drop/spike/correction for Trump. After that things settle down.
So clearly there's some bad data there, and errors that were corrected. The worst of it happens in the first 60 data points. If we exclude those and just look at declines after that, we have Trump Lost = -93091 and Biden Lost = -24619.
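For anyone who wants to reproduce the graph, here is a minimal plotting sketch (not from the original post), assuming matplotlib is installed and a downloaded race file such as Race-Pres-pennsylvania.json:
Code:
import json
import matplotlib.pyplot as plt

# Load the time series from a downloaded race file.
with open("Race-Pres-pennsylvania.json", encoding="utf8") as f:
    series = json.load(f)["data"]["races"][0]["timeseries"]

# Calculated vote totals: reported total votes times each candidate's share.
biden = [p["votes"] * p["vote_shares"].get("bidenj", 0) for p in series]
trump = [p["votes"] * p["vote_shares"].get("trumpd", 0) for p in series]

plt.plot(biden, label="Biden", color="blue")
plt.plot(trump, label="Trump", color="red")
plt.xlabel("Data point index")
plt.ylabel("Calculated votes")
plt.legend()
plt.show()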
The problem with this as evidence of election fraud is that it does not really make any sense. Why would you subtract votes so blatantly if the whole point was to avoid getting caught? Where in the counting process are these subtractions supposedly happening? What happens if there's a recount?
The data here is from the NYT. They get data from Edison Research:
https://www.edisonresearch.com/election-polling/
The data we see is formatted for display purposes. The raw data would have actual vote counts, not a low-resolution share of the vote. Actual results are tallied at the county level, different outlets report the county results continually, and any changes should be identifiable, as they would stand out at the county level. Unfortunately, all of this is hidden in the simplistic and possibly buggy dataset from the NYT.
Ultimately the problem here is a lack of information about where the numbers are coming from, and what factors into them. How often do counties issue a correction? What actually happened at the various points on this graph? The people collecting this data have the capability to explain this issue (and quite possibly already have). While it may not seem important, things like this become the foundation of long-lasting conspiracy theories, and it really needs to be addressed to prevent it from creating harm in the future.
Here's a short bash script that can be used to download the NYT data.
Code:
## These are the names as used in the NYT API. 50 US states
declare -a names=("alabama" "alaska" "arizona" "arkansas" "california" "colorado" "connecticut" "delaware" "florida" "georgia" "hawaii" "idaho" "illinois" "indiana" "iowa" "kansas" "kentucky" "louisiana" "maine" "maryland" "massachusetts" "michigan" "minnesota" "mississippi" "missouri" "montana" "nebraska" "nevada" "new-hampshire" "new-jersey" "new-mexico" "new-york" "north-carolina" "north-dakota" "ohio" "oklahoma" "oregon" "pennsylvania" "rhode-island" "south-carolina" "south-dakota" "tennessee" "texas" "utah" "vermont" "virginia" "washington" "west-virginia" "wisconsin" "wyoming")
l=${#names[@]}
for ((i=0; i<${l}; i++)); do
    echo $i " - " ${names[$i]}
    # Grab the race-page and state-page JSON for each state, then write pretty-printed copies.
    wget -nc https://static01.nyt.com/elections-assets/2020/data/api/2020-11-03/race-page/${names[$i]}/president.json -O Race-Pres-${names[$i]}.json
    wget -nc https://static01.nyt.com/elections-assets/2020/data/api/2020-11-03/state-page/${names[$i]}.json -O State-Pres-${names[$i]}.json
    cat Race-Pres-${names[$i]}.json | python -m json.tool > Race-Pres-${names[$i]}-PP.json
    cat State-Pres-${names[$i]}.json | python -m json.tool > State-Pres-${names[$i]}-PP.json
done
And here is a modified version of the script that was used to tally the subtractions:
Code:
import json
import sys

##print(f"Name of the script : {sys.argv[0]=}")
##print(f"Arguments of the script : {sys.argv[1:]=}")

def findfraud(NAME):
    # The original approach: sum every decline in Trump's calculated total
    # that coincides with a rise in Biden's.
    with open(NAME + '.json', encoding="utf8") as f:
        x = json.load(f)
    series = x["data"]["races"][0]["timeseries"]
    TotalVotesLost = 0
    for i in range(1, len(series)):
        prev, curr = series[i - 1], series[i]
        prevTrump = prev["votes"] * prev["vote_shares"]["trumpd"]
        currTrump = curr["votes"] * curr["vote_shares"]["trumpd"]
        prevBiden = prev["votes"] * prev["vote_shares"]["bidenj"]
        currBiden = curr["votes"] * curr["vote_shares"]["bidenj"]
        if currTrump < prevTrump and currBiden > prevBiden:
            print("Index : " + str(i) + " Past Index : " + str(i - 1))
            print(currTrump - prevTrump)
            TotalVotesLost += currTrump - prevTrump
    print(TotalVotesLost)

def ff1(NAME):
    # Modified approach: tally every decline for either candidate,
    # whether or not the other candidate gained at the same time.
    with open(NAME + '.json', encoding="utf8") as f:
        x = json.load(f)
    series = x["data"]["races"][0]["timeseries"]
    TrumpVotesLost = 0
    BidenVotesLost = 0
    for i in range(1, len(series)):
        prev, curr = series[i - 1], series[i]
        prevBiden = prev["votes"] * prev["vote_shares"]["bidenj"]
        prevTrump = prev["votes"] * prev["vote_shares"]["trumpd"]
        currBiden = curr["votes"] * curr["vote_shares"]["bidenj"]
        currTrump = curr["votes"] * curr["vote_shares"]["trumpd"]
        if currTrump < prevTrump and currBiden < prevBiden:
            print(curr["timestamp"] + ": BOTH Loss: Trump " + str(int(currTrump - prevTrump)) + " Biden: " + str(int(currBiden - prevBiden)))
            TrumpVotesLost += int(currTrump - prevTrump)
            BidenVotesLost += int(currBiden - prevBiden)
        else:
            if currTrump < prevTrump:  # and currBiden > prevBiden:
                print(curr["timestamp"] + ": TRUMP Loss: " + str(int(currTrump - prevTrump)) + " Biden: " + str(int(currBiden - prevBiden)))
                TrumpVotesLost += int(currTrump - prevTrump)
            if currBiden < prevBiden:  # and currTrump > prevTrump:
                print(curr["timestamp"] + ": Biden Loss: " + str(int(currBiden - prevBiden)) + " Trump " + str(int(currTrump - prevTrump)))
                BidenVotesLost += int(currBiden - prevBiden)
    print("Trump Lost = " + str(int(TrumpVotesLost)))
    print("Biden Lost = " + str(int(BidenVotesLost)))

def ffDump(NAME):
    # Dump the time series as CSV, marking the points where one candidate's
    # calculated total dropped while the other's rose.
    with open(NAME + '.json', encoding="utf8") as f:
        x = json.load(f)
    series = x["data"]["races"][0]["timeseries"]
    print("Time,Biden Share,Trump Share,Total Votes,Biden Votes,Trump Votes,Biden Mark,Trump Mark")
    for i in range(len(series)):
        curr = series[i]
        currBiden = curr["votes"] * curr["vote_shares"]["bidenj"]
        currTrump = curr["votes"] * curr["vote_shares"]["trumpd"]
        trumpMark = 0
        bidenMark = 0
        if i > 0:
            prev = series[i - 1]
            prevBiden = prev["votes"] * prev["vote_shares"]["bidenj"]
            prevTrump = prev["votes"] * prev["vote_shares"]["trumpd"]
            if currTrump < prevTrump and currBiden > prevBiden:
                trumpMark = currTrump
            if currBiden < prevBiden and currTrump > prevTrump:
                bidenMark = currBiden
        print(curr["timestamp"] + "," + str(curr["vote_shares"]["bidenj"]) + "," + str(curr["vote_shares"]["trumpd"]) + "," + str(curr["votes"]) + "," + str(int(currBiden)) + "," + str(int(currTrump)) + "," + str(bidenMark) + "," + str(trumpMark))

def findfraud2(NAME):
    # Sum every decline in the reported total vote count, ignoring shares.
    with open(NAME + '.json', encoding="utf8") as f:
        x = json.load(f)
    series = x["data"]["races"][0]["timeseries"]
    TotalVotesLost = 0
    for i in range(1, len(series)):
        if series[i]["votes"] < series[i - 1]["votes"]:
            TotalVotesLost += series[i]["votes"] - series[i - 1]["votes"]
    print(TotalVotesLost)

##findfraud(sys.argv[1])
ff1(sys.argv[1])
##ffDump(sys.argv[1])
The script has been modified to run from the command line with an argument, like:
python fraudcatch.py Race-Pres-pennsylvania
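As written it runs ff1(); the commented-out lines at the bottom of the script can be swapped in to run the original findfraud() tally or the ffDump() CSV export instead.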
I've attached a zip file containing the data as of around 9:30 AM PST.
There are two data files for each state: a "race" file and a "state" file. The race file mostly covers the Presidential race and is the smaller of the two. The state file includes all the data in the race file and adds considerably more data about the various state races.