Coronavirus Statistics: Cases, Mortality, vs. Flu

Oystein

Senior Member
...
For comparison, about 1% of people with the flu are hospitalized, and 0.1% die...
We are now focusing lots of bright lights on the Sars2 stats, and finding them lacking and problematic and difficult to interpret in many ways.

I wonder if "we" have scrutinized the flu stats with similar rigor. Or rather, whether medicine and science are tracking the flu with the same kind of effort. How do we know whether we have been missing a lot of cases, hospitalizations and deaths due to the flu, or conflating other illnesses, hospitalizations and deaths with the flu? In other words, how sure are we of the numbers we compare the Covid19 numbers to?
 

deirdre

Senior Member
how sure are we of the numbers we compare the Covid19 numbers to?
you have a point, but the biggest issue that would make the numbers incomparable now, in my opinion, is vaccines. If 70% of people over 65 are vaccinated (vaccination is encouraged even more for people with underlying health issues), that's a big chunk of the Covid-19 risk population that is protected from flu, either directly through the vaccine or through herd immunity in nursing homes etc.

https://www.cdc.gov/nchs/fastats/flu.htm

We know that it's about a quarter of those who test positive, but we don't know how many are never tested.
We also don't know how many people who would be hospitalized with flu in a normal year are being sent home with Covid-19 because some hospitals are overwhelmed; hospitals may now effectively require patients to be 'sicker' before admitting them.
 

Mendel

Active Member
We are now focusing lots of bright lights on the Sars2 stats, and finding them lacking and problematic and difficult to interpret in many ways.

I wonder if "we" have scrutinized the flu stats with similar rigor. Or rather, whether medicine and science are tracking the flu with the same kind of effort. How do we know whether we have been missing a lot of cases, hospitalizations and deaths due to the flu, or conflating other illnesses, hospitalizations and deaths with the flu? In other words, how sure are we of the numbers we compare the Covid19 numbers to?
The incidence rate for influenza is quite high, making flu screening feasible. The RKI does that regularly, via AG Influenza, Grippeweb, and the ICOSARI sentinel. AG Influenza has also been screening for SARS-CoV-2 for a few weeks, but the incidence is still quite low. AFAIK the true hospitalization rate for influenza is quite low. There's an RKI epidemiological bulletin with a comparison, if you want to learn more.
 

Mendel

Active Member
New York state has been testing many more people per million than California. Currently, it's 25,443 tests per million in New York versus 5,438 in California.
https://www.worldometers.info/coronavirus/country/us/
View attachment 40339
New York is doing 2.5 tests per case, California is doing 8.5 tests per case, and from what you quoted, LA is currently doing even better than that.

Please lose the notion that virus testing works like a survey or an opinion poll. These tests are intended to test the proportion of the population that is most likely to have the disease, or who would be impacted the most if they did have it. This means you put them in relation not to the overall population size, but to the size of the subgroup that has actually been exposed to infection and has symptoms -- and that size is proportional to the number of cases.
 

Oystein

Senior Member
The incidence rate for influenza is quite high, making flu screening feasible. The RKI does that regularly, via AG Influenza, Grippeweb, and the ICOSARI sentinel. AG Influenza has also been screening for SARS-CoV-2 for a few weeks, but the incidence is still quite low. AFAIK the true hospitalization rate for influenza is quite low. There's an RKI epidemiological bulletin with a comparison, if you want to learn more.
Well, have we scrutinized the methodology by which the incidence rate for influenza has been determined?

Coincidentally, on the morning of February 25th, the very day when the first two Covid-19 cases in my home district and home town became known (the first sustained outbreak in Germany), I saw my doctor with fever and a cough. Based on the symptoms, he determined it wasn't influenza and probably wasn't bacterial either, more likely a rhinovirus; he gave me a sick leave form for a week, and that was that. I wasn't tested for anything. If he had thought I had influenza, would he have gotten me tested for it? Or do people get (wrongly) counted toward the influenza incidence when their rhinovirus fever exceeds 39° or they over-dramatize their joint aches?

I do not know if I ever got true influenza.

(And by the way: No, it certainly wasn't coronavirus. I almost certainly got infected in the senior citizen's nursing home where I work, and where several residents had a cough and mild fever the week before; no serious cases)
 

DavidB66

Active Member
Is anyone here following the situation in France? On looking at today's 'Worldometers' table, I was startled to see a daily increase of 17,164 in the number of cases for France (cumulative total of 165,027 compared with 147,863 yesterday). The daily increase is way out of line with recent figures for France, which show a gentle decline in daily new cases to around 4,000. Yesterday there was a big daily increase of 1,438 in deaths (compared to 753 today), but a note explained that there was some catching up with records from the Easter period. I don't see any such explanation for the big increase in cases today in the source (SantePubliqueFrance) linked by Worldometers.

I did manage to work out that the figure of 165,027 comes from combining 108,847 'Cas Confirmes' with 56,180 'Cas de Residents en ESMS', which I think is the French term for residential care homes. I suspect that the sudden increase is due to including a big new lump of data from care homes. There was another big increase around the beginning of April, so maybe the data from care homes are only added every two weeks.

But I would be interested if anyone has better information. This is not just idle curiosity on my part: the situation here in the UK tends to track that in France, so a sudden unexplained increase is a bit worrying.
 

Mendel

Active Member
ESMS:
https://translate.google.com/transl...ement_ou_service_social_ou_m%C3%A9dico-social

I was startled to see a daily increase of 17,164 in the number of cases for France (cumulative total of 165,027 compared with 147,863 yesterday)
17167 is the number of deaths the ECDC reported on the 16th in the morning, so it looks like a sum gone wrong (or screen scraping gone wrong). The total number of cases to go with that is 106206.
santepublique.png
Upper left: The total number of cases is 108847, so the website is wrong.

Lower left: The total number of deaths is 17920; 11060 died in hospitals.
Lower right: 8925 deceased among ESMS residents; of these, 6860 died in the ESMS facility and 2065 at the hospital.
11060+6860=17920, works out perfectly

Upper middle and upper right
75,853 people hospitalized including 31,305 in progress
32,812 return home after hospitalization
31305 (currently in hospital)+32812(discharged)+11060(died in hospital)=75177
That's 676 short for some reason. It also means their hospitalization rate is huge, or, considering the quote below, their number of unregistered infections is really high.
That's from page 4 of the current situation report at https://www.santepubliquefrance.fr/...vid-19-point-epidemiologique-du-16-avril-2020
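The reconciliation above can be checked in a few lines (all figures copied from the dashboard):

```python
# Cross-check of the SantePubliqueFrance dashboard figures quoted above
# (situation report of 2020-04-16).
deaths_total = 17920
deaths_hospital = 11060
deaths_in_esms = 6860          # ESMS residents who died in the facility itself

# Deaths reconcile exactly:
assert deaths_hospital + deaths_in_esms == deaths_total

# Hospitalizations do not quite reconcile:
hospitalized_total = 75853
currently_in_hospital = 31305
discharged = 32812
accounted_for = currently_in_hospital + discharged + deaths_hospital
print(hospitalized_total - accounted_for)  # 676 patients unaccounted for
```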
 

DavidB66

Active Member
Thanks. The Worldometers website does make some errors, so maybe this is just another one. Let us see what the next update looks like.

The similarity of the figures 17167 and 17164 could be just a coincidence. But I doubt that the total for cases
165027 = 108847 + 56180 is a coincidence. It does look like Worldometers have added the two figures together to get a total.
 

chrisl

New Member
There's a new study out of Stanford—not yet peer reviewed—suggesting the infection fatality rate is much lower than the 1–2% range usually estimated. I'm skeptical, because the study would seem to indicate an IFR similar to that of the seasonal flu, and clearly the situation in hospitals in Italy, Spain, and New York is much graver than it would be for a flu.

The study looked for the presence of antibodies to SARS-CoV-2 in residents of Santa Clara County California, and found that the prevalence was much higher than the number of reported cases—50 to 85 times more than the prevalence of confirmed cases, which extrapolated to the entire county population would be 48,000 to 81,000 people infected. The New York Times tracker reports 70 deaths in Santa Clara County as of April 17. I don't have data for April 3-4, when the testing was done, but using the figure from the 17th puts an upper bound on the IFR of between 0.086% and 0.146%.
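That bound can be reproduced in a couple of lines (keeping in mind that deaths lag infections by weeks, so dividing current deaths by estimated infections understates the eventual IFR):

```python
# Naive IFR bounds from the Santa Clara preprint's infection estimates and
# the 70 deaths reported for the county by April 17.
deaths = 70
infections_low, infections_high = 48_000, 81_000

ifr_low = deaths / infections_high * 100    # more infections -> lower IFR
ifr_high = deaths / infections_low * 100

print(f"naive IFR between {ifr_low:.3f}% and {ifr_high:.3f}%")
```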

Such a low IFR would be great news, but again, I think it's more likely that the study is flawed. I suspect that the serology test is generating some false positives, which would dramatically skew the numbers when the prevalence is so low. The abstract claims that the test performance was checked against a sample of 37 positive and 30 negative controls—that doesn't seem like enough people.

Another thing I wonder about: they tested 3,330 people and found 1.5% had antibodies to SARS-CoV-2, or about 50 people. It seems like a low number of positives. If you haven't recruited a truly random sample (e.g. are people with symptoms more likely to volunteer?), then you'll end up with skewed estimates.

I'd love to hear thoughts from people who have more experience with this type of study.

https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

Here's the abstract:
Background Addressing COVID-19 is a pressing health and social concern. To date, many epidemic projections and policies addressing COVID-19 have been designed without seroprevalence data to inform epidemic parameters. We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County. Methods On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer's data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both. Results The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%). Under the three scenarios for test performance characteristics, the population prevalence of COVID-19 in Santa Clara ranged from 2.49% (95CI 1.80-3.17%) to 4.16% (2.58-5.70%). These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases. Conclusions The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases. Population prevalence estimates can now be used to calibrate epidemic and mortality projections.
—Chris
 
Last edited:

DavidB66

Active Member
Just to follow up my previous comments (#168 and #170), the latest Worldometers report includes a long and complicated note about the French data:

France reported that a portion of the EHPAD and EMS nursing home cases - representing about 33% of the total EHPAD and EMS cases - were confirmed (rather than probable, as the other 67%) and as such are to be considered as already included in the total national case count. The French Government has now started reporting the breakdown between confirmed and probable EHPAD and EMS cases. We have adjusted the historical data for France from April 4 based on this information. In accordance with the decision by the French Government to include EHPAD and EMS probable deaths in the total death count. EHPAD and EMS probable cases must be included in the total case count in order to be logically sound (a death implies a case) and methodologically consistent. On April 3, the French Government had reported 17,827 additional cases and 532 additional deaths from nursing homes that had not been reported previously. On April 2, it had reported 884 additional deaths
I won't attempt to interpret this, beyond making the obvious point that yesterday's apparent large increase in 'cases' for France was not a genuine one-day increase.
 

Mendel

Active Member
The study looked for the presence of antibodies to SARS-CoV-2 in residents of Santa Clara County California, and found that the prevalence was much higher than the number of reported cases—50 to 85 times more than the prevalence of confirmed cases
Another thing I wonder about: they tested 3,330 people and found 1.5% had antibodies to SARS-CoV-2, or about 50 people. It seems like a low number of positives. If you haven't recruited a truly random sample (e.g. are people with symptoms more likely to volunteer?), then you'll end up with skewed estimates.
It has Prof. Ioannidis' name on it (the guy who thought 7 deaths on the Diamond Princess was too low, when it's 12 deaths now), and it's rubbish, and they know it. (You can download a statistical appendix to the paper, but no shared data.)

First, he picked the county that had the earliest cases in California and the first outbreak, ensuring that the population would be undertested. This means it's likely that every other county in California has fewer unregistered infections than Santa Clara.

Second, study participants were people who responded to a Facebook ad. This is a self-selected sample, and that property completely kills the usefulness of the study all by itself. This is a beginner's error! People who think they had Covid-19 and didn't get tested, or who know someone who did, are much more likely to respond to such an ad than people who don't. (By comparison, the Gangelt study contacted 600 carefully chosen households by mail, and 400 responded. Still somewhat self-selected, but not as badly.)

Third, age is the single most important predictor of mortality. He did not weight the results by age, and old people are underrepresented in the study. Anything he says about mortality is useless if we don't know how prevalent the infection was in the older population. (In Germany, case data show that the prevalence among tested older people was low initially and took a few weeks to rise.)

Fourth, instead he weights prevalence by zip code -- why? This exacerbates statistical variation, since there were only 50 positive results and Santa Clara has ~60 zip codes. If a positive result happens to fall in a populous zip code with only a few participants, the numbers get skewed up. They must have seen this happen, because their estimated prevalence is almost twice as high as the raw prevalence.

Fifth, the specificity of the test is "99.5% (95% CI 98.3-99.9%)". This means that if the specificity were actually 98.5%, all 50 positive results could be false positives, and nobody in the sample would have had Covid-19 at all. The result is therefore not statistically significant even if the sample had been well chosen (which it wasn't). (It's not even significant at the 90% level.)
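To put numbers on that, here is the expected count of false positives among 3,330 samples at a few specificity values inside the test's confidence interval:

```python
# Expected false positives among all samples, assuming (worst case) that
# nobody in the sample actually had Covid-19.
n_samples = 3330
observed_positives = 50

for specificity in (0.999, 0.995, 0.985, 0.983):
    expected_fp = n_samples * (1 - specificity)
    print(f"specificity {specificity:.1%}: {expected_fp:.1f} expected false positives")

# At 98.5% specificity (well inside the 98.3-99.9% CI), the expected false
# positives already match all 50 observed positives.
```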

Sixth, they used a notoriously inaccurate "lateral flow assay" instead of an ELISA test, and did not validate their positive samples (only 50) with a more accurate test -- why not?

Seventh, the Covid-19 antibody test can create false positives if it cross-reacts with antibodies to other human coronaviruses, i.e. if you test samples from people who recently had a cold, your specificity will suffer. A manufacturer could therefore inflate its specificity figures by a) testing blood donor samples, since people are not allowed to give blood if they have recently been sick; or b) testing samples taken in the summer, when people are less likely to have colds than in March.

To state the previous three points another way: a large number of the positive results (a third if the specificity is actually 99.5%, but probably more than that) are fake, and depending on which zip codes they randomly fall in, they could considerably skew the results.

I'll put some more quotes from the study in the spoiler.
I haven't marked elisions. If you see a blank line, assume I cut something there.
Santa Clara County population is 1,943,411
1,094 confirmed cases on April 2nd or 3rd (can someone verify? and give data for the week?)

1.5% crude positive rate of 3330 participants = 50 tests positive
Santa Clara has about 60 zip codes. Note the low number of samples per zip code, and the low number of participants for some.

https://www.dailynews.com/2020/04/0...us-cases-in-californias-hardest-hit-counties/
This page has an SVG animation that shows this with an animated bar chart.

Santa Clara testing dashboard-SC-CO.jpeg
https://www.kron4.com/news/bay-area...-dashboards-with-latest-covid-19-information/
As you can see, the early weeks have next to no negative tests, so many cases are unrecognized and free to spread in the community.

I tried to check the formula by plugging the result into the bigger equation above, but I couldn't make it work. I'm suspicious that they rolled their own maths here instead of simply inserting a textbook formula (with a source).
Santa Clara solve.png
The formula says that
P(Covid+) = [ P(Test+) - P(Test+|Covid-) ] / [ P(Test-|Covid-) - P(Test-|Covid+) ]
and I don't understand why that would make sense. The numerator is close to the incidence of true positives if the share of negative samples is near 1, but what is the denominator supposed to mean? It should be the factor by which we underestimate the result because we are missing the false negatives.
Mmmh, I plugged in some numbers and it looks sane, at least.
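For what it's worth, the appendix formula looks like the textbook Rogan-Gladen prevalence correction, which you get by solving P(Test+) = sens*p + (1-spec)*(1-p) for the true prevalence p; with P(Test+|Covid-) = 1-spec, P(Test-|Covid-) = spec and P(Test-|Covid+) = 1-sens, the two forms are identical. A minimal round-trip sketch with made-up numbers:

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct a raw test-positive rate for imperfect test performance.

    Derived from P(Test+) = sens*p + (1 - spec)*(1 - p), solved for p.
    """
    return (apparent_prevalence - (1 - specificity)) / (sensitivity + specificity - 1)

# Round trip: simulate the apparent rate from a known true prevalence,
# then recover the true value.
true_p, sens, spec = 0.02, 0.80, 0.995
apparent = sens * true_p + (1 - spec) * (1 - true_p)
recovered = rogan_gladen(apparent, sens, spec)
assert abs(recovered - true_p) < 1e-9
```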
 

Attachments

Mendel

Active Member
I won't attempt to interpret this, beyond making the obvious point that yesterday's apparent large increase in 'cases' for France was not a genuine one-day increase.
The problem (as indicated by my quote) is that the French officials are using data from different sources. The total number of confirmed cases is probably something they get from the laboratories doing the tests. The number of cases in ESMS is probably something the ESMS report themselves. Because they sent samples from 33% of their cases off to be tested, there is overlap between the two data sources.

Your comment suggests that on April 3rd, a group of nursing homes that had not previously reported cases to the government did so, and thus the statistics took a leap to catch up. What Worldometers should do is take the "total confirmed" and add 66% of the ESMS number to get a case count without overlap; that case count will still be missing the infections in the general population that are never confirmed, but that's obviously true everywhere.

Germany counts only confirmed cases (but we can confirm all suspected cases with symptoms, so an "unconfirmed" diagnosis always gets confirmed). Every physician has to report any patient with a potentially epidemic infectious disease to their county health official, and from there it gets accumulated upwards through the system until it finally reaches the RKI. This means the RKI has access to age, gender, date of symptom onset, date of confirmation, date of death, and cause of death as determined by the attending physician who examines the body ("died of covid19"/"died with covid19"). In many cases, they also have information on symptoms, hospitalization status, etc. This method sometimes introduces lag in the official numbers, but the numbers don't have any overlap and are reasonably complete. The RKI can actually produce a reasonably accurate graph of cases by symptom onset from those data.
The RKI has recently arranged to get data directly from the laboratories in order to get a less laggy overview of case numbers, but the thorough system still runs in parallel.
 

Mendel

Active Member
Interesting comparison of cumulative deaths per 100k population. Case counts depend on the amount of testing, but you can select those too in the drop-down menu.
If you use the "per 100k" data, lining it up at the 50th case is misleading; that should be lined up at equal per population values, and that would shift the US to the left and the Netherlands to the right, for example.
How did you determine that case counts depend on the amount of testing? Or did you mean that the accuracy of the case counts depends on that?
(What I really want to see is "new cases" or new deaths, not the cumulative graph; it's not as easy to notice trends there.)
 

Agent K

Active Member
If you use the "per 100k" data, lining it up at the 50th case is misleading; that should be lined up at equal per population values, and that would shift the US to the left and the Netherlands to the right, for example.
How did you determine that case counts depend on the amount of testing? Or did you mean that the accuracy of the case counts depends on that?
(What I really want to see is "new cases" or new deaths, not the cumulative graph; it's not as easy to notice trends there.)
Their default setting is number of confirmed cases, so it starts at the 50th case. I can't change it.
The more you test, the more confirmed cases you'll catch. If you don't test at all, you won't have any confirmed cases.
They have a graph of new cases, not deaths, and not normalized by population.
https://coronavirus.jhu.edu/data/new-cases
1587170431109.png
 

Mendel

Active Member
The more you test, the more confirmed cases you'll catch. If you don't test at all, you won't have any confirmed cases.
"The more water you pour in the bucket, the more water will be in the bucket" stops being true when the bucket runs over.
For testing, the relationship is definitely not linear.
The case rate is driven by two quantities: A, the number of people selected for testing, and B, the number of tests. If A>B, then increasing the test volume will yield cases linearly as long as A remains unchanged. But the criteria for selecting A change: at first, you may only test people with unexplained pneumonia (high chance of finding cases), then progress to contact tracing (~10% chance), then maybe health care workers, then screening people newly admitted to retirement homes. Each new extension of A creates more demand for B, but it lowers the rate at which additional tests find new cases. In a B>A situation, increasing the test volume alone won't achieve much; in fact, the number of tests performed is then driven by an increase in the size of A: we perform more tests because we find more cases to trace! The testing doesn't drive the case numbers; the case numbers drive the testing!

So that's why I am saying that the amount of testing determines how accurate the case numbers are. If you only test people with pneumonia, you won't have an accurate measurement of infections. But you will have an accurate measurement of severe Covid-19 cases, which is probably proportional to the number of infections and tells you how many hospital beds you need and are projected to need. If you do contact tracing, your number becomes more accurate, i.e. the case count gets closer to the number of infected. But it does that because A and B change.
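A toy calculation illustrates this; the group sizes and hit rates below are entirely made up:

```python
# Each expansion of the tested group A has a lower positivity rate than the
# last, so cumulative cases grow far slower than cumulative tests.
groups = [
    ("unknown pneumonia",   1_000, 0.30),   # hypothetical hit rates
    ("contact tracing",     5_000, 0.10),
    ("health care workers", 10_000, 0.02),
]

tests = 0
cases = 0.0
for name, size, positivity in groups:
    tests += size
    cases += size * positivity
    print(f"after {name}: {tests} tests, {cases:.0f} cases, "
          f"{cases / tests:.1%} cumulative positivity")
```

Test volume goes up 16-fold while the case count only goes up about 3-fold: the case count is not proportional to the amount of testing.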
 

Trailspotter

Senior Member
There's a new study out of Stanford—not yet peer reviewed—suggesting the infection fatality rate is much lower than the 1–2% range usually estimated. I'm skeptical, because the study would seem to indicate an IFR similar to that of the seasonal flu, and clearly the situation in hospitals in Italy, Spain, and New York is much graver than it would be for a flu.

I'd love to hear thoughts from people who have more experience with this type of study.
There is a News article on the Nature website, addressing this preprint:
https://www.nature.com/articles/d41586-020-01095-0
 
Last edited:

Mendel

Active Member
@chrisl @Trailspotter
The problems with Stanford's Santa Clara study (this one: https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1 ) are even worse than I thought. In post #173, my analysis stated that the sample was self-selected and hence likely to be very skewed, and that the results were not statistically significant given the numbers. I've copied this analysis to the discussion on medRxiv and am happy to have seen positive feedback, as well as other commenters making similar observations. (It was also flagged as spam for half a day.) Now another commenter has found the information for the test kit they used:
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1#comment-4879825364

This isn't the exact name of the manufacturer listed in the study, but the performance data match:
Santa Clara test stats.png
(from http://en.biotests.com.cn/newsitem/278470281)
(from the study)
Now comes the crux of the matter, citing the study:
If you apply this criterion to the manufacturer data (a case is positive if either IgG or IgM is positive), then we have at least 3 and likely even 5 false positives, since we'd need to take the higher number of the 2 and 3 if they overlap, and add them if they don't. 5/372 gives a false positive rate of 1.35% (95% CI 0.44%-3.49%); if I reduce the confidence to that of a coin flip, I get a 50% confidence interval of 0.91%-1.99%.
There's a >50% chance that this result is completely random.
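For reference, here's a quick exact (Clopper-Pearson) interval for 5 false positives in 372 negative controls, computed by bisecting the binomial tail probabilities; my upper bound comes out a bit below the 3.49% quoted above, which may stem from a different interval method:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion, via bisection."""
    def threshold(predicate):
        lo, hi = 0.0, 1.0
        for _ in range(80):            # predicate is true below the threshold
            mid = (lo + hi) / 2
            if predicate(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else threshold(lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    upper = 1.0 if k == n else threshold(lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(5, 372)
print(f"point estimate {5/372:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```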

Since the 75/75 hit rate for the positive tests is 100%, we can now use the formula from the appendix:
Santa Clara formula.png
P(Covid+) = [ P(Test+) - P(Test+|Covid-) ] / [ P(Test-|Covid-) - P(Test-|Covid+) ]
P = ( 1.5% - 1.35% ) / ( 98.65% - 0% ) = 0.15/98.65 = 0.00152 = 0.152% = 152/100,000.
This translates to 2955 infections for Santa Clara county if we don't do any weighting, possibly two to three times that with weighting (that's the size of their weighting adjustment).
The study data actually only support a 3- to 9-fold higher number of infections than confirmed cases. With the 100 deaths this study assumes, we get an infection fatality rate (IFR) of 3.4% (2955 infections) to 1.1% (9000 infections). 3.4% is actually close to the number that came out of China, and with 12 deaths reported by March 24 and 712 infections on the Diamond Princess, that IFR was 1.7% unadjusted.
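In numbers (the 9,000 upper end is my rough reading of the study's two-to-three-fold weighting adjustment):

```python
# Corrected prevalence -> county-wide infections -> implied IFR range.
county_population = 1_943_411
corrected_prevalence = 0.00152      # 0.152%, from the correction above
assumed_deaths = 100

infections_unweighted = county_population * corrected_prevalence
infections_weighted = 9_000         # rough upper end after ~2-3x weighting

print(f"{infections_unweighted:.0f} infections unweighted")
print(f"implied IFR {assumed_deaths / infections_weighted:.1%} "
      f"to {assumed_deaths / infections_unweighted:.1%}")
```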

The study doesn't deserve the attention it is getting.
Can we move the posts concerning this to a separate topic? Because I feel that this is a proper debunk now.
 
Last edited:

Agent K

Active Member
New York state reports 18,298 deaths, and has 19.45 million residents. If every single resident had been infected with COVID-19, the IFR would be 0.094%. If one in ten New Yorkers were already infected, the IFR is 0.94%.
 

Mendel

Active Member
New York state reports 18,298 deaths, and has 19.45 million residents. If every single resident had been infected with COVID-19, the IFR would be 0.094%. If one in ten New Yorkers were already infected, the IFR is 0.94%.
New York isn't Santa Clara, but with 230,000 cases and a 10% prevalence (your one in ten infected), we'd be close to that 9-fold underestimate of infections.
I have no idea what NY's testing strategy and coverage are right now, but if they only test severe cases, they are missing about 3 times as many mild cases, plus a similar number of asymptomatic cases, so by my reckoning that would be similar to the magnitude I'd expect to see.
Obviously, that's very rough reckoning.

Your IFR is the "naive" IFR, because many people who are infected now haven't died yet. My personal feeling is that the final IFR might be around 1.5% , but who knows.
 

Mendel

Active Member
Let's look at excess mortality. That number is the gold standard for epidemics, and the proper rebuttal to people who say "it's just like the flu" and "these people would have died anyway". We can easily see a lot of excess mortality by looking at hard-hit regions like Lombardy or Wuhan, but it also shows up at country level for some countries that have had major outbreaks.

To understand the graphs that follow, I need to explain the z-score. The z-score is a way to normalize a statistical value by relating it to the statistical average and the standard deviation sigma.
image.png
https://en.m.wikipedia.org/wiki/Standard_score
As you can see, 95% of random data points have z-scores between -2 and +2, and values with higher z-scores are quite rarely caused by random chance.
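A minimal sketch of the computation, with made-up weekly death counts:

```python
from statistics import mean, stdev

# Eight "normal" weeks as a baseline, plus one week with a mortality spike.
weekly_deaths = [980, 1010, 995, 1005, 990, 1000, 1020, 985, 1400]  # made up

baseline = weekly_deaths[:-1]
mu = mean(baseline)
sigma = stdev(baseline)
z = (weekly_deaths[-1] - mu) / sigma
print(f"z-score of the last week: {z:.1f}")  # far beyond +2: clear excess
```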

http://www.euromomo.eu/index.html

First, Germany looks a bit weird because only two of our 16 states are participating. It's obvious that the countries that have been in the news a lot show significant excess mortality. This is also borne out by the graphs:
image.png
Note the different scales of the y-axes: they always show -4 to 8, but the lines are often compressed to accommodate a graph exceeding z=8 by a large margin. The graphs also show the excess mortality caused by the yearly flu season in the previous 3 years, with 2018/19 being mild almost everywhere, and 2019/20 as well until the blue Covid spike well into 2020.
I wonder how this is going to develop in the coming weeks.
 
Last edited:

Mendel

Active Member
COVID-19 Antibody Seroprevalence in Santa Clara County, California
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
From the statistical appendix:
The study reports only the IgG specificity of the test kit, omitting the false positive rate for IgM. It also sets the sensitivity to the lower of the two values. This is mathematically consistent with accepting a sample as positive only if it passes both the IgG and the IgM test. However, the package insert states that a test is positive if either one is detected; it doesn't require both.
Santa Clara test positive.png
What did the study do? Page 6 states: "The total number of positive cases by either IgG or IgM in our unadjusted sample was 50". It comes down to a point of grammar: does "positive by either" mean a single positive result sufficed, or did a sample have to register positive on both?

This is a rather crucial point: if they counted a test as positive when just one of IgG and IgM was positive, the mathematical analysis is invalid and needs to be redone.
 

Attachments

chrisl

New Member
COVID-19 Antibody Seroprevalence in Santa Clara County, California
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

What did the study do? Page 6 states: "The total number of positive cases by either IgG or IgM in our unadjusted sample was 50". It comes down to a point of grammar: does "positive by either" mean a single positive result sufficed, or did a sample have to register positive on both?

This is a rather crucial point: if they counted a test as positive when just one of IgG and IgM was positive, the mathematical analysis is invalid and needs to be redone.
Thanks for doing a deeper dive on this study. I look forward to seeing what sort of responses the authors offer to questions about their sampling and the real specificity of the serology test.

There are a lot of coronavirus contrarians out there using this study as evidence for their 'just a flu' viewpoints. I'd love for them to be right. But when the IFR inferred from the 50 positives in this study conflicts with the actual body counts in NYC, Lombardy, etc., I'm going to bet that the study is wrong. The Santa Clara study also seems to disagree with the results of broad testing in Iceland.

—Chris
 

chrisl

New Member
And just after I finished my last message, there's another serology study from California that reports similarly high infection rates as the Santa Clara County study. This one is from USC and the Los Angeles County Department of Public Health.

I found the press release, but no paper:
http://publichealth.lacounty.gov/phcommon/public/media/mediapubhpdetail.cfm?prid=2328

Based on results of the first round of testing, the research team estimates that approximately 4.1% of the county's adult population has antibody to the virus. Adjusting this estimate for statistical margin of error implies about 2.8% to 5.6% of the county's adult population has antibody to the virus- which translates to approximately 221,000 to 442,000 adults in the county who have had the infection. That estimate is 28 to 55 times higher than the 7,994 confirmed cases of COVID-19 reported to the county by the time of the study in early April. The number of COVID-related deaths in the county has now surpassed 600.
A few more details can be found in a Q&A with USC researcher Neeraj Sood, who led the study:

https://pressroom.usc.edu/what-a-usc-la-county-antibody-study-can-teach-us-about-covid-19/

I remain skeptical for the same reasons as before. It sounds like they're using the same test as the Stanford group, based on this answer from the linked Q&A:

How reliable are the antibody tests?
Premier Biotech, the manufacturer of the test that USC and L.A. County are using, tested blood from COVID-19-positive patients with a 90 to 95% accuracy rate. The company also tested 371 COVID-19-negative patients, with only two false positives. We also validated these tests in a small sample at a lab at Stanford University. When we do our analysis, we will also adjust for false positives and false negatives.
—Chris
 

Mendel

Active Member
Thanks for doing a deeper dive on this study. I look forward to seeing what sort of responses the authors offer to questions about their sampling and the real specificity of the serology test.
If you see any, let me know. I bet this is going to vanish in a puff of smoke once more robust studies come out. The WHO director of research said in the press conference today that the WHO is supporting more robust studies across the world, and she indicated that preliminary data suggests the percentage of undiscovered cases is rather smaller. We'll have to wait and see. Meanwhile, there are 200 comments on this study on medRxiv. :p

Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020
https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.12.2000256

With only about half the positives being asymptomatic, that doesn't leave much leeway for undiscovered cases in a community with good access to testing; even if you estimate that asymptomatic cases test positive for a week and symptomatic cases for two before they have to go to intensive care, that's just 2 to 1, maybe 3 to 1 if we allow that they didn't test everyone all at once.

And the CFRs in that study that are listed in table 2 really don't need any adjustment when you add the next 5 deaths (reported e.g. on wikipedia) to it.
 

Mendel

Active Member
And just after I finished my last message, there's another serology study from California that reports similarly high infection rates as the Santa Clara County study. This one is from USC and the Los Angeles County Department of Public Health.

I found the press release, but no paper:
http://publichealth.lacounty.gov/phcommon/public/media/mediapubhpdetail.cfm?prid=2328
That's promising, at least. Since they took the samples on April 10th and 11th, the paper ought not to be out yet if they're doing a thorough job. It does look like the same test:
LAcounty m6kJ88gQ.jpeg
A few more details can be found in a Q&A with USC researcher Neeraj Sood, who led the study:
https://pressroom.usc.edu/what-a-usc-la-county-antibody-study-can-teach-us-about-covid-19/
Based on the published variance of 2.8-5.6%, I'm guessing they had 35 positive tests of 850 valid samples, but that's probably wrong. I'm hoping they validate these positives with a different test, that'll reveal a better false positive rate.
 

Agent K

Active Member
With only about half the positives being asymptomatic, that doesn't really leave a lot of leeway for undiscovered cases in a community with good access to testing; even if you estimate that asymptomatic cases are positive for a week and symptomatic cases for two before they have to go to intensive care, that's just 2 to 1, maybe 3 to 1 if we allow that they didn't test everyone all at once.
Only 18% of the positives remained asymptomatic; the rest developed symptoms. 13 died and 7 remain serious or critical, and the longer they're critical the less chance they'll recover.
https://www.worldometers.info/coronavirus
 

Z.W. Wolf

Senior Member
I have a question... regarding the Santa Clara County study specifically.

-They're using a test which makes the issue of false positives and false negatives a bad problem.
-Increasing N over 1000 has surprisingly little effect on the margin of error. (This being more true with simpler statistical methods and less true with more complex methods.)

Question: Would it have been better to test all samples twice with the same type of test? Throw out all positives that were not duplicated. Without increasing the total number of tests used (and the dollar cost), N would still be over 1000, while error from false positives and false negatives would be much reduced.

(I'm still thinking about this... I think this is a different method than double checking all positives on a post hoc basis. I think my method controls for false negatives as well as false positives.)

The problem with poor randomization would still be there.
 
Last edited:

Khan Desi

New Member
I have a question... regarding the Santa Clara County study specifically.

-They're using a test which makes the issue of false positives and false negatives a bad problem.
-Increasing N over 1000 has surprisingly little effect on the margin of error. (This being more true with simpler statistical methods and less true with more complex methods.)

Question: Would it have been better to test all samples twice with the same type of test? Throw out all positives that were not duplicated. Without increasing the total number of tests used (and the dollar cost), N would still be over 1000, while error from false positives and false negatives would be much reduced.

(I'm still thinking about this... I think this is a different method than double checking all positives on a post hoc basis. I think my method controls for false negatives as well as false positives.)

The problem with poor randomization would still be there.
Simple probability tells us that if the probability of a false positive is 1% (p=0.01), then the probability of testing someone twice and getting a false positive twice is just 0.01*0.01 = 0.0001, or 0.01%. This is highly unlikely, so it would seem to be a good filter for weeding out false positives. Of course, if the probability of actually having the virus is low enough, then 0.01% might still be higher than the odds of actually having the virus (P(having virus, given two positive results) < P(not having virus, given two positive results)), but I don't think that's the case here.

The above is only true if the repeated tests can be considered independent. If for some reason they can't, for example if something in the person's sample is more likely to trigger a false positive than someone else's sample, then it becomes trickier. All that being said, I did see some commentary asking why the people who tested positive, which was a relatively small amount, weren't re-tested, either the same way or using a more robust testing mechanism.
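To put rough numbers on the independence point (a sketch with illustrative values, not the study's actual rates): two positives on independent tests move the posterior a lot, but only if the tests really are independent.

```python
# Posterior probability of true infection after n positive results,
# assuming the repeated tests are independent (all numbers illustrative).
def posterior(prevalence, sensitivity, specificity, n_positives):
    p_if_infected = sensitivity ** n_positives
    p_if_healthy = (1 - specificity) ** n_positives
    true_pos = prevalence * p_if_infected
    false_pos = (1 - prevalence) * p_if_healthy
    return true_pos / (true_pos + false_pos)

print(posterior(0.015, 0.80, 0.99, 1))  # one positive: about 0.55
print(posterior(0.015, 0.80, 0.99, 2))  # two positives, if independent: about 0.99
```

If cross-reactivity makes the second result correlated with the first, the two-test number above is far too optimistic.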
 

Mendel

Active Member
Increasing N over 1000 has surprisingly little effect on the margin of error.
I think they did that mostly so they could do their weighting, which didn't fix the age group issues and probably obscured the error: the 1.5% raw rate is within their 95% confidence interval for the false positive rate, but once they've blown the probabilities up with the weighting, their variance range no longer includes 0%. The fact that they rolled their own statistics instead of using a textbook procedure is suspicious to me.
The above is only true if the repeated tests can be considered independent. If for some reason they can't, for example if something in the person's sample is more likely to trigger a false positive than someone else's sample, then it becomes trickier. All that being said, I did see some commentary asking why the people who tested positive, which was a relatively small amount, weren't re-tested, either the same way or using a more robust testing mechanism.
If you use the same test, it's not independent. The package insert that I attached to post #184 states clearly that the test can cross-react with "common cold" coronaviruses; if you have this cross-reactivity in a sample, a re-test isn't going to help.
A better type of test is a neutralisation test, where you check in the lab whether the sampled blood can actually attack the virus and keep it from growing in tissue. I think you could also use an antibody test that targets a different section of the antigen for detection, but I'm not entirely clear on this.

Only 18% of the positives remained asymptomatic;
I'm not doubting you, but do you have a source for that?
I'm surprised that worldometers still lists 55 active cases, I'd have thought they'd be all closed now, but maybe these are crew?
 

derwoodii

Senior Member
Hi all, could I ask for a summary of this thread please? It's a few pages now and data heavy, and I'm not wise or gifted enough to grasp what has been shown.

regards D
 

derwoodii

Senior Member
Oh, if this is relevant I think it may help: it seems the Australian flu season appears to have been headed off by the hand-washing, quarantine and social distancing measures designed to control COVID-19.



Flu season that looked like 'a big one' beaten by hygiene, isolation

https://www.smh.com.au/national/flu...ten-by-hygiene-isolation-20200420-p54lh7.html


 

Z.W. Wolf

Senior Member
Okay, weighting... so they want a large value for N... so they can chop it up.
I'm suspicious of weighting... rightly or wrongly.

Same test is not independent. Yes, point taken.

I'm laboring under some handicaps.

-I'm really rusty... 28 years since last statistics class.

-I'm trying to re-educate myself about specificity and selectivity... instead of going to bed as I should be doing. I'm too sleepy for this stuff!

But I've run across something in Wikipedia that seems to be apropos to this over-tried mind.
https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positive_paradox
False positive paradox
One type of base rate fallacy is the false positive paradox, where false positive tests are more probable than true positive tests, occurring when the overall population has a low incidence of a condition and the incidence rate is lower than the false positive rate. The probability of a positive test result is determined not only by the accuracy of the test but by the characteristics of the sampled population.[2] When the incidence, the proportion of those who have a given condition, is lower than the test's false positive rate, even tests that have a very low chance of giving a false positive in an individual case will give more false than true positives overall.[3] So, in a society with very few infected people—fewer proportionately than the test gives false positives—there will actually be more who test positive for a disease incorrectly and don't have it than those who test positive accurately and do. The paradox has surprised many.[4]


Example
A group of police officers have breathalyzers displaying false drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random to administer a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?
Many would answer as high as 95%, but the correct probability is about 2%.

An explanation for this is as follows: on average, for every 1,000 drivers tested,

  • 1 driver is drunk, and it is 100% certain that for that driver there is a true positive test result, so there is 1 true positive test result
  • 999 drivers are not drunk, and among those drivers there are 5% false positive test results, so there are 49.95 false positive test results
Therefore, the probability that one of the drivers among the 1 + 49.95 = 50.95 positive test results really is drunk is 1/50.95 ≈ 0.0196, or about 2%.

The validity of this result does, however, hinge on the validity of the initial assumption that the police officer stopped the driver truly at random, and not because of bad driving. If that or another non-arbitrary reason for stopping the driver was present, then the calculation also involves the probability of a drunk driver driving competently and a non-drunk driver driving (in-)competently.
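The arithmetic in that excerpt can be checked in a few lines (a sketch using only the rates quoted above):

```python
# Reproducing the breathalyzer base-rate arithmetic from the excerpt above.
drunk_rate = 1 / 1000        # 1 in 1000 drivers is drunk
false_positive_rate = 0.05   # 5% of sober drivers test positive
# sensitivity is 100%: the breathalyzer never misses a drunk driver

true_pos_per_1000 = 1000 * drunk_rate                               # 1
false_pos_per_1000 = 1000 * (1 - drunk_rate) * false_positive_rate  # 49.95
p_drunk_given_positive = true_pos_per_1000 / (true_pos_per_1000 + false_pos_per_1000)
print(round(p_drunk_given_positive, 4))  # 0.0196, i.e. about 2%
```

The same structure applies to antibody surveys: when true prevalence is near the false positive rate, most positives can be false.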
 
Last edited:

deirdre

Senior Member
Hi all, could I ask for a summary of this thread please? It's a few pages now and data heavy, and I'm not wise or gifted enough to grasp what has been shown.

regards D
We don't know anything even remotely concrete, because the virus is too new and no one has collected nearly enough data yet.
 

derwoodii

Senior Member
We don't know anything even remotely concrete, because the virus is too new and no one has collected nearly enough data yet.

Ta, agree D, but I was hoping to just get the gist to save me some reading and data comprehension time, and perhaps it'd be of some help to other MB site readers who are following. But I'm happy to wait and watch as things develop; I can gather and digest the details slowly.
 

deirdre

Senior Member
Ta, agree D, but I was hoping to just get the gist to save me some reading and data comprehension time, and perhaps it'd be of some help to other MB site readers who are following. But I'm happy to wait and watch as things develop; I can gather and digest the details slowly.
It really is the gist. The one or two locales that have done more general population testing still haven't done enough, and there are simply so many variables.
We need a few months until there is an adequately reliable FDA-approved antibody test and large-scale testing to really know anything about true mortality rates. I'm sure when an appropriate study comes out @Mick or whoever will start a separate thread.

This thread has data from all over the place (south korea, vs China, vs Germany vs Italy etc etc). There's no easy way to summarize the current thread content. You just have to skim through it.

Or pick a specific topic, like "can y'all summarize this Santa Clara study in simple English and 2 paragraphs?"
 
Last edited:

Mendel

Active Member
Oh if this is relevant i think may help as it seems the Australian flu season appears to have been headed off by the hand-washing, quarantine and social distancing measures designed to control COVID-19.
We've seen a similar effect in Germany, except since we were mostly past the flu season already, it just stopped sooner. That's the fun thing about hygiene: it has good health effects across the board, so you have nothing to lose by advocating it.

Hi all could i ask for a summary of this thread please, its a few pages now and data heavy and im not that wise or gifted to grasp what has been shown..
Well, what do you want to know?

Overview

* It is hard to project how many people are going to die, because
-- we do not know how many people will be infected in the end
-- we do not know how many of the infected will die
-- we do not know how the case numbers relate to the dead and infected

We have estimates for the stuff we do not know, and hot discussion about whose estimates are more correct. Studies are emerging right now that give us data so we can know.

CFR = Case Fatality Rate = deaths/cases
This is the percentage of cases that are dying. This depends highly on what counts as a case, i.e. how well a specific country is detecting cases. Countries in the containment phase that manage contact tracing well and have appropriate testing resources should detect close to every infected person with symptoms. (I assume that's happening when 90% or more of a region's PCR tests come back negative.) Countries in the mitigation phase and where testing capacity is stretched will detect far fewer. Since you usually detect most of the deaths (but even there, some countries didn't initially know how many people died outside of hospitals!), the CFR is higher if you find fewer cases. Cases are typically confirmed with a genetic PCR test, which is very specific. (The WHO test will only detect bat coronaviruses, and there's only one of those going around right now.)
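To illustrate how much the detection rate alone moves the apparent CFR (all numbers made up for the sketch, assuming a fixed number of true infections and deaths):

```python
# Illustration: the apparent CFR rises as the detection rate falls,
# even though nothing about the disease itself changes.
true_infections = 100_000
deaths = 1_000  # implies a true IFR of 1% in this made-up scenario

for detection_rate in (1.0, 0.5, 0.1):
    cases = true_infections * detection_rate
    cfr = deaths / cases
    print(f"{detection_rate:.0%} of infections detected -> apparent CFR {cfr:.1%}")
```

This is why comparing raw CFRs between countries with very different testing regimes is misleading.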

Age is the major risk factor.
The case fatality rates from the big Chinese outbreak have stood up well, in my opinion. They're under 0.5% for ages up to 50, 1.3% (50-60), 3.6% (60-70), 8.0% (70-80), and 14.8% (over 80). The 14.8% matches the German data I looked at a while back; the others were somewhat lower (~4% in 60-80), but that may have changed as more data has come in, so I should re-process that (see below).

Most of the people who die of Covid-19 are old or very old. So if you look at the mortality of a population (how many people will die?), you need to adjust this for how many old people you have. Any study that doesn't do that can't tell you anything robust about mortality.

People die a lot later than when they become cases, so to find the mortality of the cases you have now, you theoretically have to wait until all of them are recovered or dead. This is a big problem if you want to be first to publish, and basically makes some Diamond Princess studies very questionable, because the study is based on 7 deaths and now we have 13 or 14.

IFR = Infection Fatality Rate = deaths/infected = lethality
This is the percentage of people infected with the virus that are dying. To determine this number, we need to find out who has been infected. We know the number of cases, but we don't know how many infected people never became cases because they never got tested. (The lethality is obviously also heavily age-specific!)

Happily, when your body fights an infection, it creates antibodies (such as IgG and IgM), and these persist for months after the infection. We can simply survey a population with an antibody test and find out who already got infected.

The problem is that antibody tests are hard to develop. They have a poorer record of distinguishing Covid-19-antibodies from other coronavirus antibodies that we might have because we caught a cold. As Z.W.Wolf just explained, this becomes a problem if you are trying to find a small number in a big sample.

Test Error
A test can have 4 outcomes:
- person was infected, test is positive: true positive
- person was not infected, test is positive: false positive (error!)
- person was not infected, test is negative: true negative
- person was infected, test is negative: false negative (error!)
Specificity = "true negatives"/"not infected"
Sensitivity = "true positives"/"infected"
100% - Specificity = false positive rate (not infected counted as infected)
100% - Sensitivity = false negative rate (infected people you miss)

Obviously, we need to know these values to analyse test results for error, but there's also error inherent in the way we discover them. If we test 30 samples known to not be infected (because they're from last year), and all results are negative, do we have 100% specificity? Not quite, because if the true specificity were 90.5% (a 9.5% false positive rate), there would still be a 5% random chance of getting that result. So all we can say is that we're 95% confident that the true specificity is 90.5%-100%. That's the 95% confidence interval, and that's what you usually see as error bars on good graphs.
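A quick sanity check of that 5% figure (a sketch, assuming 30 independent known-negative samples):

```python
# If the true specificity were 90.5%, how often would all 30 known-negative
# samples still come back negative purely by chance?
true_specificity = 0.905
p_all_30_negative = true_specificity ** 30
print(round(p_all_30_negative, 3))  # about 0.05, i.e. a 5% chance
```

So a perfect 30/30 validation result is still compatible, at the 95% level, with a specificity as low as 90.5%.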

But it gets worse: if our lab used samples from healthy people (such as blood donors) taken during the summer (when fewer people have colds) to determine specificity, that wouldn't apply if we test everybody in April, right after the flu season: we'd expect this sample to have more false positives than the sample used to evaluate the test. (If you don't know how the manufacturer selected their samples, they may have selected them to make their test look good.)

So, to do this immunological survey that we need to find out who has been infected, we need our antibody test to be really good, or we need to at least confirm the positives with a really good test to be sure we don't count false positives.

Herd immunity
If we want to predict how our hospitals are coping, we don't really need the IFR: we can look at the hospitalization rate, or predict that from the case rate, and base our short-term public health measures off that. But if we have the IFR, we can take our number of deaths, divide by the IFR, and get an estimate of how many people are infected right now. And then we hope that everyone who has been infected is immune to the virus. With "common cold" coronaviruses, most people retain immunity for a while (maybe 2 years?) after they have had an infection.
And that hinders the spread of the virus. If the virus spreads to 3 people on average (R0=3), then we need to reach the point where 2/3 are immune: the virus would spread to 3 people, but only 1 of them can still get sick. That means the epidemic can no longer grow, and as the effective reproduction number drops below 1, it fizzles out. (This is why anti-vaxxing "works" if not too many people do it.)
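That threshold calculation can be sketched in a couple of lines (R0 values chosen for illustration):

```python
# Herd immunity threshold: the immune share needed so each case infects
# fewer than one susceptible person on average.
def herd_immunity_threshold(r0):
    return 1 - 1 / r0

for r0 in (1.5, 2.5, 3.0):
    print(f"R0 = {r0}: {herd_immunity_threshold(r0):.0%} immune needed")
```

Note how sensitive the threshold is to R0: an R0 of 1.5 needs only a third of the population immune, an R0 of 3 needs two thirds.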

The hope is that if we have a lot more infected people than we thought, we might not have to wait for a vaccine to make that happen. Unfortunately, it doesn't much look like that's going to be the case, but we'll know for sure when some robust studies have been published. This is underway, but takes time to do properly.

If we assume we have no vaccine and that 70% of the population, old people included, gets infected, we multiply the lethality (IFR) by 70% and have a good idea of how many people are going to die. This is another reason why the "it's harmless" believers like to see a low IFR: if the overall mortality isn't that high, it's somehow OK if grandma dies. (If you're looking forward to retirement, you ought to think twice about creating a society that kills old people off without remorse.)

Mortality = deaths / population
Here is where the "it's harmless" people come in. "People die every day, if we don't have a lot of people dying from Covid-19, why bother?" If you live in a region where the deaths haven't gotten out of hand, you can't disprove them. But the data from other regions and countries makes it clear that Covid-19 doesn't just cause people to die "who would have died anyway", it's significantly more, and the EuroMOMO data I posted in #183 proves that.

The way this works is via excess mortality, which is a bit of a shortcut: you take a baseline mortality computed from past years that had no outbreaks (mild flu season), and you compare the actual mortality to that baseline. The deaths that are significantly above the baseline are the excess mortality, and they are considered to be caused by the current outbreak (typically influenza). The excess mortality from Covid-19 is clearly visible in many countries.
image.jpeg
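The excess mortality calculation itself is simple; here's a sketch (the weekly figures are invented for illustration, not EuroMOMO data):

```python
# Excess mortality: observed weekly deaths minus a baseline computed
# from past years with mild flu seasons.
baseline = [1000, 1010, 990, 1005]   # expected deaths per week (baseline)
observed = [1020, 1300, 1700, 1500]  # actual deaths per week

excess = [max(0, o - b) for o, b in zip(observed, baseline)]
print(excess)       # per-week deaths above baseline
print(sum(excess))  # total excess attributed to the outbreak
```

The point of the baseline is that it already includes the people "who would have died anyway"; everything above it is additional.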

Prevalence, incidence rates, and case numbers
Prevalence and incidence rate are ratios, typically with respect to 100 000 people.
The news reports absolute numbers of cases and deaths. The trouble with that is that these numbers are roughly in line with population figures. They don't help you answer the question if a small town has fewer cases because it is smaller than e.g. New York, or if it has fewer cases because the virus spreads less rapidly. If you make a "corona map" of the US, it just looks a lot like a population map if you use absolute numbers. If we want to compare populations of different sizes, we need to use rates.
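A tiny example of why rates matter (populations and case counts are made up): a smaller place can report far fewer cases yet have a higher prevalence.

```python
# Rates vs absolute counts: normalise case counts to cases per 100,000.
def per_100k(cases, population):
    return cases / population * 100_000

places = {"Big City": (8_400_000, 140_000), "Small Town": (20_000, 400)}
for name, (population, cases) in places.items():
    print(f"{name}: {cases} cases = {per_100k(cases, population):.0f} per 100,000")
```

Here the small town has 350 times fewer cases but a higher rate, so the virus is actually spreading more intensely there.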

Contact Tracing
Health officials try to ask every confirmed infected person whom they might have infected. High-risk contacts are typically those who have been closer than 6 feet to the infected person for more than 15 minutes. These contacts are then isolated, and tested if they have symptoms. Contact tracing and isolation breaks chains of infection and contains the spread. A Covid-19 contact tracing team typically consists of five people who are on the phone a lot. This seems awfully low-tech, but it's proven to work. The South Korean super spreader cluster was contained with contact tracing. Germany's first outbreak was contained with contact tracing, which bought us much-needed time.

Contact tracing requires adequate manpower and timely access to test results. It contributes to mitigation even if it's overburdened, but it's our best hope to contain the spread in a situation where we relax social distancing measures. We've talked elsewhere about the many outbreaks of infectious diseases occurring all over the globe each year, and most don't turn into epidemics because of contact tracing.

Covid-19 is difficult to trace because there's evidence that people are most infectious on the day before symptoms start. If it takes you a day to decide to see a doctor, and then there's another day delay until you receive the test result, the first people you have infected could already be infectious themselves. This is why isolation of contacts is absolutely crucial, even if they have no symptoms.



That's the factual overview. It feels like I forgot to explain something important, so please ask if you think something's missing.

Personal Outlook
We are conditioned by Hollywood (and maybe human nature) to perceive winning a fight as something highly dramatic that we overcome. That's why war movies make good viewing, and why New York is on the TV so much (and why there's always more strife on reality TV than we see in the real world). But it's healthier to not have drama. It's better to arrest a perp before he gets into the car instead of having a high-speed chase. It's better to do long hours of observation on cold street corners than to have a shootout. It's better to head an epidemic off than to fight its effects.

This requires forethought and planning. It requires trusting in the people who do the planning, especially if it is successful, because then you never get to see the drama that has been headed off. This is difficult because the planning has to be done under uncertainty. We can't wait until we have researched all the answers, because by then our inactivity has created the drama we are planning to avoid. We need to trust the people who are best placed to take a good estimate of where we're headed, and listen to them, and accept that they will turn out to have been more or less accurate in their predictions. (The conspiracy theorists suggest that we should trust those people least who have been most accurate in their planning. The problem with that approach is obvious.)

So especially when our politicians and scientists have been successful, we need to cope with the fact that we have invested a lot of effort with no drama to show for it. This is particularly difficult in Western, more individualistic societies. If we don't see the drama, we tend to want to believe that there is no danger. (If we trust the drama that the media are showing us elsewhere, that's a substitute, but those who don't trust the media have a problem again.)

Statistics become a tool in this: they tell us how much danger we are in. Those concerned with planning want the numbers to be accurate and to show the exact danger we're in, and those who want the danger to not be there want the numbers to confirm their world-view. This is what the fight is about. It's about seeing 15 cases as a small number that will quickly go away, or as a sign that an epidemic is on the horizon. By now, it's obvious which approach is trustworthy.
 
Last edited:

deirdre

Senior Member
It feels like I forgot to explain something important
you forgot to tell us how Covid fatality rates relate to flu fatality rates. (Granted, I haven't read the last few pages of this thread, but isn't that the general topic?)
 