Coronavirus Statistics: Cases, Mortality, vs. Flu

...
For comparison, about 1% of people with the flu are hospitalized, and 0.1% die...
We are now focusing lots of bright lights on the Sars2 stats, and finding them lacking and problematic and difficult to interpret in many ways.

I wonder if "we" have scrutinized the flu stats with similar rigor. Or rather, if medicine and science are tracking the flu with the same kind of effort. How do we know we have always been missing a lot of cases, hospitalizations and deaths due to the flu, or conflating other cases of illness, hospitalization and death with the flu? In other words, how sure are we of the numbers we compare the Covid19 numbers to?
 
how sure are we of the numbers we compare the Covid19 numbers to?
you have a point, but the biggest issue that would make the numbers incomparable now, in my opinion, are vaccines. If 70% of people over 65 are vaccinated (and vaccination is encouraged even more for people with underlying health issues), that's a big chunk of the Covid-19 risk population that is protected from flu, either directly through the vaccine or through herd immunity in nursing homes etc.


  • Percent of children aged 6 months to 17 years who received an influenza vaccination during the past 12 months: 50.4%
  • Percent of adults aged 18-49 who received an influenza vaccination during the past 12 months: 34.2%
  • Percent of adults aged 50-64 who received an influenza vaccination during the past 12 months: 46.8%
  • Percent of adults aged 65 and over who received an influenza vaccination during the past 12 months: 68.7%
Content from External Source
https://www.cdc.gov/nchs/fastats/flu.htm

We know that it's about a quarter of those who test positive, but we don't know how many are never tested.
We also don't know how many people who would be hospitalized with flu in a normal year are being sent home with Covid-19 because some hospitals are overwhelmed; hospitals may now effectively require patients to be 'sicker' before admitting them.
 
We are now focusing lots of bright lights on the Sars2 stats, and finding them lacking and problematic and difficult to interpret in many ways.

I wonder if "we" have scrutinized the flu stats with similar rigor. Or rather, if medicine and science are tracking the flu with the same kind of effort. How do we know we have always been missing a lot of cases, hospitalizations and deaths due to the flu, or conflating other cases of illness, hospitalization and death with the flu? In other words, how sure are we of the numbers we compare the Covid19 numbers to?
The incidence rate for influenza is quite high, making flu screening feasible. The RKI is doing that regularly, via AG Influenza, Grippeweb, and the ICOSARI sentinel. AG Influenza has also been screening for SARS-CoV-2 for a few weeks, but the incidence is still quite low. AFAIK the true hospitalization rate for influenza is quite low. There's an RKI epidemiological bulletin with a comparison, if you want to learn more.
 
New York state has been testing many more people per million than California. Currently, it's 25,443 tests per million in New York versus 5,438 in California.
https://www.worldometers.info/coronavirus/country/us/
New York is doing 2.5 tests per case, California is doing 8.5 tests per case, and from what you quoted, LA is currently doing even better than that.
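As a rough sanity check, you can convert the per-million figures back into tests per case. The population and case figures in this sketch are my own approximations for mid-April 2020 (assumptions, not numbers from this thread):

```python
# Rough sketch: convert "tests per million" into "tests per case".
# Populations and cumulative case counts are approximate assumptions.
ny_pop_millions, ca_pop_millions = 19.45, 39.5
ny_tests = 25_443 * ny_pop_millions   # total tests implied by the per-million rate
ca_tests = 5_438 * ca_pop_millions
ny_cases, ca_cases = 198_000, 25_300  # assumed cumulative case counts, mid-April 2020

print(f"NY: {ny_tests / ny_cases:.1f} tests per case")  # ~2.5
print(f"CA: {ca_tests / ca_cases:.1f} tests per case")  # ~8.5
```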

Please lose the notion that virus testing works like a survey or an opinion poll. These tests are aimed at the part of the population that is most likely to have the disease, or that would be impacted the most if they did have it. This means you put them in relation not to the overall population size, but to the size of the subgroup that has actually been exposed to infection and has symptoms -- and that size is proportional to the number of cases.
 
The incidence rate for influenza is quite high, making flu screening feasible. The RKI is doing that regularly, via AG Influenza, Grippeweb, and the ICOSARI sentinel. AG Influenza has also been screening for SARS-CoV-2 for a few weeks, but the incidence is still quite low. AFAIK the true hospitalization rate for influenza is quite low. There's an RKI epidemiological bulletin with a comparison, if you want to learn more.
Well, have we scrutinized the methodology by which the incidence rate for influenza has been determined?

Coincidentally on the morning of February 25th, the very day when the first two Covid-19 cases in my home district and home town became known (first sustained outbreak in Germany), I saw my doctor with fever and a cough. By the symptoms, he determined it wasn't influenza, probably wasn't bacterial either, more likely rhinovirus, gave me a sick leave form for a week, and that was that. I wasn't tested for anything. If he had thought I had influenza, would he have me gotten tested for it? Or do people get (wrongly) counted towards the influenza incidence when their rhinovirus fever exceeds 39° or they over-dramatize their joint aches?

I do not know if I ever got true influenza.

(And by the way: No, it certainly wasn't coronavirus. I almost certainly got infected in the senior citizen's nursing home where I work, and where several residents had a cough and mild fever the week before; no serious cases)
 
Is anyone here following the situation in France? On looking at today's 'Worldometers' table, I was startled to see a daily increase of 17,164 in the number of cases for France (cumulative total of 165,027 compared with 147,863 yesterday). The daily increase is way out of line with recent figures for France, which show a gentle decline in daily new cases to around 4,000. Yesterday there was a big daily increase of 1438 in deaths (compared to 753 today), but a note explained that there was some catching up with records from the Easter period. I don't see any such explanation for the big increase in cases today in the source (SantePubliqueFrance) linked by Worldometers. I did manage to work out that the figure of 165027 comes from combining 108,847 'Cas Confirmes' with 56,180 'Cas de Residents en ESMS', which I think is the French term for residential care homes. I suspect that the sudden increase is due to including a big new lump of data from care homes. There was another big increase around the beginning of April, so maybe the data from care homes are only added every two weeks. But I would be interested if anyone has better information. This is not just idle curiosity on my part: the situation here in the UK tends to track that in France, so a sudden unexplained increase is a bit worrying.
 
ESMS:
Depending on the categories of vulnerable populations for which their action is intended (minors at risk, disabled people, elderly people, adults suffering from chronic illness, people in a situation of exclusion) and the origins of the funds used to redistribute the services they provide (health insurance, departmental social assistance, state social assistance), they come under the authority of either the director general of the Regional Health Agency (ARS), or the President of the Departmental Council, or the Prefect of the region, or more than one of them if there is joint jurisdiction.
Content from External Source
https://translate.google.com/transl...ement_ou_service_social_ou_m%C3%A9dico-social

I was startled to see a daily increase of 17,164 in the number of cases for France (cumulative total of 165,027 compared with 147,863 yesterday)
17167 is the number of deaths the ECDC reported on the 16th in the morning, so it looks like a sum gone wrong (or screen scraping gone wrong). The total number of cases to go with that is 106206.
[Screenshot: Santé publique France dashboard]
Upper left: The total number of cases is 108847, so the website is wrong.

Lower left: The total number of deaths is 17920; 11060 died in hospitals.
Lower right: 8925 deceased among ESMS residents; of these, 6860 died in the ESMS facility and 2065 at the hospital.
11060+6860=17920, works out perfectly

Upper middle and upper right
75,853 people hospitalized including 31,305 in progress
32,812 return home after hospitalization
31305 (currently in hospital)+32812(discharged)+11060(died in hospital)=75177
That's 676 short for some reason. It also means their hospitalization rate is huge, or, considering the quote below, their number of unregistered infections is really high.
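A minimal sketch checking those sums, using only the dashboard numbers quoted in this post:

```python
# Reconciling the Sante publique France dashboard numbers quoted above.
deaths_in_hospital = 11_060
deaths_in_esms     = 6_860
assert deaths_in_hospital + deaths_in_esms == 17_920   # deaths: works out perfectly

ever_hospitalized = 75_853
still_in_hospital = 31_305
discharged        = 32_812
accounted_for = still_in_hospital + discharged + deaths_in_hospital   # = 75,177
print(ever_hospitalized - accounted_for)                              # 676 unaccounted for
```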
Cas confirmés de COVID-19
Les données permettant d’estimer le nombre de cas confirmés sont issues de plusieurs sources.
Entre le 21 janvier et le 25 mars 2020, 25 233 cas de COVID-19 ont été signalés à Santé publique France via l’application GoData ou par transmission des cellules régionales de Santé publique France.
Depuis le 26 mars 2020, le nombre de cas confirmé de COVID-19 sont estimés au niveau national en tenant compte des données de laboratoires de biologie médicale de ville et des patients hospitalisés pour COVID-19 (source SI-VIC). Il a été estimé ainsi qu’entre le 21 janvier et le 14 avril 2020, 103 573 cas de COVID-19 ont été confirmés en France.
Les patients présentant des signes de COVID-19 ne sont plus systématiquement confirmés par un test biologique. Selon les recommandations ministérielles du 13 mars 2020, la réalisation de prélèvements à visée diagnostique n'est recommandée que pour certains patients et il convient notamment de tenir compte des comorbidités, de la profession (professionnels de santé) et du tableau clinique.
Le nombre réel de cas de COVID-19 en France est donc supérieur au nombre de cas confirmés rapporté.
Le nombre de cas confirmés en France ne reflète donc plus de manière satisfaisante la dynamique de l'épidémie.
Content from External Source
Confirmed cases of COVID-19
The data used to estimate the number of confirmed cases come from several sources.
Between January 21 and March 25, 2020, 25,233 cases of COVID-19 were reported to Public Health France via the GoData application or by transmission from regional Public Health France cells.
Since March 26, 2020, the number of confirmed cases of COVID-19 are estimated at the national level taking into account data from city medical biology laboratories and patients hospitalized for COVID-19 (source SI-VIC). It was estimated that between January 21 and April 14, 2020, 103,573 cases of COVID-19 were confirmed in France.
Patients with signs of COVID-19 are no longer systematically confirmed by a biological test. According to the ministerial recommendations of March 13, 2020, the taking of samples for diagnostic purposes is only recommended for certain patients, and it is in particular advisable to take into account comorbidities, the profession (health professionals) and the clinical picture.
The actual number of COVID-19 cases in France is therefore higher than the number of confirmed cases reported.
The number of confirmed cases in France therefore no longer satisfactorily reflects the dynamics of the epidemic.
Content from External Source
That's from page 4 of the current situation report at https://www.santepubliquefrance.fr/...vid-19-point-epidemiologique-du-16-avril-2020
 
Thanks. The Worldometers website does make some errors, so maybe this is just another one. Let us see what the next update looks like.

The similarity of the figures 17167 and 17164 could be just a coincidence. But I doubt that the total for cases
165027 = 108847 + 56180 is a coincidence. It does look like Worldometers have added the two figures together to get a total.
 
There's a new study out of Stanford—not yet peer reviewed—suggesting the infection fatality rate is much lower than the 1–2% range usually estimated. I'm skeptical, because the study would seem to indicate an IFR similar to that of the seasonal flu, and clearly the situation in hospitals in Italy, Spain, and New York is much graver than it would be for a flu.

The study looked for the presence of antibodies to SARS-CoV-2 in residents of Santa Clara County California, and found that the prevalence was much higher than the number of reported cases—50 to 85 times more than the prevalence of confirmed cases, which extrapolated to the entire county population would be 48,000 to 81,000 people infected. The New York Times tracker reports 70 deaths in Santa Clara County as of April 17. I don't have data for April 3-4, when the testing was done, but using the figure from the 17th puts an upper bound on the IFR of between 0.086% and 0.14%.

Such a low IFR would be great news, but again, I think it's more likely that the study is flawed. I suspect that the serology test is generating some false positives, which would dramatically skew the numbers when the prevalence is so low. The abstract claims that the test performance was checked against a sample of 37 positive and 30 negative controls—that doesn't seem like enough people.

Another thing I wonder about: they tested 3,330 people and found 1.5% had antibodies to SARS-CoV-2, or about 50 people. It seems like a low number of positives. If you haven't recruited a truly random sample (e.g. are people with symptoms more likely to volunteer?), then you'll end up with skewed estimates.
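For what it's worth, the crude confidence interval and the back-of-envelope IFR bounds are easy to reproduce. A minimal sketch, assuming scipy is available and using the death figure from the 17th as above:

```python
from scipy.stats import binomtest

# Reproduce the study's crude prevalence and exact (Clopper-Pearson) 95% CI: 50 of 3,330.
result = binomtest(k=50, n=3330)
ci = result.proportion_ci(confidence_level=0.95, method="exact")
print(f"{50/3330:.2%} (95% CI {ci.low:.2%}-{ci.high:.2%})")   # 1.50% (1.11%-1.97%)

# Naive IFR bounds from the extrapolated infection range and 70 deaths.
for infections in (81_000, 48_000):
    print(f"IFR upper bound: {70 / infections:.3%}")          # ~0.086% and ~0.146%
```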

I'd love to hear thoughts from people who have more experience with this type of study.

https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

Here's the abstract:
Background Addressing COVID-19 is a pressing health and social concern. To date, many epidemic projections and policies addressing COVID-19 have been designed without seroprevalence data to inform epidemic parameters. We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County. Methods On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer's data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both. Results The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%). Under the three scenarios for test performance characteristics, the population prevalence of COVID-19 in Santa Clara ranged from 2.49% (95CI 1.80-3.17%) to 4.16% (2.58-5.70%). These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases. Conclusions The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases. Population prevalence estimates can now be used to calibrate epidemic and mortality projections.

—Chris
 
Just to follow up my previous comments (#168 and #170), the latest Worldometers report includes a long and complicated note about the French data:

France reported that a portion of the EHPAD and EMS nursing home cases - representing about 33% of the total EHPAD and EMS cases - were confirmed (rather than probable, as the other 67%) and as such are to be considered as already included in the total national case count. The French Government has now started reporting the breakdown between confirmed and probable EHPAD and EMS cases. We have adjusted the historical data for France from April 4 based on this information. In accordance with the decision by the French Government to include EHPAD and EMS probable deaths in the total death count. EHPAD and EMS probable cases must be included in the total case count in order to be logically sound (a death implies a case) and methodologically consistent. On April 3, the French Government had reported 17,827 additional cases and 532 additional deaths from nursing homes that had not been reported previously. On April 2, it had reported 884 additional deaths

I won't attempt to interpret this, beyond making the obvious point that yesterday's apparent large increase in 'cases' for France was not a genuine one-day increase.
 
The study looked for the presence of antibodies to SARS-CoV-2 in residents of Santa Clara County California, and found that the prevalence was much higher than the number of reported cases—50 to 85 times more than the prevalence of confirmed cases
Another thing I wonder about: they tested 3,330 people and found 1.5% had antibodies to SARS-CoV-2, or about 50 people. It seems like a low number of positives. If you haven't recruited a truly random sample (e.g. are people with symptoms more likely to volunteer?), then you'll end up with skewed estimates.
It has Prof. Ioannidis' name on it (the guy who thought 7 deaths on the Diamond Princess was too low, when it's 12 deaths now), and it's rubbish, and they know it. (You can download a statistical appendix to the paper, but no shared data.)

This study had several limitations. First, our sampling strategy selected for members of Santa Clara County with access to Facebook and a car to attend drive-through testing sites. This resulted in an overrepresentation of white women between the ages of 19 and 64, and an under-representation of Hispanic and Asian populations, relative to our community. Those imbalances were partly addressed by weighting our sample population by zip code, race, and sex to match the county. We did not account for age imbalance in our sample, and could not ascertain representativeness of SARS-CoV-2 antibodies in homeless populations. Other biases, such as bias favoring individuals in good health capable of attending our testing sites, or bias favoring those with prior COVID-like illnesses seeking antibody confirmation are also possible. The overall effect of such biases is hard to ascertain.
Content from External Source
First, he picked the county that had the earliest cases in California and had its outbreak first, ensuring that the population would be undertested. This means that it's likely that every other county in California has fewer unregistered infections than Santa Clara.

Second, study participants were people who responded to a Facebook ad. This is a self-selected sample, and that property completely kills the usefulness of the study all by itself. This is a beginner's error! People who think they had Covid-19 and didn't get tested, or who know someone who did, are much more likely to respond to such an ad than people who did not. (By comparison, the Gangelt study contacted 600 carefully chosen households per mail, and 400 responded. Still somewhat self-selected, but not as badly.)

Third, age is the single most important predictor of mortality. He did not weight the results by age, and old people are underrepresented in the study. Anything he says about mortality is completely useless if we don't know how prevalent the infection was in the older population. (In Germany, case data show that the prevalence among tested older people was low initially and took a few weeks to rise.)

Fourth, instead he weights prevalence by zip code -- why? This exacerbates statistical variations, since there were only 50 positive results, and Santa Clara has ~60 zip codes. If a positive result falls by chance on a populous zip code where only a few people participated, the numbers are skewed up. They must have seen this happen, because their estimated prevalence is almost twice as high as the raw prevalence.

Fifth, the specificity of the test is "99.5% (95 CI 98.3-99.9%)". This means that theoretically, if the specificity were 98.5%, all of the 50 positive results could be false positives, and nobody in the sample would have had any Covid-19. This means the result is not statistically significant even if the sample had been well chosen (which it wasn't). (It's not even significant at the 90% level; see the quick check after this list.)

Sixth, they used a notoriously inaccurate "lateral flow assay" instead of an ELISA test, and did not validate their positive samples (only 50) with a more sensitive test -- why not?

Seventh, the Covid-19 antibody test can create false positives if it cross-reacts with other human coronavirus antibodies, i.e. if you test the samples of people who had a cold, your specificity will suffer. A manufacturer could therefore a) test blood donor samples, since people are not allowed to give blood if they have been sick shortly before, or b) test samples taken in the summer, when people are less likely to have colds than in March.

To state the previous three points another way: a large number of the positive results (a third if the specificity is actually 99.5%, but probably more than that) are false, and depending on which zip codes they randomly fall in, they could considerably skew the results.
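Here's the quick check of the false positive arithmetic behind the fifth point -- a minimal sketch using only the sample size and the specificity interval quoted above:

```python
# Expected false positives among 3,330 samples if none were truly infected,
# for specificities inside the test's 95% CI (98.3%-99.9%).
n = 3330
for specificity in (0.983, 0.985, 0.995, 0.999):
    print(f"specificity {specificity:.1%}: ~{n * (1 - specificity):.0f} false positives expected")
# At 98.5% specificity you expect ~50 false positives -- the study's entire
# positive count, i.e. consistent with zero true prevalence in the sample.
```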

I'll put some more quotes from the study in the spoiler.
I haven't marked elisions. If you see a blank line, assume I cut something there.
(iii) PCR-based testing rates have been highly variable across contexts and over time, leading to noisy relationships between the number of cases and infections.

At the time of this study, Santa Clara County had the largest number of confirmed cases of any county in Northern California (1,094). The county also had several of the earliest known cases of COVID-19 in the state - including one of the first presumed cases of community-acquired disease - making it an especially appropriate location to test a population-level sample for the presence of active and past infections.
On April 3rd and 4th, 2020 we conducted a survey of residents of Santa Clara County to measure the seroprevalence of antibodies to SARS-CoV-2 and better approximate the number of infections.

Methods
We conducted serologic testing for SARS-CoV-2 antibodies in 3,330 adults and children in Santa Clara County using capillary blood draws and a lateral flow immunoassay.

We recruited participants by placing targeted advertisements on Facebook aimed at residents of Santa Clara County. We used Facebook to quickly reach a large number of county residents and because it allows for granular targeting by zip code and sociodemographic characteristics. We used a combination of two targeting strategies: ads aimed at a representative population of the county by zip code, and specially targeted ads to balance our sample for under-represented zip codes. In addition, we capped registrations from overrepresented areas. Individuals who clicked on the advertisement were directed to a survey hosted by the Stanford REDcap platform, which provided information about the study. The survey asked for six data elements: zip code of residence, age, sex, race/ethnicity, underlying comorbidities, and prior clinical symptoms. Over 24 hours, we registered 3,285 adults, and each adult was allowed to bring one child from the same household with them (889 children registered).

We established drive-through test sites in three locations spaced across Santa Clara County: two county parks in Los Gatos and San Jose, and a church in Mountain View.

Test Kit Performance
The manufacturer’s performance characteristics were available prior to the study (using 85 confirmed positive and 371 confirmed negative samples). We conducted additional testing to assess the kit performance using local specimens. We tested the kits using sera from 37 RT-PCR-positive patients at Stanford Hospital that were also IgG and/or IgM-positive on a locally developed ELISA assay. We also tested the kits on 30 pre-COVID samples from Stanford Hospital to derive an independent measure of specificity. Our procedure for using these data is detailed below.

Third, we adjusted the prevalence for test sensitivity and specificity. Because SARS-CoV-2 lateral flow assays are new, we applied three scenarios of test kit sensitivity and specificity. The first scenario uses the manufacturer’s validation data (S1). The second scenario uses sensitivity and specificity from a sample of 37 known positive (RT-PCR-positive and IgG or IgM positive on a locally-developed ELISA) and 30 known pre-COVID negatives tested on the kit at Stanford (S2). The third scenario combines the two collections of samples (manufacturer and local sample) as a single pooled sample (S3). We use the delta method to estimate standard errors for the population prevalence, which accounts for sampling error and propagates the uncertainty in the sensitivity and specificity in each scenario. A more detailed version of the formulas we use in our calculations is available in the Appendix to this paper.

The test kit used in this study (Premier Biotech, Minneapolis, MN) was tested in a Stanford laboratory prior to field deployment. Among 37 samples of known PCR-positive COVID-19 patients with positive IgG or IgM detected on a locally-developed ELISA test, 25 were kit-positive. A sample of 30 pre-COVID samples from hip surgery patients were also tested, and all 30 were negative. The manufacturer’s test characteristics relied on samples from clinically confirmed COVID-19 patients as positive gold standard and pre-COVID sera for negative gold standard. Among 75 samples of clinically confirmed COVID-19 patients with positive IgG, 75 were kit-positive, and among 85 samples with positive IgM, 78 were kit-positive. Among 371 pre-COVID samples, 369 were negative. Our estimates of sensitivity based on the manufacturer’s and locally tested data were 91.8% (using the lower estimate based on IgM, 95 CI 83.8-96.6%) and 67.6% (95 CI 50.2-82.0%), respectively. Similarly, our estimates of specificity are 99.5% (95 CI 98.1-99.9%) and 100% (95 CI 90.5-100%). A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%).

The total number of positive cases by either IgG or IgM in our unadjusted sample was 50, a crude prevalence rate of 1.50% (exact binomial 95% CI 1.11-1.97%). After weighting our sample to match Santa Clara County by zip, race, and sex, the prevalence was 2.81% (95% CI 2.24-3.37 without clustering the standard errors for members of the same household, and 1.45-4.16 with clustering). We further improved our estimation using the available data on test kit sensitivity and specificity, using the three scenarios noted above. The estimated prevalence was 2.49% (95CI 1.80%-3.17%) under the S1 scenario, 4.16% (95CI 2.58%-5.70%) under the S2 scenario, and 2.75% (95CI 2.01%-3.49%) under the S3 scenario. Notably, the uncertainty bounds around each of these population prevalence estimates propagates the uncertainty in each of the three component parameters: sample prevalence, test sensitivity, and test specificity.

For example, if new estimates indicate test specificity to be less than 97.9%, our SARS-CoV-2 prevalence estimate would change from 2.8% to less than 1%, and the lower uncertainty bound of our estimate would include zero.

As of April 10, 2020, 50 people have died of COVID-19 in the County, with an average increase of 6% daily in the number of deaths. If our estimates of 48,000-81,000 infections represent the cumulative total on April 1, and we project deaths to April 22 (a 3 week lag from time of infection to death), we estimate about 100 deaths in the county.


This study had several limitations. First, our sampling strategy selected for members of Santa Clara County with access to Facebook and a car to attend drive-through testing sites. This resulted in an overrepresentation of white women between the ages of 19 and 64, and an under-representation of Hispanic and Asian populations, relative to our community. Those imbalances were partly addressed by weighting our sample population by zip code, race, and sex to match the county. We did not account for age imbalance in our sample, and could not ascertain representativeness of SARS-CoV-2 antibodies in homeless populations. Other biases, such as bias favoring individuals in good health capable of attending our testing sites, or bias favoring those with prior COVID-like illnesses seeking antibody confirmation are also possible. The overall effect of such biases is hard to ascertain.
[Images from the study: demographics, participants, and samples]

Appendix
[Image: prevalence formula from the statistical appendix]
There is one important caveat to this formula: it only holds as long as (one minus) the specificity of the test is higher than the sample prevalence. If it is lower, all the observed positives in the sample could be due to false-positive test results, and we cannot exclude zero prevalence as a possibility. As long as the specificity is high relative to the sample prevalence, this expression allows us to recover population prevalence from sample prevalence, despite using a noisy test.
Content from External Source
Santa Clara County population is 1,943,411
1,094 confirmed cases on April 2nd or 3rd (can someone verify? and give data for the week?)

1.5% crude positive rate of 3330 participants = 50 tests positive
Santa Clara has about 60 zip codes. Note the low number of samples per zip code, and the low number of participants for some.

When March started, Santa Clara County was the epicenter of the coronavirus crisis in California.
Content from External Source
https://www.dailynews.com/2020/04/0...us-cases-in-californias-hardest-hit-counties/
This page has an SVG animation that shows this with an animated bar chart.

[Screenshot: Santa Clara County testing dashboard]
https://www.kron4.com/news/bay-area...-dashboards-with-latest-covid-19-information/
As you can see, the early weeks have next to no negative tests, so many cases are unrecognized and free to spread in the community.

I tried to check the formula by plugging the result into the bigger equation above, but I couldn't make it work. I'm suspicious that they roll their own maths here instead of simply inserting a textbook formula (with source).
The formula says that
P(Covid+) = [ P(Test+) - P(Test+|Covid-) ] / [ P(Test-|Covid-) - P(Test-|Covid+) ]
and I don't understand why that would make sense. The numerator is close to the rate of true positives if the share of negative samples is near 1, but the denominator says nothing meaningful to me. It should indicate a factor by which we underestimate the result because we are missing the false negatives.
Mmmh, I plugged in some numbers and it looks sane, at least.
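It does appear to be a textbook formula after all: it looks like the standard correction for imperfect test accuracy, sometimes called the Rogan-Gladen estimator. By the law of total probability, P(Test+) = P(Test+|Covid+)·p + P(Test+|Covid-)·(1-p); solving for the true prevalence p gives exactly the appendix's expression. A small sketch, plugging in the study's scenario-1 test characteristics:

```python
# Rogan-Gladen-style correction: recover true prevalence p from the raw
# test-positive rate, given sensitivity and specificity.
#   P(Test+) = sens*p + (1 - spec)*(1 - p)
#   =>  p = (P(Test+) - (1 - spec)) / (sens + spec - 1)
# This matches the appendix, since (1 - spec) = P(Test+|Covid-) and
# sens + spec - 1 = P(Test-|Covid-) - P(Test-|Covid+).

def true_prevalence(raw_rate, sensitivity, specificity):
    return (raw_rate - (1 - specificity)) / (sensitivity + specificity - 1)

# Scenario 1 (sens 91.8%, spec 99.5%) applied to the unweighted 1.5% raw rate:
print(f"{true_prevalence(0.015, 0.918, 0.995):.2%}")  # ~1.10%
# (The study weights the sample before applying this, so its headline
# estimates won't match this unweighted figure exactly.)
```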
 

Attachments

  • Santa Clara 2020.04.14.20062463v1.full.pdf
  • Santa Clara Statistics 2020.04.14.20062463-1.pdf
I won't attempt to interpret this, beyond making the obvious point that yesterday's apparent large increase in 'cases' for France was not a genuine one-day increase.
The problem (as indicated by my quote) is that the French officials are using data from different sources. The total number of confirmed cases is probably something they get from the laboratories doing the tests. The number of cases in ESMS is probably something the ESMS report. Because they sent samples from 33% of their cases off to be tested, there is overlap between the two data sources. Your comment suggests that on April 3rd, a group of nursing homes reported cases to the government that had not previously done so, and thus the statistics take a leap to catch up. What Worldometers should do is take the "total confirmed" and add 67% of the ESMS number to get a case count without overlap; that case count will still be missing the infections in the general population that are never confirmed, but that's obviously true everywhere.

Germany counts only confirmed cases (but we can confirm all suspected cases with symptoms, so an "unconfirmed" diagnosis always gets confirmed). Every physician has to report any patient with an infectious disease that is potentially epidemic to their county health official, and from there it gets accumulated upwards through the system and finally reaches the RKI. This means the RKI has access to age, gender, date of onset of symptoms, date of confirmation, date of death, and cause of death as determined by the attending physician who examines the corpse ("by covid19"/"with covid19"). In many cases, they also have information on symptoms, hospitalization status, etc. This method sometimes leads to lag in the official numbers, but the numbers don't have any overlap and are reasonably complete. They can actually do a reasonably accurate graph of cases by symptom onset from those data.
The RKI has recently arranged to get data directly from the laboratories in order to get a less laggy overview of case numbers, but the thorough system still runs in parallel.
 
Interesting comparison of cumulative deaths per 100k population. Case counts depend on the amount of testing, but you can select those too in the drop-down menu.
If you use the "per 100k" data, lining it up at the 50th case is misleading; that should be lined up at equal per population values, and that would shift the US to the left and the Netherlands to the right, for example.
How did you determine that case counts depend on the amount of testing? Or did you mean that the accuracy of the case counts depends on that?
(What I really want to see is "new cases" or new deaths, not the cumulative graph; it's not as easy to notice trends there.)
 
If you use the "per 100k" data, lining it up at the 50th case is misleading; that should be lined up at equal per population values, and that would shift the US to the left and the Netherlands to the right, for example.
How did you determine that case counts depend on the amount of testing? Or did you mean that the accuracy of the case counts depends on that?
(What I really want to see is "new cases" or new deaths, not the cumulative graph; it's not as easy to notice trends there.)

Their default setting is number of confirmed cases, so it starts at the 50th case. I can't change it.
The more you test, the more confirmed cases you'll catch. If you don't test at all, you won't have any confirmed cases.
They have a graph of new cases, not deaths, and not normalized by population.
https://coronavirus.jhu.edu/data/new-cases
 
The more you test, the more confirmed cases you'll catch. If you don't test at all, you won't have any confirmed cases.
"The more water you pour in the bucket, the more water will be in the bucket" stops being true when the bucket runs over.
For testing, the relationship is definitely not linear.
The case rate is driven by two quantities: A, the number of people selected for testing, and B, the number of tests. If A>B, then increasing the test volume will yield cases linearly as long as A remains unchanged. But the criteria for selecting A change: while at first you may only test people with unknown pneumonia (high chance to find cases), you might progress to contact tracing (~10% chance), then maybe health care workers, and then screening people newly admitted to retirement homes. Each new extension of A creates more demand for B, but it lowers the rate at which additional tests find new cases. In a B>A situation, increasing the test volume alone won't achieve much, and in fact the number of tests performed is actually driven by an increase in the size of A: we perform more tests because we find more cases to trace! It's not the testing that drives the case numbers; the case numbers drive the testing.

So that's why I am saying that the amount of testing determines how accurate the case numbers are. If you only test people with pneumonia, you'll not have an accurate measurement of infections. But you'll have an accurate measurement of severe Covid-19 cases, which is probably going to be proportional to the number of infections, and tells you how many hospital beds you need and are projected to need. If you do contact tracing, your number becomes more accurate, i.e. the number of cases gets closer to the number of infected. But it gets more accurate because A and B both change.
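A toy model of that dynamic (all numbers invented, purely for illustration): each new testing tier adds tests but at a lower positivity rate, so cases found grow with test volume, but nowhere near linearly.

```python
# Toy model: expanding the selection criteria A adds tests (B) at ever-lower yield.
tiers = [  # (tier, tests performed, share positive) -- assumed numbers
    ("unknown pneumonia",   1_000, 0.30),
    ("contact tracing",     5_000, 0.10),
    ("health care workers", 8_000, 0.03),
    ("admission screening", 6_000, 0.01),
]
total_tests = 0
total_cases = 0.0
for tier, tests, positivity in tiers:
    total_tests += tests
    total_cases += tests * positivity
    print(f"+ {tier:20s}: {total_tests:6d} tests, "
          f"{total_cases:5.0f} cases, overall yield {total_cases/total_tests:.1%}")
```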
 
There's a new study out of Stanford—not yet peer reviewed—suggesting the infection fatality rate is much lower than the 1–2% range usually estimated. I'm skeptical, because the study would seem to indicate an IFR similar to that of the seasonal flu, and clearly the situation in hospitals in Italy, Spain, and New York is much graver than it would be for a flu.

I'd love to hear thoughts from people who have more experience with this type of study.
There is a news article on the Nature website addressing this preprint:

NEWS 17 APRIL 2020

Antibody tests suggest that coronavirus infections vastly exceed official counts
Study estimates a more than 50-fold increase in coronavirus infections compared to official cases, but experts have raised concerns about the reliability of antibody kits.

Test concerns
Some scientists have raised concerns about the reliability of the antibody tests used in these surveys, another factor that could affect the accuracy of the survey results. Tests that produce false positives could inflate infection rate estimates

Bhattacharya [Lead Author] says the results probably undercount the prevalence in the wider population, because they miss anyone who has been infected too recently to have mounted an immune response, and exclude people in prisons, nursing homes and other institutional settings.

Results are expected soon from sero-prevalence surveys run by other groups around the world, including teams in China, Australia, Iceland, Italy, Germany and several others in the United States.

This story will be updated throughout the day.
Content from External Source
https://www.nature.com/articles/d41586-020-01095-0
 
@chrisl @Trailspotter
The problems with Stanford's Santa Clara study (this one: https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1 ) are even worse than I thought. In post #173, my analysis stated that the sample was self-selected and hence likely to be very skewed, and that the results were not statistically significant given the numbers. I've copied this analysis to the discussion on medRxiv and am happy to have seen positive feedback, as well as other commenters making similar observations. (It was also flagged as spam for half a day.) Now another commenter found the information for the test kit they used:
I am sorry to say that the basic assumptions and math of this research are wrong.
The researchers quoted the kit performance provided by the manufacturer, but quoted the wrong information, twice.
The correct information appears here: http://en.biotests.com.cn/newsitem/278470281
Content from External Source
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1#comment-4879825364

This isn't the exact name of the manufacturer listed in the study, but the performance data match:
[Table: manufacturer's performance data for the test kit]
(from http://en.biotests.com.cn/newsitem/278470281)
Among 75 samples of clinically confirmed COVID-19 patients with positive IgG, 75 were kit-positive, and among 85 samples with positive IgM, 78 were kit-positive. Among 371 pre-COVID samples, 369 were negative.
Content from External Source
(from the study)
Now comes the crux of the matter, citing the study:
The total number of positive cases by either IgG or IgM in our unadjusted sample was 50, a crude prevalence rate of 1.50% (exact binomial 95% CI 1.11-1.97%).
Content from External Source
If you apply this criterion to the manufacturer data (a case is positive if either IgG or IgM is positive), then we have at least 3 and likely even 5 false positives, since we'd need to use the higher number of 2 and 3 if there is overlap, and add them if there is none. 5/371 gives a false positive rate of 1.35% (95% CI 0.44%-3.49%); if I reduce the confidence level to a coin flip, I get a 50% confidence interval of 0.91%-1.99%.
There's a >50% chance that this result is completely random.

Since the 75/75 hit rate for the positive tests is 100%, we can now use the formula from the appendix:
P(Covid+) = [ P(Test+) - P(Test+|Covid-) ] / [ P(Test-|Covid-) - P(Test-|Covid+) ]
P = ( 1.5% - 1.35% ) / ( 98.65% - 0% ) = 0.15/98.65 = 0.00152 = 0.15% = 152/100,000.
This translates to 2955 infections for Santa Clara County if we don't do any weighting, possibly two to three times that with weighting (that's the size of their weighting adjustment).
The reported number of confirmed positive cases in the county on April 1 was 956, 50-85-fold lower than the number of infections predicted by this study.
Content from External Source
The study data actually only support a 3 to 9-fold higher number of infections than confirmed cases. With the 100 deaths that this study assumes, we get an infection fatality rate (IFR) between 3.4% (2955 infections) and 1.1% (9000 infections). 3.4% is actually close to the number that came out of China, and with 12 deaths reported by March 24 and 712 infections on the Diamond Princess, that IFR was 1.7% unadjusted.
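A sketch of that recalculation, using the appendix formula with the corrected false positive rate (5 of 371 manufacturer negatives, per the comment above; exact results differ slightly from the rounded figures in this post):

```python
# Redo the prevalence estimate with the combined IgG/IgM false positive rate.
county_pop = 1_943_411
raw_rate   = 50 / 3330     # unadjusted positive rate, ~1.50%
fp_rate    = 5 / 371       # ~1.35%, if IgG and IgM false positives don't overlap
sens       = 75 / 75       # 100% on the manufacturer's IgG positives

p = (raw_rate - fp_rate) / ((1 - fp_rate) - (1 - sens))   # appendix formula
print(f"prevalence {p:.2%}, ~{p * county_pop:.0f} infections")
# -> ~0.16%, ~3030 infections (the post's 2955 uses the rounded rates 1.5% and 1.35%)

for infections in (2_955, 9_000):   # unweighted, and ~3x that after weighting
    print(f"IFR with 100 deaths: {100 / infections:.1%}")  # ~3.4% and ~1.1%
```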

The study doesn't deserve the attention it is getting.
Can we move the posts concerning this to a separate topic? Because I feel that this is a proper debunk now.
 
New York state reports 18,298 deaths, and has 19.45 million residents. If every single resident had been infected with COVID-19, the IFR would be 0.094%. If one in ten New Yorkers were already infected, the IFR is 0.94%.
 
New York state reports 18,298 deaths, and has 19.45 million residents. If every single resident had been infected with COVID-19, the IFR would be 0.094%. If one in ten New Yorkers were already infected, the IFR is 0.94%.
New York isn't Santa Clara, but with 230,000 cases and a 10% prevalence (your one in ten infected) we'd be close to that 9-fold underestimate of infections.
I have no idea what NY's testing strategy and coverage is right now, but if they only test severe cases, they are missing out on about a 3-fold number of mild cases, and a similar number of asymptomatic cases, so by my reckoning, that would be similar to the magnitude I'd expect to see.
Obviously, that's very rough reckoning.

Your IFR is the "naive" IFR, because many people who are infected now haven't died yet. My personal feeling is that the final IFR might be around 1.5% , but who knows.
 
Let's look at excess mortality. That number is the gold standard for epidemics, and the proper rebuttal to people who say "it's just like the flu" and "these people would have died anyway". We can easily see a lot of excess mortality by looking at hard-hit regions like Lombardy or Wuhan, but it also shows up at country level for some countries that have had major outbreaks.

To understand the graphs that follow, I need to explain the z-score. The z-score is a way to normalize a statistical value by relating it to the statistical average and the standard deviation sigma.
z = (x − μ) / σ, where x is the value, μ the mean, and σ the standard deviation
https://en.m.wikipedia.org/wiki/Standard_score
For normally distributed data, 95% of random data points have z-scores between -2 and +2, and values with higher z-scores are quite rarely caused by random chance.

Mortality monitoring in Europe

Welcome to the EuroMOMO website. We publish weekly bulletins of the all-cause mortality levels in up to 24 European countries or regions of countries. The weekly bulletin is published every Thursday around noon.

European mortality bulletin week 15, 2020

Pooled mortality estimates from the EuroMOMO network continue to show a marked increase in excess all-cause mortality overall for the participating European countries, coinciding with the current COVID-19 global pandemic. This overall excess mortality is, however, driven by a very substantial excess mortality in some countries, primarily seen in the age group of 65 years and above, but also in the age group of 15-64 years.
Data from 24 participating countries or regions were included in this week’s pooled analysis of all-cause mortality. [..]

[Chart: pooled weekly z-scores from the EuroMOMO bulletin]
Content from External Source
http://www.euromomo.eu/index.html

First, Germany looks a bit weird because only two of our 16 states are participating. It's obvious that the countries that have been on the news a lot have significant excess mortality. This is also borne out by the graphs:
[Charts: weekly z-scores by country, EuroMOMO]
Note the different scales of the y-axes: they always go from -4 to 8, but often the lines will be closer together to accommodate a graph exceeding z=8 by a large margin. The graphs do show the excess mortality caused by the yearly flu season for the previous 3 years, with 2018/19 being mild almost everywhere, and 2019/20 as well until the blue Covid spike well into 2020.
I wonder how this is going to develop in the coming weeks.
 
COVID-19 Antibody Seroprevalence in Santa Clara County, California
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
The test kit used in this study (Premier Biotech, Minneapolis, MN) was tested in a Stanford laboratory prior to field deployment. Among 37 samples of known PCR-positive COVID-19 patients with positive IgG or IgM detected on a locally-developed ELISA test, 25 were kit-positive. A sample of 30 pre-COVID samples from hip surgery patients were also tested, and all 30 were negative. The manufacturer’s test characteristics relied on samples from clinically confirmed COVID-19 patients as positive gold standard and pre-COVID sera for negative gold standard. Among 75 samples of clinically confirmed COVID-19 patients with positive IgG, 75 were kit-positive, and among 85 samples with positive IgM, 78 were kit-positive. Among 371 pre-COVID samples, 369 were negative. Our estimates of sensitivity based on the manufacturer’s and locally tested data were 91.8% (using the lower estimate based on IgM, 95 CI 83.8-96.6%) and 67.6% (95 CI 50.2-82.0%), respectively. Similarly, our estimates of specificity are 99.5% (95 CI 98.1-99.9%) and 100% (95 CI 90.5-100%). A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%).
Content from External Source
The total number of positive cases by either IgG or IgM in our unadjusted sample was 50, a crude prevalence rate of 1.50% (exact binomial 95% CI 1.11-1.97%).
Content from External Source
From the statistical appendix:
In the first scenario, we estimate these quantities based upon the numbers provided by the
manufacturer of the test kits. For sensitivity, the manufacturer reported 78 positive test readings
for 85 samples (from Chinese blood samples) known to have specific IgM antibodies to the
receptor-binding domain (RBD) spike on the SARS-nCOV2 virus. They reported 75 positive test
readings for 75 of the samples with specific IgG antibodies to the same RBD spike. We adopt a
conservative estimate of sensitivity equal to r = 78/85 ≈ 91.8%. The manufacturer reports
specificity based on an experiment using their kit on a sample of 371 known negative blood
samples collected from before the epidemic, and 369 were tested negative. This implies a
specificity of s = 369/371 ≈ 99.5%.
Content from External Source
The study reports only the IgG specificity of the test kit, omitting the false positive rate for IgM. It also sets the sensitivity at the lower of the two values. This is mathematically consistent with accepting a sample as positive only if it passes both the IgG and the IgM test. However, the package insert states that a test is positive if either one of these is detected; it doesn't require both.
[Excerpt from the package insert: criteria for a positive result]
What did the study do? Page 6 states: "The total number of positive cases by either IgG or IgM in our unadjusted sample was 50". It comes down to a point of grammar: does "positive by either" mean it had to register positive by both?

This is a rather crucial point: if they counted a test as positive if just one of IgG and IgM was positive, the mathematical analysis is invalid and needs to be redone.
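If "either" really does mean IgG-or-IgM, a quick bound on the combined specificity follows from the manufacturer's negatives (2 IgG and 3 IgM false positives out of 371, as discussed in the earlier post); a minimal sketch:

```python
# Combined specificity under the "positive if either strip fires" rule,
# from the manufacturer's 371 pre-COVID samples (2 IgG FPs, 3 IgM FPs).
fp_igg, fp_igm, n_negatives = 2, 3, 371

best_case  = max(fp_igg, fp_igm) / n_negatives   # complete overlap: 3/371
worst_case = (fp_igg + fp_igm) / n_negatives     # no overlap: 5/371
print(f"specificity between {1 - worst_case:.2%} and {1 - best_case:.2%}")
# -> between 98.65% and 99.19%, in either case below the 99.5% the study used
```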
 

Attachments

  • Santa Clara COVID19_Package_Insert_Rapid.pdf
COVID-19 Antibody Seroprevalence in Santa Clara County, California
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

What did the study do? Page 6 states: "The total number of positive cases by either IgG or IgM in our unadjusted sample was 50". It comes down to a point of grammar: does "positive by either" mean it had to register positive by both?

This is a rather crucial point: if they counted a test as positive if just one of IgG and IgM was positive, the mathematical analysis is invalid and needs to be redone.

Thanks for doing a deeper dive on this study. I look forward to seeing what sort of responses the authors offer to questions about their sampling and the real specificity of the serology test.

There are a lot of coronavirus contrarians out there using this study as evidence for their 'just a flu' viewpoints. I'd love for them to be right. But when the IFR inferred from the 50 positives in this study conflicts with the actual body counts in NYC, Lombardy, etc., I'm going to bet that the study is wrong. The Santa Clara study also seems to disagree with the results of broad testing in Iceland.

—Chris
 
And just after I finished my last message, there's another serology study from California that reports similarly high infection rates as the Santa Clara County study. This one is from USC and the Los Angeles County Department of Public Health.

I found the press release, but no paper:
http://publichealth.lacounty.gov/phcommon/public/media/mediapubhpdetail.cfm?prid=2328

Based on results of the first round of testing, the research team estimates that approximately 4.1% of the county's adult population has antibody to the virus. Adjusting this estimate for statistical margin of error implies about 2.8% to 5.6% of the county's adult population has antibody to the virus- which translates to approximately 221,000 to 442,000 adults in the county who have had the infection. That estimate is 28 to 55 times higher than the 7,994 confirmed cases of COVID-19 reported to the county by the time of the study in early April. The number of COVID-related deaths in the county has now surpassed 600.

A few more details can be found in a Q&A with USC researcher Neeraj Sood, who led the study:

https://pressroom.usc.edu/what-a-usc-la-county-antibody-study-can-teach-us-about-covid-19/

I remain skeptical for the same reasons as before. It sounds like they're using the same test as the Stanford group, based on this answer from the linked Q&A:

How reliable are the antibody tests?
Premier Biotech, the manufacturer of the test that USC and L.A. County are using, tested blood from COVID-19-positive patients with a 90 to 95% accuracy rate. The company also tested 371 COVID-19-negative patients, with only two false positives. We also validated these tests in a small sample at a lab at Stanford University. When we do our analysis, we will also adjust for false positives and false negatives.

—Chris
 
Thanks for doing a deeper dive on this study. I look forward to seeing what sort of responses the authors offer to questions about their sampling and the real specificity of the serology test.
If you see any, let me know. I bet this is going to vanish in a puff of smoke once more robust studies come out. The WHO director of research said in the press conference today that the WHO is supporting some more robust studies across the world, and she indicated that preliminary data suggest the percentage of undiscovered cases is rather smaller. We'll have to wait and see. Meanwhile, there are 200 comments on this study on medRxiv. :p

Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020
Passengers of the Diamond Princess were initially to be held in quarantine for 14 days until 17 February. However, those who had intense exposure to the confirmed case-patient, such as sharing a cabin, were held in quarantine beyond the initial 14-day window [2]. According to reference [2], by 20 February, there were 619 confirmed cases on-board (17%), 318 of them were asymptomatic (asymptomatic cases were either self-assessed to be symptomless or tested positive before symptom onset) and 301 were symptomatic [2]. Overall 3,063 PCR tests were performed among passengers and crew members.
Content from External Source
https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.12.2000256

With only about half the positives being asymptomatic, that doesn't really leave a lot of leeway for undiscovered cases in a community with good access to testing; even if you estimate that asymptomatic cases are positive for a week and symptomatic cases for two before they have to go to intensive care, that's just 2 to 1, maybe 3 to 1 if we allow that they didn't test everyone all at once.

And the CFRs listed in table 2 of that study really don't need any adjustment when you add the next 5 deaths (reported e.g. on Wikipedia).
 
And just after I finished my last message, there's another serology study from California that reports similarly high infection rates as the Santa Clara County study. This one is from USC and the Los Angeles County Department of Public Health.

I found the press release, but no paper:
http://publichealth.lacounty.gov/phcommon/public/media/mediapubhpdetail.cfm?prid=2328
Participants were recruited via a proprietary database that is representative of the county population. The database is maintained by LRW Group, a market research firm.
Content from External Source
That's promising, at least. Since they took the samples on April 10th and 11th, it makes sense that the paper isn't out yet if they're doing a thorough job. It does look like the same test:
[Image: test kit used in the LA County study]
A few more details can be found in a Q&A with USC researcher Neeraj Sood, who led the study:
https://pressroom.usc.edu/what-a-usc-la-county-antibody-study-can-teach-us-about-covid-19/
Participants for the USC-L.A. County study were recruited by the market services firm LRW Group using a large proprietary database that ensures factors such as age, race and sex are part of the random selection. For the first testing that took place on April 10 and 11, USC and the L.A. County Department of Public Health identified six sites for drive-thru testing. Our plan moving forward is to test a different group of 1,000 randomly selected people every several weeks.
Content from External Source
Based on the published interval of 2.8-5.6%, I'm guessing they had 35 positive tests out of 850 valid samples, but that's probably wrong. I'm hoping they validate these positives with a different test; that'll reveal a better false positive rate.
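That guess is easy to test against the published interval. A sketch, assuming scipy is available (the 35-of-850 split is my assumption; the study's actual counts aren't public yet):

```python
from scipy.stats import binomtest

# Would 35 positives out of 850 samples reproduce "4.1% (2.8%-5.6%)"?
result = binomtest(k=35, n=850)
ci = result.proportion_ci(confidence_level=0.95, method="exact")
print(f"{35/850:.1%} (95% CI {ci.low:.1%}-{ci.high:.1%})")  # roughly 4.1% (2.9%-5.7%)
# Close to, though not exactly, the published range.
```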
 
With only about half the positives being asymptomatic, that doesn't really leave a lot of leeway for undiscovered cases in a community with good access to testing; even if you estimate that asymptomatic cases are positive for a week and symptomatic cases for two before they have to go to intensive care, that's just 2 to 1, maybe 3 to 1 if we allow that they didn't test everyone all at once.

Only 18% of the positives remained asymptomatic; the rest developed symptoms. 13 died and 7 remain serious or critical, and the longer they're critical the less chance they'll recover.
https://www.worldometers.info/coronavirus
 
I have a question... regarding the Santa Clara County study specifically.

-They're using a test which makes the issue of false positives and false negatives a serious problem.
-Increasing N over 1000 has surprisingly little effect on the margin of error. (This being more true with simpler statistical methods and less true with more complex methods.)

Question: Would it have been better to test all samples twice with the same type of test? Throw out all positives that were not duplicated. Without increasing the total number of tests used (and the dollar cost), N would still be over 1000, while error from false positives and false negatives would be much reduced.

(I'm still thinking about this... I think this is a different method than double checking all positives on a post hoc basis. I think my method controls for false negatives as well as false positives.)

The problem with poor randomization would still be there.
 
I have a question... regarding the Santa Clara County study specifically.

-They're using a test which makes the issue of false positives and false negatives a serious problem.
-Increasing N over 1000 has surprisingly little effect on the margin of error. (This being more true with simpler statistical methods and less true with more complex methods.)

Question: Would it have been better to test all samples twice with the same type of test? Throw out all positives that were not duplicated. Without increasing the total number of tests used (and the dollar cost), N would still be over 1000, while error from false positives and false negatives would be much reduced.

(I'm still thinking about this... I think this is a different method than double checking all positives on a post hoc basis. I think my method controls for false negatives as well as false positives.)

The problem with poor randomization would still be there.

Simple probability tells us that if the probability of a false positive is 1% (p=0.01), then the probability of testing someone twice and getting a false positive twice is just 0.01*0.01 = 0.0001, or 0.01%. This is highly unlikely, so it would seem to be a good filter for weeding out false positives. Of course, if the probability of actually having the virus is low enough, then 0.01% might still be higher than the odds of actually having the virus (P(having virus, given two positive results) < P(not having virus, given two positive results)), but I don't think that's the case here.

The above is only true if the repeated tests can be considered independent. If for some reason they can't, for example if something in the person's sample is more likely to trigger a false positive than someone else's sample, then it becomes trickier. All that being said, I did see some commentary asking why the people who tested positive, which was a relatively small amount, weren't re-tested, either the same way or using a more robust testing mechanism.
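To put numbers on the double-test idea, here's a minimal sketch, assuming the two runs are independent and using illustrative values for sensitivity, specificity, and prevalence (none of these numbers are from the actual study):

```python
sens = 0.99        # sensitivity: P(positive | infected), assumed
spec = 0.99        # specificity: P(negative | not infected), assumed
prevalence = 0.03  # assumed true infection rate in the sampled population

fp = 1 - spec      # false positive rate per test

# Probability of two positives in a row, for infected vs. not infected:
p_pp_infected = sens * sens   # 0.9801
p_pp_clean = fp * fp          # 0.0001, *if* the runs are independent

# Bayes: P(infected | two positive results)
p_pp = prevalence * p_pp_infected + (1 - prevalence) * p_pp_clean
posterior = prevalence * p_pp_infected / p_pp
print(f"P(infected | ++) = {posterior:.4f}")  # ~0.997 with these numbers
```

If the two runs share a failure mode (e.g. cross-reactivity, as the next reply points out), the fp * fp term is far too optimistic and the filter buys much less.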
 
Increasing N over 1000 has surprisingly little effect on the margin of error.
I think they did that mostly so they could do their weighting, which didn't fix the age group issues and probably obscured the error: the 1.5% raw rate is within their 95% confidence interval for the false positive rate, but once they've blown the probabilities up with the weighting, their variance range no longer includes 0%. The fact that they rolled their own statistics instead of using a textbook procedure is suspicious to me.
The above is only true if the repeated tests can be considered independent. If for some reason they can't, for example if something in the person's sample is more likely to trigger a false positive than someone else's sample, then it becomes trickier. All that being said, I did see some commentary asking why the people who tested positive, which was a relatively small amount, weren't re-tested, either the same way or using a more robust testing mechanism.
If you use the same test, it's not independent. The package insert that I attached to post #184 states clearly that the test can cross-react with "common cold" coronaviruses; if you have this cross-reactivity in a sample, a re-test isn't going to help.
A better type of test is a neutralisation test, where you check in the lab whether the sampled blood can actually attack the virus and keep it from growing in tissue. I think you could also use an antibody test that targets a different section of the antigen for detection, but I'm not entirely clear on this.

Only 18% of the positives remained asymptomatic;
I'm not doubting you, but do you have a source for that?
I'm surprised that worldometers still lists 55 active cases; I'd have thought they'd all be closed by now, but maybe these are crew?
 
Hi all, could I ask for a summary of this thread please? It's a few pages now and data-heavy, and I'm not wise or gifted enough to grasp what has been shown.

regards D
 
Oh, if this is relevant, I think it may help: it seems the Australian flu season has been headed off by the hand-washing, quarantine and social distancing measures designed to control COVID-19.



Flu season that looked like 'a big one' beaten by hygiene, isolation

https://www.smh.com.au/national/flu...ten-by-hygiene-isolation-20200420-p54lh7.html


A potentially huge flu season appears to have been headed off by the hand-washing, quarantine and social distancing measures designed to control COVID-19.

Confirmed cases of influenza dropped from 7002 in February to just 95 in April so far as the government’s measures to slow the spread of COVID-19 kicked in.
Content from External Source
 
Okay, weighting... so they want a large value for N... so they can chop it up.
I'm suspicious of weighting... rightly or wrongly.

Same test is not independent. Yes, point taken.

I'm laboring under some handicaps.

-I'm really rusty... 28 years since last statistics class.

-I'm trying to re-educate myself about specificity and sensitivity... instead of going to bed as I should be doing. I'm too sleepy for this stuff!

But I've run across something in Wikipedia that seems to be apropos for this over-tired mind.
https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positive_paradox
False positive paradox
One type of base rate fallacy is the false positive paradox, where false positive tests are more probable than true positive tests, occurring when the overall population has a low incidence of a condition and the incidence rate is lower than the false positive rate. The probability of a positive test result is determined not only by the accuracy of the test but by the characteristics of the sampled population.[2] When the incidence, the proportion of those who have a given condition, is lower than the test's false positive rate, even tests that have a very low chance of giving a false positive in an individual case will give more false than true positives overall.[3] So, in a society with very few infected people—fewer proportionately than the test gives false positives—there will actually be more who test positive for a disease incorrectly and don't have it than those who test positive accurately and do. The paradox has surprised many.[4]


Example
A group of police officers have breathalyzers displaying false drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random to administer a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?
Many would answer as high as 95%, but the correct probability is about 2%.

An explanation for this is as follows: on average, for every 1,000 drivers tested,

  • 1 driver is drunk, and it is 100% certain that for that driver there is a true positive test result, so there is 1 true positive test result
  • 999 drivers are not drunk, and among those drivers there are 5% false positive test results, so there are 49.95 false positive test results
Therefore, the probability that one of the drivers among the 1 + 49.95 = 50.95 positive test results really is drunk is 1/50.95 ≈ 0.019627.

The validity of this result does, however, hinge on the validity of the initial assumption that the police officer stopped the driver truly at random, and not because of bad driving. If that or another non-arbitrary reason for stopping the driver was present, then the calculation also involves the probability of a drunk driver driving competently and a non-drunk driver driving (in-)competently.
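The quoted example is just Bayes' theorem; here it is as a few lines of Python, reproducing the ~2% figure:

```python
p_drunk = 1 / 1000   # prior: one in a thousand drivers is drunk
sens = 1.0           # the breathalyzer never misses a truly drunk driver
fp = 0.05            # 5% false positives on sober drivers

# Total probability of a positive result, then Bayes' theorem:
p_positive = p_drunk * sens + (1 - p_drunk) * fp       # 0.05095
posterior = p_drunk * sens / p_positive
print(f"P(drunk | positive) = {posterior:.4f}")        # ~0.0196, i.e. about 2%
```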
 
Hi all, could I ask for a summary of this thread please? It's a few pages now and data-heavy, and I'm not wise or gifted enough to grasp what has been shown.

regards D

We don't know anything even remotely concrete, because the virus is too new and no one has collected nearly enough data yet.
 
We don't know anything even remotely concrete, because the virus is too new and no one has collected nearly enough data yet.


Ta, agree D, but I was hoping to just get the gist, to save me some reading and data comprehension time, and perhaps be of some help to other MB site readers who are following... but I'm happy to wait and watch as things develop; I can gather and digest the details slowly.
 
Ta, agree D, but I was hoping to just get the gist, to save me some reading and data comprehension time, and perhaps be of some help to other MB site readers who are following... but I'm happy to wait and watch as things develop; I can gather and digest the details slowly.

It really is the gist. The one or two locales that have done more general-population testing still haven't done enough, and there are simply so many variables.
We need a few months until there is an adequately reliable FDA-approved antibody test and large-scale testing to really know anything about true mortality rates. I'm sure when an appropriate study comes out @Mick or whoever will start a separate thread.

This thread has data from all over the place (South Korea vs. China vs. Germany vs. Italy, etc.). There's no easy way to summarize the current thread content. You just have to skim through it.

Or pick a specific topic, like "can y'all summarize this Santa Clara study in simple English and 2 paragraphs?"
 
Oh, if this is relevant, I think it may help: it seems the Australian flu season has been headed off by the hand-washing, quarantine and social distancing measures designed to control COVID-19.
We've seen a similar effect in Germany, except since we were mostly past the flu season already, it just stopped sooner. That's the fun thing about hygiene: it has good health effects across the board; you have nothing to lose if you advocate it.

Hi all, could I ask for a summary of this thread please? It's a few pages now and data-heavy, and I'm not wise or gifted enough to grasp what has been shown.
Well, what do you want to know?

Overview

* It is hard to project how many people are going to die, because
-- we do not know how many people will be infected in the end
-- we do not know how many of the infected will die
-- we do not know how the case numbers relate to the dead and infected

We have estimates for the stuff we do not know, and hot discussion about whose estimates are more correct. Studies are emerging right now that give us data so we can know.

CFR = Case Fatality Rate = deaths/cases
This is the percentage of cases that are dying. This depends highly on what counts as a case, i.e. how well a specific country is detecting cases. Countries in the containment phase that manage contact tracing well and have appropriate testing resources should detect close to every infected person with symptoms. (I assume that's happening when 90% or more of a region's PCR tests come back negative.) Countries in the mitigation phase, and countries where testing capacity is stretched, will detect far fewer. Since you usually detect most of the deaths (but even there, some countries didn't initially know how many people died outside of hospitals!), the CFR is higher if you find fewer cases. Cases are typically confirmed with a genetic PCR test, which is very specific. (The WHO test will only detect bat coronaviruses, and there's only one of these going around right now.)
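To illustrate with invented numbers how detection alone moves the CFR (a sketch, not data from any country):

```python
# Same outbreak, same deaths -- only the share of infections that
# become confirmed cases differs. All numbers are made up.
infections = 100_000
deaths = 500                   # assume deaths are fully counted

for detection in (0.9, 0.3):   # good vs. stretched testing
    cases = infections * detection
    cfr = deaths / cases
    print(f"detection {detection:.0%}: CFR = {cfr:.2%}")
# detection 90%: CFR = 0.56%
# detection 30%: CFR = 1.67%
```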

Age is the major risk factor.
The case fatality rates from the big Chinese outbreak have stood up well, in my opinion: under 0.5% for ages up to 50, 1.3% (50-60), 3.6% (60-70), 8.0% (70-80), and 14.8% (over 80). The 14.8% matches the German data I looked at a while back; the others were somewhat lower (~4% in 60-80), but that may have changed as more data has come in, so I should re-process that (see below).

Most of the people who die of Covid-19 are old or very old. So if you look at the mortality of a population (how many people will die?), you need to adjust this for how many old people you have. Any study that doesn't do that can't tell you anything robust about mortality.

People die a lot later than when they become cases, so to find the mortality of the cases you have now, you theoretically have to wait until all of them are recovered or dead. This is a big problem if you want to be first to publish, and basically makes some Diamond Princess studies very questionable, because the study is based on 7 deaths and now we have 13 or 14.

IFR = Infection Fatality Rate = deaths/infected = lethality
This is the percentage of people infected with the virus that are dying. To determine this number, we need to find out who has been infected. We know the number of cases, but we don't know how many infected people never became cases because they never got tested. (The lethality is obviously also heavily age-specific!)

Happily, when your body fights an infection, it creates antibodies (such as IgG and IgM), and these persist for months after the infection. We can simply survey a population with an antibody test and find out who already got infected.

The problem is that antibody tests are hard to develop. They have a poorer record of distinguishing Covid-19 antibodies from antibodies to other coronaviruses that we might have because we caught a cold. As Z.W.Wolf just explained, this becomes a problem if you are trying to find a small number in a big sample.

Test Error
A test can have 4 outcomes:
- person was infected, test is positive: true positive
- person was not infected, test is positive: false positive (error!)
- person was not infected, test is negative: true negative
- person was infected, test is negative: false negative (error!)
Specificity = "true negatives"/"not infected"
Sensitivity = "true positives"/"infected"
100% - Specificity = false positive rate (not infected counted as infected)
100% - Sensitivity = false negative rate (infected people you miss)

Obviously, we need to know these values to analyse test results for error, but there's also error inherent in the way we discover them. If we test 30 samples known to not be infected (because they're from last year), and all results are negative, do we have 100% specificity? Not quite: if the true specificity were 90.5% (i.e. a 9.5% false positive rate), there would still be a roughly 5% chance of getting 30 straight negatives anyway (0.905^30 ≈ 0.05). So all we can say is that we're 95% confident that the true specificity is 90.5%-100%. That's the 95% confidence interval, and that's what you usually see as error bars on good graphs.
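The 30-sample example in code: we're solving for the lowest specificity p that would still yield 30 straight negatives with 5% probability, i.e. p^30 = 0.05:

```python
n = 30         # known-negative samples that all tested negative
alpha = 0.05   # we rule out specificities that make this result <5% likely

lower_bound = alpha ** (1 / n)   # solves p**30 == 0.05
print(f"95% CI for specificity: {lower_bound:.1%} - 100.0%")  # ~90.5% - 100%
```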

But it gets worse: if our lab used samples from healthy people (such as blood donors) taken during the summer (when fewer people have colds) to determine specificity, that wouldn't apply if we test everybody in April, right after the flu season: we'd expect this sample to have more false positives than the sample used to evaluate the test. (If you don't know how the manufacturer selected their samples, they may have selected them to make their test look good.)

So, to do this immunological survey that we need to find out who has been infected, we need our antibody test to be really good, or we need to at least confirm the positives with a really good test to be sure we don't count false positives.

Herd immunity
If we want to predict how our hospitals are coping, we don't really need the IFR: we can look at the hospitalization rate, or predict that from the case rate, and base our short-term public health measures off that. But if we have the IFR, we can take our number of deaths, divide by the IFR, and get an estimate of how many people are infected right now. And then we hope that everyone who has been infected is immune to the virus. With "common cold" coronaviruses, most people retain immunity for a while (maybe 2 years?) after they have had an infection.
And that hinders the spread of the virus. If the virus spreads to 3 people on average (R0=3), then we need to reach the point where 2/3 are immune: the virus wants to spread to 3 people, but only 1 person gets sick. That means the epidemic can no longer grow, and as the effective reproduction number drops below 1, it fizzles out. (This is why anti-vaxxing "works" if not too many people do it.)
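Both back-of-the-envelope formulas from the last two paragraphs in one short sketch, with invented numbers for deaths and IFR:

```python
# 1) Estimate ever-infected from deaths and an assumed IFR:
deaths = 1000    # illustrative, not real data
ifr = 0.005      # assumed IFR of 0.5%
est_infected = deaths / ifr
print(f"estimated ever-infected: {est_infected:,.0f}")   # 200,000

# 2) Herd immunity threshold for a given R0: 1 - 1/R0
r0 = 3.0
threshold = 1 - 1 / r0
print(f"herd immunity threshold at R0={r0}: {threshold:.0%}")  # 67%
```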

The hope is that if we have a lot more infected people than we thought, we might not have to wait for a vaccine to make that happen. Unfortunately, it doesn't much look like that's going to be the case, but we'll know for sure when some robust studies have been published. This is underway, but takes time to do properly.

If we assume we have no vaccine and that 70% of the old people get infected, we take 70% of the lethality (IFR) and we have a good idea of how many people are going to die. This is another reason why the "it's harmless" believers like to see a low IFR: if the overall mortality isn't that high, it's somehow ok if grandma dies. (If you're looking forward to retirement, you ought to think twice about creating a society that kills old people off without remorse.)

Mortality = deaths / population
Here is where the "it's harmless" people come in. "People die every day, if we don't have a lot of people dying from Covid-19, why bother?" If you live in a region where the deaths haven't gotten out of hand, you can't disprove them. But the data from other regions and countries makes it clear that Covid-19 doesn't just cause people to die "who would have died anyway", it's significantly more, and the EuroMOMO data I posted in #183 proves that.

The way this works is via excess mortality, which is a bit of a shortcut: you take a baseline mortality computed from past years that had no outbreaks (mild flu season), and you compare the actual mortality to that baseline. The deaths that are significantly above the baseline are the excess mortality, and they are considered to be caused by the current outbreak (typically influenza). The excess mortality from Covid-19 is clearly visible in many countries.
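A minimal sketch of that computation, with invented weekly numbers; real excess-mortality models like EuroMOMO fit the baseline and the band of normal variation statistically:

```python
# Compare weekly deaths to a baseline from past mild years and sum
# what lies significantly above it. All numbers are made up.
baseline  = [1000, 1000, 1000, 1000]   # expected weekly deaths
threshold = [1100, 1100, 1100, 1100]   # upper bound of normal variation
actual    = [1050, 1300, 1500, 1250]   # observed weekly deaths

excess = sum(a - b for a, b, t in zip(actual, baseline, threshold) if a > t)
print(f"excess deaths attributed to the outbreak: {excess}")  # 300+500+250 = 1050
```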
[Image: excess mortality charts]

Prevalence, incidence rates, and case numbers
Prevalence and incidence rate are ratios, typically with respect to 100 000 people.
The news reports absolute numbers of cases and deaths. The trouble with that is that these numbers are roughly in line with population figures. They don't help you answer the question of whether a small town has fewer cases because it is smaller than e.g. New York, or because the virus spreads less rapidly there. If you make a "corona map" of the US using absolute numbers, it just looks a lot like a population map. If we want to compare populations of different sizes, we need to use rates.
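A toy comparison showing why rates matter (all numbers invented):

```python
places = {
    "Small town": {"cases": 50, "population": 20_000},
    "Big city": {"cases": 2_000, "population": 8_000_000},
}
for name, d in places.items():
    rate = d["cases"] / d["population"] * 100_000
    print(f"{name}: {d['cases']} cases, {rate:.0f} per 100,000")
# Small town: 50 cases, 250 per 100,000   <- fewer cases, but faster spread
# Big city: 2000 cases, 25 per 100,000
```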

Contact Tracing
Health officials try to ask every confirmed infected person whom they might have infected. High-risk contacts are typically those who have been closer than 6 feet to the infected person for more than 15 minutes. These contacts are then isolated, and tested if they have symptoms. Contact tracing and isolation breaks chains of infection and contains the spread. A Covid-19 contact tracing team typically consists of five people who are on the phone a lot. This seems awfully low-tech, but it's proven to work. The South Korean super spreader cluster was contained with contact tracing. Germany's first outbreak was contained with contact tracing, which bought us much-needed time.

Contact tracing requires adequate manpower and timely access to test results. It contributes to mitigation even if it's overburdened, but it's our best hope to contain the spread in a situation where we relax social distancing measures. We've talked elsewhere about the many outbreaks of infectious diseases occurring all over the globe each year, and most don't turn into epidemics because of contact tracing.

Covid-19 is difficult to trace because there's evidence that people are most infectious on the day before symptoms start. If it takes you a day to decide to see a doctor, and then there's another day delay until you receive the test result, the first people you have infected could already be infectious themselves. This is why isolation of contacts is absolutely crucial, even if they have no symptoms.



That's the factual overview. It feels like I forgot to explain something important, so please ask if you think something's missing.

Personal Outlook
We are conditioned by Hollywood (and maybe human nature) to perceive winning a fight as something highly dramatic that we overcome. That's why war movies make good viewing, and why New York is on the TV so much (and why there's always more strife on reality TV than we see in the real world). But it's healthier to not have drama. It's better to arrest a perp before he gets into the car instead of having a high-speed chase. It's better to do long hours of observation on cold street corners than to have a shootout. It's better to head an epidemic off than to fight its effects.

This requires forethought and planning. It requires trusting in the people who do the planning, especially if it is successful, because then you never get to see the drama that has been headed off. This is difficult because the planning has to be done under uncertainty. We can't wait until we have researched all the answers, because by then our inactivity has created the drama we are planning to avoid. We need to trust the people who are best placed to take a good estimate of where we're headed, and listen to them, and accept that they will turn out to have been more or less accurate in their predictions. (The conspiracy theorists suggest that we should trust those people least who have been most accurate in their planning. The problem with that approach is obvious.)

So especially when our politicians and scientists have been successful, we need to cope with the fact that we have invested a lot of effort with no drama to show for it. This is particularly difficult in Western, more individualistic societies. If we don't see the drama, we tend to want to believe that there is no danger. (If we trust the drama that the media are showing us elsewhere, that's a substitute, but those who don't trust the media have a problem again.)

Statistics become a tool in this: they tell us how much danger we are in. Those concerned with planning want the numbers to be accurate and to show the exact danger we're in, and those who want the danger to not be there want the numbers to confirm their world-view. This is what the fight is about. It's about seeing 15 cases as a small number that will quickly go away, or as a sign that an epidemic is on the horizon. By now, it's obvious which approach is trustworthy.
 
It feels like I forgot to explain something important
you forgot to tell us how Covid fatality rates relate to flu fatality rates. (Granted, I haven't read the last few pages of this thread, but isn't that the general topic?)
 