Audacious Epigone Blog
Coronavirus Correlates by State
Correlations with coronavirus death rates at the state level, as of April 11, 2020, are weak across the board (parentheses denote negative correlations):

Obesity rate — (.13)
Median age — .09
Clinton’s 2016 vote share — (.04)
Population density — .13
White population % — (.18)
Black population % — .20
Asian population % — .13
Hispanic population % — .11
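
For anyone who wants to replicate or extend these, a minimal sketch of the exercise in Python, assuming a hypothetical states.csv with one row per state and invented column names (the death-rate column being deaths per million as of 4/11):

    # A sketch only: "states.csv" and all column names here are
    # hypothetical stand-ins for whatever state-level data you assemble.
    import pandas as pd

    df = pd.read_csv("states.csv")

    correlates = ["obesity_rate", "median_age", "clinton_2016_share",
                  "pop_density", "pct_white", "pct_black", "pct_asian",
                  "pct_hispanic"]

    # Pairwise Pearson r between each candidate correlate and the death rate.
    for col in correlates:
        r = df["covid_deaths_per_million"].corr(df[col])
        print(f"{col}: {r:+.2f}")

Bear in mind that with only 50 or so observations, correlations weaker than about ±.28 do not clear the conventional p < .05 threshold, so every figure above is statistically indistinguishable from zero.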

Anything else come to mind as a potentially meaningful correlate, or is looking at the state level hopelessly imprecise?

Five states–Arkansas, Iowa, Nebraska, North Dakota, and South Dakota–have yet to issue shelter-in-place orders. Their coronavirus death rate currently stands at 9 per million; nationwide, it is 62 per million. These states have not been hit nearly as hard as the rest of the country, at least not yet.

Though we came in for some derision for predicting a couple of weeks ago that the death toll would come in under 100,000 nationwide, it now looks like that’s the way to bet–at least in the pestilence’s first wave. This is great news, a welcome relief for everyone.

The big question still hanging over us concerns the true infection rate, not just the symptomatic rate. Getting it answered putatively requires scaling up the capacity to administer tens or hundreds of millions of tests.

Okay, but in the meantime can an organization like Gallup team up with a couple of test producers and administer tests to a nationally representative sample of a couple thousand people? Instead of polling people on their political opinions, we poll their coronavirus antibodies or lack thereof.
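
Back-of-the-envelope, a sample of that size would already pin the prevalence down usefully. A sketch with invented numbers (the sensitivity and specificity figures are assumptions, not any particular test’s specs):

    # Illustrative only: raw positives and test characteristics are made up.
    import math

    n = 2000                  # sample size
    positives = 100           # hypothetical raw positives (5%)
    sens, spec = 0.95, 0.98   # assumed test sensitivity/specificity

    p_raw = positives / n
    # Rogan-Gladen correction: adjust apparent prevalence for test error.
    p_true = (p_raw + spec - 1) / (sens + spec - 1)

    se = math.sqrt(p_raw * (1 - p_raw) / n)
    print(f"apparent prevalence: {p_raw:.1%} +/- {1.96 * se:.1%} (95% CI)")
    print(f"test-adjusted prevalence: {p_true:.1%}")

At n = 2,000, the 95% margin of error on a 5% hit rate is about one percentage point, easily enough to distinguish “a few percent infected” from “a quarter of the country infected.”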

If we ultimately find out that the penetration rate is much higher than initially suspected–and thus the mortality rate far lower–we can end shelter-in-place, social distancing, and other economically catastrophic tactics for most of the population. The onus for these restrictive measures can instead be placed on high-risk people, and until testing capacity becomes ubiquitous, limited testing resources can be reserved for them.

If that turns out to be the case, we’re really going to be slapping ourselves for not having realized as much last month, since it could’ve been done not just with a small fraction of our testing capacity today, but even of our much more limited capacity from several weeks ago.
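
To make the stakes concrete, a bit of illustrative arithmetic (all figures hypothetical):

    # Illustrative arithmetic only: how the implied infection fatality rate
    # falls as the assumed true penetration rises. Figures are invented.
    deaths = 20_000        # hypothetical cumulative deaths
    confirmed = 500_000    # hypothetical confirmed cases at the same time

    for multiple in (1, 5, 10, 20):   # assumed true infections per confirmed case
        infections = confirmed * multiple
        print(f"{multiple:>2}x penetration -> implied IFR {deaths / infections:.2%}")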

 
• Category: Race/Ethnicity, Science • Tags: Coronavirus, The states 
  1. As with any illness, general health will be the biggest factor in prognosis.

    Obesity, drug abuse and underlying health conditions are the factors here.

    Alcohol seems beneficial for Covid-19, in moderation of course.

    Minorities are getting hit hard due to poor diet, obesity rates and drug problems, I’d guess.

    • Replies: @Justvisiting
  2. Anything else come to mind as a potentially meaningful correlate

    Years of education?

  3. “Anything else come to mind as a potentially meaningful correlate, or is looking at the state level hopelessly imprecise?”

    What about how far north or south the state is?

    • Replies: @Oo-ee-oo-ah-ah-ting-tang-walla-walla-bing-bang, @res

  4. Population density should be a factor. All else equal, I’d expect more penetration in a population of a million concentrated in a city than the same population spread out over a vast area.

    • Replies: @Kratoklastes
  5. @DanHessinMD
    "Anything else come to mind as a potentially meaningful correlate, or is looking at the state level hopelessly imprecise?"

    What about how far north or south the state is?

    Agreed, latitude was one of the initial Big Questions of the spread, as well as ambient temperature and humidity.

  6. So, it is a bit of a stretch to say “Latinx hardest hit.” Why am I not surprised?

  7. Population density for the whole state won’t be very meaningful for a state like Washington. It’s a big place, and the eastern, what, two-thirds? are pretty empty. But Seattle is a big, packed city.

    I’d look at population density of the biggest city in a state.

    • Replies: @res
  8. The onus for these restrictive measures can instead be placed on high-risk people and until testing capacity becomes ubiquitous, limited testing resources reserved for them.

    Out of curiosity, what use would testing them serve?

    Let’s be frank: restricting testing to those who present symptoms was a sleight of hand that served little more than adding up a couple of categories of politically useful but epidemiologically useless information. The first move of a serious researcher should precisely be to understand how virulent this thing is in the general population, and only then to start suggesting more extreme mitigation measures. Mitigation measures like washing hands, minimising handshakes, and sneezing into sleeves have been postered in doctors’ offices, on company bulletin boards, in stores, etc. for years.

    In fact, the aggregate numbers that are reported are still relatively trivial; this has been little more than a very bad flu season so far.

    But this has been whipped up into a frenzy used to line certain parties’ pockets and, seemingly, judging by MSM questions and octobox round-table blather, to take down the Bad Orangeman and advance a statist agenda that couldn’t be advanced with more ethereal causes like climate change, which was only going to get around to killing us in a dozen years rather than in the next 18 days.

    • Agree: Brian Reilly
  9. @Dr. Doom

    I would like to see similar correlations done with the annual flu.

    Then we would know if we were just looking at normal disease issues or something specific related to this virus.

    • Replies: @res
  10. The five states that have yet to issue shelter-in-place orders may want to reconsider. A new CDC report says the virus can travel 13 feet:

    https://nypost.com/2020/04/12/the-coronavirus-can-travel-at-least-13-feet-new-study-shows/

    The other states shut pretty much everything down, but everyone has been standing six feet away from each other in long grocery store lines for the last month so they can get food and not starve. This type of shutdown may not be as effective as previously thought, so we may have destroyed the economy while achieving less in stopping the spread of the disease than we thought.

    • Replies: @Brian Reilly
    Mark, 13 feet? Not 27? not 14 or 12 feet? Is 6 feet safe? Inside or out? With or without the wind? Does a plume of droplets hang in the air (still air? Completely still??) like smoke from a campfire?

    I call BS on all of it. It is a variety of the flu. Kills some people. BFD. Might kill me, I suppose. Everybody dies of something. I never expected the idiots running this nation to shut it down because they gave a s%it if I died, and I don't think so now. This is a scam being used to depress the people and enable a takeover that would not otherwise be as easy for the psychos working it.
  11. @DanHessinMD
    "Anything else come to mind as a potentially meaningful correlate, or is looking at the state level hopelessly imprecise?"

    What about how far north or south the state is?

    How about average dew point? Here is data for the states.
    https://www.forbes.com/sites/brianbrettschneider/2018/08/23/oh-the-humidity-why-is-alaska-the-most-humid-state/

    Would probably be better to look at average dew points in February, March, and April, but that data is harder to gather.

    Or temperature. Here are average temperatures and average winter temperatures.
    https://www.currentresults.com/Weather/US/average-annual-state-temperatures.php
    https://www.currentresults.com/Weather/US/average-state-temperatures-in-winter.php

    Or precipitation.
    https://www.currentresults.com/Weather/US/average-annual-state-precipitation.php
    Again, seasonal would probably be better.

    On another note, how about personality traits? This paper has a table of the big 5 by state on the last page. These might be interesting to use for other purposes as well.
    https://www.apa.org/news/press/releases/2013/10/regions-personalities
    https://www.apa.org/pubs/journals/releases/psp-a0034434.pdf

    Shelter in place order as a binary variable and date of initial/strongest order seem like possibly useful variables.

    P.S. Have you (AE) tried any multivariate regressions? It would be interesting to see something like race (4 category variable), obesity, density, dew point, and extroversion. Then some analysis to see how they work together (ANOVA?).

    P.P.S. Agreed that states are rather imprecise. Seems like major cities would be better. Especially since right now they seem to be driving the spread.
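
    A minimal sketch of the regression suggested in the P.S., assuming a hypothetical states.csv with invented column names for the predictors mentioned (statsmodels handles both the fit and an ANOVA-style decomposition):

        # Sketch only: the file name and every column name are hypothetical.
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        df = pd.read_csv("states.csv")
        model = smf.ols(
            "covid_deaths_per_million ~ pct_black + pct_hispanic + pct_asian"
            " + obesity_rate + pop_density + dew_point + extraversion",
            data=df,
        ).fit()
        print(model.summary())                    # coefficients, t-stats, R^2
        print(sm.stats.anova_lm(model, typ=2))    # per-term ANOVA decomposition

    With roughly 50 observations and seven predictors, the coefficient estimates will be noisy, which reinforces the P.P.S. about states being imprecise.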

    • Thanks: Audacious Epigone
    • Replies: @res
    , @Audacious Epigone
    I've not. The toolpak I use for multivariate analysis isn't currently working. I'll get another one if it's not fixed by the time the dust has settled but I'm going to wait until then to post anything else. It feels like it could be more obfuscatory than clarifying at this point with the numbers bouncing around like they are.
  12. @Daniel Williams

    I’d look at population density of the biggest city in a state.

    Good idea.

  13. @Justvisiting

    The comment from utu which I replied to here:
    https://www.unz.com/isteve/new-york-vs-california/#comment-3831677
    has a link to CDC data. I linked my comment instead because I talk about age adjusted vs. raw death rates. Which matters. I think the choice of which to use depends on what you are trying to accomplish.
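
    For anyone unfamiliar with the distinction, a toy illustration of direct age standardization (all populations and rates invented):

        # Invented numbers: four age bands, deaths per million by band.
        import numpy as np

        std_pop = np.array([0.25, 0.40, 0.25, 0.10])    # reference age structure
        state_pop = np.array([0.20, 0.35, 0.30, 0.15])  # this state's age structure
        age_rates = np.array([1, 5, 50, 400.0])         # deaths/million by age band

        raw = (state_pop * age_rates).sum()        # the rate the state reports
        adjusted = (std_pop * age_rates).sum()     # rate on the reference structure
        print(f"raw: {raw:.0f}/M, age-adjusted: {adjusted:.0f}/M")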

  14. Audacious Epigone, I think looking at latitude will be really interesting because it shows to what extent this virus is seasonal.

    I made the table below, which gives the latitude for the largest city in every state. Then to better show the range, I subtracted 29 from the latitude. For a better look at correlations, you may want to remove Alaska and Hawaii, which are far outside of the range of the others.

    These correlations are sharper for countries, where the latitude variation is greater.

    State: Largest City — Latitude — Latitude − 29

    ALABAMA: Birmingham 33.5186 4.5186
    ALASKA: Anchorage 61.2181 32.2181
    ARIZONA: Phoenix 33.4484 4.4484
    ARKANSAS: Little Rock 34.7465 5.7465
    CALIFORNIA: Los Angeles 34.0522 5.0522
    COLORADO: Denver 39.7392 10.7392
    CONNECTICUT: Bridgeport 41.1792 12.1792
    DELAWARE: Wilmington 39.7447 10.7447
    FLORIDA: Jacksonville 30.3322 1.3322
    GEORGIA: Atlanta 33.749 4.749
    HAWAII: Honolulu 21.3069 -7.6931
    IDAHO: Boise 43.615 14.615
    ILLINOIS: Chicago 41.8781 12.8781
    INDIANA: Indianapolis 39.7684 10.7684
    IOWA: Des Moines 41.5868 12.5868
    KANSAS: Wichita 37.6872 8.6872
    KENTUCKY: Louisville 38.2527 9.2527
    LOUISIANA: New Orleans 29.9511 0.9511
    MAINE: Portland 45.5051 16.5051
    MARYLAND: Baltimore 39.2904 10.2904
    MASSACHUSETTS: Boston 42.3601 13.3601
    MICHIGAN: Detroit 42.3314 13.3314
    MINNESOTA: Minneapolis 44.9778 15.9778
    MISSISSIPPI: Jackson 32.3547 3.3547
    MISSOURI: Kansas City 39.0997 10.0997
    MONTANA: Billings 45.7833 16.7833
    NEBRASKA: Omaha 41.2565 12.2565
    NEVADA: Las Vegas 36.1699 7.1699
    NEW HAMPSHIRE: Manchester 42.9956 13.9956
    NEW JERSEY: Newark 40.7357 11.7357
    NEW MEXICO: Albuquerque 35.0844 6.0844
    NEW YORK: New York City 40.7128 11.7128
    NORTH CAROLINA: Charlotte 35.2271 6.2271
    NORTH DAKOTA: Fargo 46.8772 17.8772
    OHIO: Columbus 39.9612 10.9612
    OKLAHOMA: Oklahoma City 35.4676 6.4676
    OREGON: Portland 45.5051 16.5051
    PENNSYLVANIA: Philadelphia 39.9526 10.9526
    RHODE ISLAND: Providence 41.824 12.824
    SOUTH CAROLINA: Charleston 32.7765 3.7765
    SOUTH DAKOTA: Sioux Falls 43.5473 14.5473
    TENNESSEE: Nashville 36.1627 7.1627
    TEXAS: Houston 29.7604 0.7604
    UTAH: Salt Lake City 40.7608 11.7608
    VERMONT: Burlington 44.4759 15.4759
    VIRGINIA: Virginia Beach 36.8529 7.8529
    WASHINGTON: Seattle 47.6062 18.6062
    WEST VIRGINIA: Charleston 38.3498 9.3498
    WISCONSIN: Milwaukee 43.0389 14.0389
    WYOMING: Cheyenne 41.14 12.14
    Washington D.C. 38.9072 9.9072

    AE, if it would make it easier for you in Excel, email me at [email protected] and I will reply with a spreadsheet. I will also give you the states according to winter temperature and humidity next; great suggestion by res!

    I think the country correlations on climate and the Coronavirus will be even more significant, since countries give much greater climate variation.
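
    Incidentally, Pearson’s r is unchanged by the “Latitude − 29” shift (subtracting a constant leaves deviations from the mean untouched), so either column gives the same correlation. A sketch of the exercise, assuming the table above is saved as a hypothetical latitudes.csv with a death-rate column merged in:

        # Sketch only: "latitudes.csv" and its columns are hypothetical.
        import pandas as pd

        lat = pd.read_csv("latitudes.csv")
        # Per the suggestion above, drop the two far-outlying states.
        contig = lat[~lat["state"].isin(["ALASKA", "HAWAII"])]
        print(contig["latitude"].corr(contig["deaths_per_million"]))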

  15. Here are states by average winter temperature.

    The key (average winter °F): 0–10 = 1; 10–15 = 2; 15–20 = 3; 20–25 = 4; 25–30 = 5; 30–35 = 6; 35–40 = 7; 40–45 = 8; 45–50 = 9; 50–55 = 10; 55–60 = 11; 60–65 = 12. A short binning sketch follows the table.

    Based on
    https://www.currentresults.com/Weather/US/average-state-temperatures-in-winter.php

    State — Winter Temperature (1–12)
    ALABAMA 9
    ALASKA 1
    ARIZONA 8
    ARKANSAS 8
    CALIFORNIA 9
    COLORADO 5
    CONNECTICUT 5
    DELAWARE 7
    FLORIDA 11
    GEORGIA 9
    HAWAII 12
    IDAHO 5
    ILLINOIS 5
    INDIANA 5
    IOWA 4
    KANSAS 6
    KENTUCKY 7
    LOUISIANA 10
    MAINE 3
    MARYLAND 6
    MASSACHUSETTS 5
    MICHIGAN 4
    MINNESOTA 2
    MISSISSIPPI 9
    MISSOURI 6
    MONTANA 4
    NEBRASKA 5
    NEVADA 6
    NEW HAMPSHIRE 4
    NEW JERSEY 6
    NEW MEXICO 7
    NEW YORK 4
    NORTH CAROLINA 8
    NORTH DAKOTA 2
    OHIO 5
    OKLAHOMA 7
    OREGON 6
    PENNSYLVANIA 5
    RHODE ISLAND 5
    SOUTH CAROLINA 9
    SOUTH DAKOTA 3
    TENNESSEE 7
    TEXAS 9
    UTAH 5
    VERMONT 3
    VIRGINIA 7
    WASHINGTON 6
    WEST VIRGINIA 6
    WISCONSIN 3
    WYOMING 4
    Washington D.C. 5
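
    The binning sketch mentioned above: pd.cut reproduces the 1–12 key (the two temperature values are examples, not exact figures):

        # Bin edges follow the key given with the table (average winter deg F).
        import pandas as pd

        edges = [0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
        temps = pd.Series({"ALASKA": 2.6, "FLORIDA": 59.4})  # example values
        print(pd.cut(temps, bins=edges, labels=list(range(1, 13))))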

  16. @res

    Since dew point is nonlinear with respect to specific humidity (and R0 is nonlinear with respect to specific humidity in a way which I think amplifies the issue) it might make sense to try some transforms of dew point. Some relevant graphics in this comment from last month.
    https://www.unz.com/anepigone/covid-19s-hard-fail/#comment-3764901
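
    One candidate transform: convert dew point to vapor pressure via the Magnus approximation, which is roughly exponential in dew point, so it spreads the humid states out much more than the raw readings do. A sketch (dew points in °C):

        # Magnus approximation: vapor pressure (hPa) from dew point (deg C).
        import numpy as np

        def vapor_pressure_hpa(dew_point_c):
            return 6.112 * np.exp(17.67 * dew_point_c / (dew_point_c + 243.5))

        print(vapor_pressure_hpa(np.array([-10.0, 0.0, 10.0, 20.0])))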

  17. r^2= 80% for Alcohol Consumed Per Drinker as of 1994

    These are COVID-19 deaths correlated with about 500 ecological variables, sorted by their coefficients of determination (r^2). The spreadsheet is at this link for those interested. Please note that if I had more time, I’d have deleted the rows that are absolute counts (i.e., not per capita). For those interested in what “VerbalChange1991to2001” means: it’s the change in the SAT verbal scores.

    • Thanks: res, Audacious Epigone
    • Replies: @res, @James Bowery, @Kratoklastes

  18. How about public transportation usage? A quick glance at the chart here looks encouraging:
    https://en.m.wikipedia.org/wiki/List_of_U.S._cities_with_high_transit_ridership

    • Thanks: Audacious Epigone
    • Replies: @Audacious Epigone
    Indeed there does look like there could be a lot of potential here. There are about 300 cities in the US with populations of 100k or more. This is bookmarked for the future, thanks.
  19. There aren’t going to be any correlates with gross population data. None.

    The reason is that viruses do not spread around at random the way everyone has been assuming. Rather, every step in the following process follows a Pareto distribution.

    1. Exposure: Were you exposed at all?

    2. If you were exposed, did you contract the virus?

    3. If you contracted, did you develop antibodies?

    4. If you developed antibodies, did you show any symptoms?

    5. If you showed symptoms, were they serious enough to attract medical attention?

    6. Did you die?

    By the time you get down to death at number 6, you’re looking at people who were pretty much running towards a date with destiny at full tilt, not at people who are represented by a cross section of other variables. It’s like asking who is going to win the track meet by measuring the vital signs of the people in the bleachers.
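
    The chained-filter idea is easy to simulate. A toy Monte Carlo, with every parameter invented, showing how deaths end up concentrated in a small heavy-tailed minority rather than a demographic cross-section:

        # Toy model: each person carries a heavy-tailed (Pareto) risk factor
        # and must pass every stage -- exposure, contraction, symptoms,
        # severity, death -- for a fatality. All parameters are invented.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 1_000_000
        frailty = rng.pareto(a=2.0, size=n) + 1   # heavy-tailed individual risk

        base = [0.20, 0.50, 0.50, 0.15, 0.05]     # baseline stage probabilities
        fated = np.ones(n, dtype=bool)
        for p in base:
            stage_p = np.clip(p * frailty, 0, 1)  # frailty scales each stage
            fated &= rng.random(n) < stage_p

        print(f"deaths: {fated.sum():,} of {n:,}")
        print(f"median frailty overall: {np.median(frailty):.2f}, "
              f"among deaths: {np.median(frailty[fated]):.2f}")

    In runs like this the median risk factor among the dead comes out several times the population median: the “bleachers vs. track” point in miniature.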

    • Replies: @Kratoklastes, @James Bowery
  20. Audacious Epigone —

    Did some data crunching over here.

    Admittedly the statewide correlations between death rate and latitude are weak.

    But the national correlation between latitude of the capital and death rate per million is ***incredibly strong***.

    I made a plot of latitudes of country capitals on the x-axis and coronavirus deaths per million on the y-axis.

    I limited myself to nations with significant testing — about 140 countries.

    There are zero nations whose capital is south of 30 north latitude with more than 20 deaths per million people. There are 25 nations whose capital is north of 30 north latitude with more than 20 deaths per million. The United States right now is at 65 deaths per million. South of the equator, zero nations exceed 7 deaths per million.

    Ten countries exceed 100 deaths from Coronavirus per million and every single one of them has a capital north of 40 latitude.

    DC is at 39 latitude and New York is at 41 latitude. There is an incredibly strong climate correlation with the coronavirus, but the United States just doesn’t give enough variation to properly show this. The whole US is basically between 30 and 45, so the correlations of climate are not nearly as strong as what one sees globally.

    • Agree: Kratoklastes
  21. Let it run its course. It is not going away. Develop herd immunity. I tried to tell you that you were simply jumping on the panic, “please help me government!” bandwagon. How does that government subjugation feel now?

  22. @James Bowery

    Thanks! The alcohol variable at the top is much less impressive given that it is only for 11 states though.

    But there are quite a few variables with r^2 of around 50% or above. I am stunned that Jew/Italian/Dominican/Russian/Puerto Rican/Jamaican all correlate so well across all 50 states. Abortion is another interesting variable, but only has data for 44 states.

    What I don’t understand is how those did not show up more strongly in AE’s data. AE saw correlations (r!) of less than 0.2! I would have expected Clinton and population density to be decent proxies for those demographic variables above.

    AE, could you see how this fits in with your results? Is he using similar death data? Could you try consolidating your data set with his or post your data here?

    P.S. DanHessinMD, are you planning on rolling your data into that spreadsheet? Or should I do it if you and AE don’t?

  23. Though we came in for some derision in predicting a couple of weeks ago that the death toll would come in under 100,000 nationwide, it now looks like that’s the way to bet–at least in the pestilence’s first wave.

    Here’s a pretty good take on the subject. TL;DR: the virus will continue spreading irrespective of what the models say until several hundred thousand or even a few million Americans die. The author examines the potential economic impact and concludes that it’s better to sacrifice some portion of the economy in the short term rather than letting people die en masse, which would be a permanent detriment to economic growth. However, I don’t totally agree. This analysis does not take into account Western immigration policies.

    Coronavirus: Can it even be stopped?

    https://www.youtube.com/watch?v=0ERi2cL730o

    Here’s what the U.S. ruling class is likely to do next (my take):

    They’ll subtly ramp down testing and reporting through a combination of withdrawing federal funds for testing centers and bullying the press through willful misinterpretation of scientific data.

    They are already doing the former.

    Federal Support Ends For Coronavirus Testing Sites As Pandemic Peak Nears

    https://www.npr.org/sections/coronavirus-live-updates/2020/04/08/829955099/federal-support-for-coronavirus-testing-sites-end-as-peak-nears

    They’ll then use the current model, which accounts for social distancing / lockdowns, to declare victory over the virus; reaching the peak of this model will be equated to defeating it, justifying a return to business as usual. They’ll also say the model predicts just slightly more than annual reported flu deaths and berate previous critics as “hysterical” or maybe even as conspiracy theorists in order to discourage any future reporting on the subject when the virus inevitably comes back in a second wave (which will mostly go unreported, at least until after the election).

    They’ll allow the virus to burn through the population, which has no immunity. The end result will be far more deaths than what this model is currently predicting (~82k deaths) because it assumes we get the R0 below ~1 worldwide for an extended period of time. Very likely, it won’t be possible to contain the virus in the United States over the long-run because it is so widespread and contagious. It only takes a handful of infected people to ensure the population eventually becomes infected. Even if the United States gets it under control domestically, allowing in just a few infected outsiders would start the process again. It’ll take some time for it to burn through the third world, which should serve as a viral reservoir for several months at least.

    In the meantime, look for a lot of the bought-off water carriers for Donald Trump on the “Dissident” Right to act as controlled opposition. They’ll deny there’s a problem in video after YouTube video – all in the faint hope Donald Trump will get reelected. He won’t. YouTube will also continue allowing fringe conspiracy theorists like David Icke to post nonsense about how there isn’t even a virus in the first place — just 5G side effects or whatever. They’ll tolerate those guys because their rhetoric serves the greater purpose of playing down the virus.

    Years from now, someone will do a study and find that far more people died of Covid-19 in the United States than was initially reported, probably by some multiple like 8 or 9 (or higher). But it was all conveniently swept under the rug, so few people will hear of it.

    Afterwards, look for a post-Covid-19 amnesty push along with increased “skills-based” immigration. The WaPo is already hinting at this with their coverage of so-called “Dreamers” in distress. Basically, they’ll let a lot of native-born Americans die and then quickly replace them with foreign workers. “The business of America is business”, after all. “The show must go on.” Oh, and Donald Trump will lose the 2020 election as some portion of his base dies off; the GOP additionally won’t be competitive in the future as the former base is replaced by immigrants who overwhelmingly vote Democratic.

    The “Dissident” Right will also be discredited, as they hysterically embraced muh economy / muh freedom over protecting the lives of their fellow citizens, all while downplaying the virus – that after claiming to be ethnic activists interested in their people’s well-being (an obvious fraud now). Exposed as grifters, Wall Street worshipers, and uneducated fringe conspiracy theorists, they’ll fade away with Donald Trump’s election loss. They were already pushing it with Qanon, but this is probably a step over the line for most. Basically, anyone who’s intelligent will stay away from this crowd in the aftermath, at least until a few new figures emerge and rebrand with a new philosophy immune to this nonsense. Deep State wins the 4D chess match – for now.

    Side note: the YouTuber in question has been delisted from Google search results despite being non-violent. Reason: They didn’t like his opinion on feminism and the like. Remember that the next time someone like Samantha Bee claims conservatives aren’t being censored. She’s right. Everyone is being censored. This guy is a progressive atheist. I guess they are trying to get his R0 below 1, the logic being that if they ban his videos from appearing in searches, his base will slowly fade away as he’s not able to pick up new subscribers. They can then deflect criticism by claiming they didn’t actually ban anyone. There is some irony in that considering circumstances.

    • Replies: @Divine Right

    the virus will continue spreading irrespective of what the models say until several hundred thousand or even a few million Americans die.
     
    This is probably also one of the stronger reasons not to have done a full-scale economic shutdown. By underplaying the virus, minimizers created a situation where they got the opposite of what they wanted when what they really should have said is something like the following: "Because we don't have immunity and because this new virus is so contagious, we can't ever hope to fully contain it -- not until we have a vaccine, which is probably 12 - 18 months away. We can't shut down things until then, so we don't have a choice. Maybe we can do a limited shutdown in stages as the virus spreads, but we can't just shut down everything at once because it won't do any good anyway. It will still spread, it will still hang around in the third world even if we stamp it out here, and the moment a single infected person from the outside gets in we'll have just maybe a month or two before the whole country gets it." I think that would have worked a lot better than what a lot of people ended up doing -- telling their Twitter followers to laugh at people wearing face coverings or trying to claim it's all just the flu or a conspiracy. The government not having enough PPE also scared people pretty badly and made it worse.
  24. @James Bowery

    I’ve cleaned up the by-state correlations to get rid of the misleading alcohol data (N=11 and far from normally distributed) as well as the absolute count variables that were just basically adding confusion.

  25. If you look at my updated blog post, you’ll notice strong urban correlates at the top:

    Variable r r^2 N
    ImmigrantsDominicanRepublicPercapita1998 83.13% 69.10% 51
    HIVPositiveTestsPercapita2001 77.75% 60.45% 33
    JewsPercapita1999 75.56% 57.09% 51
    West_IndianPercapita1990 72.36% 52.35% 51
    ItalianPercentOfWhites 70.20% 49.28% 51
    RussianPercapita1990 69.29% 48.01% 51

    However, and this is interesting, InnerCityPercapita1990 is pretty far down by comparison:

    InnerCityPercapita1990 42.16% 17.77% 51

    Also notice ImmigrantsChina1998 is relatively low:

    ImmigrantsChina1998 43.83% 19.21% 51

  26. @James Bowery

    That sort of data search is the key reason why reduced-form modelling rightly gets attacked. It’s “CompSci” statistics – easy to implement, but completely ignorant of the underlying statistical properties of what comes out.

    If you do 500 preliminary pairwise ρᵢⱼ, a bunch of them will be significant by chance – and since the dependent variable is truncated, the number of spuriously significant ρᵢⱼ will exceed αM (where α is the significance level and M is the number of correlations computed).

    ρᵢⱼ is the ‘go-to’ mechanism for Pharma- and Psych-‘quant’ [sic] – which is why their ‘studies’ don’t replicate.

    Back when I taught Applied Econometric Modelling (a 3rd year undergrad subject), 20% of the first assignment was dedicated to checking if the kiddies properly understood things like spurious correlation, and to showing them why data-search was the same thing as what is now known as p-hacking (we called it ‘data dredging’ at the time: neither term is sufficiently pejorative – it should be called what it is: “corrupt quant“).

    Maybe 10% of the class really ‘got’ the concept – and these were kids that had to be in the 95th percentile of HS kiddies (to get into first year), and the course was an elective and required a ‘Credit’ in 3 precursor subjects: Statistics (1st yr); Econometric Theory (2nd yr); Applied Econometrics (2nd yr). (NB… ‘Credit’: 3rd grade in the range, but still an American ‘A’).

    <rant>

    American ‘A’ starts just above the median (the image linked below is as of 2016): at 4-year colleges it’s 42% of all grades. So ‘Straight A student‘ has been rendered effectively meaningless – it barely means ‘better than average‘.

    https://www.timeshighereducation.com/sites/default/files/grades2.png

    By contrast, at my alma mater, students with a Credit average over the first 3 years were eligible for the Honours year – but if they only scraped in and then attempted Honours, they would finish in the bottom quintile. At undergrad only the top decile got a Distinction in a subject, and only the top 1-5 students in a class got a High Distinction: an “HD average” was almost unheard-of.

    Pity the Monash BEc kids who had a ‘D’ average: be in the top couple of percent of the student body at a university ranked in the top few dozen in the world for Economics (#24 in 2013; #39 now)… get a ‘D’ average. Must be the most misunderstood cohort in the history of tertiary education.

    (I think Monash has moved to GPA now, to cater for the stupidity of HR-tards. Imagine some fuckwit HR-tard seeing a bunch of ‘D’ grades on a transcript).

    </rant>
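
    The multiple-comparisons point is easy to demonstrate: correlate pure noise against a fake death-rate vector 500 times and roughly α·M of the tests come out “significant” at the nominal level, while a Bonferroni threshold kills nearly all of them. A sketch:

        # Pure-noise demonstration of spurious significance; nothing here
        # is real data.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        M, n_states, alpha = 500, 50, 0.05
        y = rng.normal(size=n_states)             # fake "death rate"

        pvals = np.array([stats.pearsonr(y, rng.normal(size=n_states))[1]
                          for _ in range(M)])
        print(f"nominally significant: {(pvals < alpha).sum()} of {M}")      # ~25
        print(f"after Bonferroni:      {(pvals < alpha / M).sum()} of {M}")  # ~0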

    • Replies: @James Bowery

  27. @Intelligent Dasein

    It’s like asking who is going to win the track meet by measuring the vital signs of the people in the bleachers.

    Beautifully put.

    And the flip-side – the OK Doomer[1] model – is assuming that the people in the bleachers can run the same average time as the guys on the track.

    [1] Manfred Arcane used “OK Doomer” in a comment the other day; it’s much better than “Chicken Little“.

  28. @Kratoklastes

    Ever heard the term “exploratory data analysis“? The original title of my blog post was “Exploratory Data Analysis of COVID-19 Deaths By State” for good reason.

    If you want to lock horns with me on premature “publication” of said EDA, you win. If you want to lock horns with me on inference of causality, you’d better up your game a standard deviation or so.

    Ever hear of Algorithmic Information Theory?

    • Replies: @Kratoklastes
    Misrepresenting EDA by trying to conflate it with data-dredging is not going to get you anywhere.

    EDA is mostly about getting some insight into the characteristics of each series in a pre-specified model -
     • descriptive stats (e.g., mean; variance; skew; quantiles);
     • tests of unit roots and stationarity (i.e., constant mean; finite variance);
     • distributional stats (choice of distribution based on the descriptives).

    EDA's a perfectly normal part of building a model - given some hypothesised behavioural or theoretic relationship between some set of variables and some other set of variables.

    By contrast, data-dredging is the answer to "I have this thing, and data for a shitload of other things. Some of these other things might be [cor]related with my variable of interest... "

    That's not EDA: it's 'CompSci' (aka "HelloWorld") "quest for a model". It's "suck it and see", and ought to be considered naughty - or at the very least, subjected to a Bonferroni correction (i.e., requiring that the hypothesis be tested at the α/M significance level, where M is the number of individual pairwise ρᵢⱼ performed).

    Tukey (who literally 'wrote the book' on EDA) was about understanding your data, not just chucking the kitchen sink at a problem, seeing what sticks, and calling the stickiest exogenous variable 'explanatory'.

    There's been plenty of recent argument that this kind of pseudo-EDA is the fundamental root of the replication crisis (especially in Psych and Pharma): it's a well-understood problem of using atheoretic associations as the basis for claims about deep causation.

    Quant economics has not been able to get away with this sort of thing for 2 generations - and yet it remains ubiquitous elsewhere... it needs to die in a fire.

    Lucas (1976) correctly pointed out that reduced form evidence is fundamentally weak, and is a poor substitute for a quest for 'deep' parameters from a behavioural/structural model.

    This is something that econometricians and economic modellers already knew about in 1976 (Ramsey's RESET test for model specification was published in 1968; Chow's test was published in 1960 and was being used to test for parameter stability by the mid-60s)... but Lucas was a 'revelation' to Economics more broadly.

    <rant>

    It's absolutely true that by the mid-70s there was abundant, bad, reduced-form quant in economics (including, but not limited to, the Phillips Curve and Okun's Law).

    A lot of it was done by quant-dilettantes - hand-waving essay-writers who understood that the discipline was a little infatuated with quant, and not very interested in hand-waving any more. Quite a bit was done cynically by people who definitely knew better.

    Quant was a way to easy publications - so there was a rash of publications in quite-prestigious Economics journals (AER, especially) where people applied reduced-form OLS (but not to a standard that would get a pass in a 2nd year Econometrics subject).

    Well before this - half a decade before Lucas wrote this 'insight', and before Kydland & Prescott pretended that their ideas were novel - Leontief and his graduate students at Harvard were having a crack at generalising input-output models to develop dynamic, policy-invariant models based on deep structural foundations.

    One of Leontief's students (Dixon - my PhD supervisor) made clear the requirement for underlying structure in his 1972 PhD work (The Theory of Joint Maximization, which was published in the Amsterdam North Holland "Contributions to Economic Analysis" series in 1975).

    The same year as Lucas wrote, two men (the aforementioned Dixon, and Allan Powell) - who were very influential in my training - proposed (along with Vincent) a CRESH/CRETH[1] production structure which was very specifically the result of a quest for 'deep' behaviour.

    </rant>

    [1] CRE[S|T]H: constant ratio of elasticities of [substitution|transformation], homothetic

  29. @Intelligent Dasein

    Seriously and egregiously wrong both technically and in spirit.

    In spirit:

    When the epidemiology “authorities” are doing F.A. with their enormous resources, and people are running around in hysterics, it is up to people with limited resources (time, money, education and access to data) to do the best they can under those circumstances. That means exploratory data analysis that _should_ have been done by those with the resources is the only way one might plausibly shame the authorities into doing their job. Criticize the specifics about the efforts, yes. That’s in the spirit of the situation.

    Technically:

    What you call “gross population data” are “ecological data”. While the persnickety will attempt to score points with critiques of “the ecological fallacy” (which is at least a cut above the Stat 101 “Correlation doesn’t imply causation!”), ecological correlations are fundamental to epidemiology going back to its origins. Cholera, for example, could not have been detected as water-borne without the ecological correlations of wells and outbreaks. It may ruffle the feathers of the pedants to see people “wasting pristine data” by simply looking at it without testing any hypothesis, but when it comes to public health, we’re dealing in ecological relationships, so T.S. Sometimes these relationships are obvious, like population density, and one might try to score pedantic points by saying it is uninformative to find such correlations, as justification of the general critique of exploratory data analysis. But this is mere confirmation bias that ignores the possibility — quite distinct — that a supposedly “spurious” correlation may turn out to contain within it a clue to a previously unnoticed latent variable that just might save lives.

  30. @Jtgw

    One of the really hard things to get right is driven by the absolutely obvious age-cohort partition in hospitalisation, severity, and mortality data.

    It seems that infection rate is (very) roughly constant across age groups (ex the very young) – although it must be borne in mind that to be counted in I you have to have been tested, which is a very biased sample in the US.

    It’s like having another two compartments in a SEIR model:

    S → E → I → T → H → R/D

    where there are additional transition probabilities of going from ‘Infected’ to ‘Infected, with symptoms severe enough to generate a Test‘, and some subset of those require Hospitalisation, of which some Die.

    That doesn’t mean that Pr(I|E) is constant across age groups, though: relatively-more-robust 20-29 year olds probably have a lower chance of becoming infected for a given level of exposure.

    The excess infection rate in 20-29 year olds (150% of their population share, in Australian data) probably indicates that these people have the most social interaction – and most of their social interaction will be within-group.

    Conversely, the average ill person over 70 – by far the most-over-represented in fatalities-with covid19 – has a much narrower (and less varied) group of social contacts: in the normal course of events it’s family or nursing home staff – if one person on staff is asymp, they can infect tens of the inmates of a nursing home.

Modelling different age-group transition probabilities is proving a challenge, because the age profile of the contacts of each age group will be distributed differently – and the numbers in I, T, H, and D in the 70+ age groups can result from an exposure via the contacts of a 20-29yo (and vice versa).
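For concreteness, here is a minimal sketch in Python of the augmented chain described above; every rate and branching probability below is an assumption for illustration, not a fitted value.

```python
import numpy as np
from scipy.integrate import odeint

# Hypothetical, unfitted rates -- illustration only.
beta  = 0.40    # transmission rate
sigma = 1/5.2   # E -> I (incubation)
delta = 1/5.0   # rate at which I resolves (tested or not)
tau   = 1/4.0   # rate at which T progresses (hospitalised or not)
eta   = 1/10.0  # rate at which H resolves (death or recovery)
p_test, p_hosp, p_die = 0.3, 0.2, 0.15  # assumed branching probabilities

def deriv(y, t):
    S, E, I, T, H, R, D = y
    dS = -beta * S * I
    dE =  beta * S * I - sigma * E
    dI =  sigma * E - delta * I
    dT =  delta * p_test * I - tau * T            # tested subset of resolving I
    dH =  tau * p_hosp * T - eta * H              # hospitalised subset of T
    dR = (delta * (1 - p_test) * I + tau * (1 - p_hosp) * T
          + eta * (1 - p_die) * H)                # everyone else recovers
    dD =  eta * p_die * H                         # some of the hospitalised die
    return dS, dE, dI, dT, dH, dR, dD

y0 = (0.999, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0)      # fractions of the population
t  = np.linspace(0, 365, 366)
S, E, I, T, H, R, D = odeint(deriv, y0, t).T
print(f"final death fraction: {D[-1]:.4f}")
```

A per-age-cohort version would replicate these compartments per age group and couple them through a contact matrix, which is exactly where the modelling difficulty described above comes in.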

  31. @Divine Right

    Though we came in for some derision in predicting a couple of weeks ago that the death toll would come in under 100,000 nationwide, it now looks like that’s the way to bet–at least in the pestilence’s first wave.
     
    Here’s a pretty good take on the subject. TL;DR: the virus will continue spreading irrespective of what the models say until several hundred thousand or even a few million Americans die. The author examines the potential economic impact and concludes that it’s better to sacrifice some portion of the economy in the short term rather than letting people die en masse, which would be a permanent detriment to economic growth. However, I don’t totally agree. This analysis does not take into account Western immigration policies.

    Coronavirus: Can it even be stopped?

    https://www.youtube.com/watch?v=0ERi2cL730o

    Here’s what the U.S. ruling class is likely to do next (my take):

They’ll subtly ramp down testing and reporting by withdrawing federal funds for testing centers and by bullying the press through willful misinterpretation of scientific data.

    They are already doing the former.

    Federal Support Ends For Coronavirus Testing Sites As Pandemic Peak Nears

    https://www.npr.org/sections/coronavirus-live-updates/2020/04/08/829955099/federal-support-for-coronavirus-testing-sites-end-as-peak-nears
     
    They’ll then use the current model, which accounts for social distancing / lockdowns, to declare victory over the virus; reaching the peak of this model will be equated to defeating it, justifying a return to business as usual. They’ll also say the model predicts just slightly more than annual reported flu deaths and berate previous critics as "hysterical" or maybe even as conspiracy theorists in order to discourage any future reporting on the subject when the virus inevitably comes back in a second wave (which will mostly go unreported, at least until after the election).

They’ll allow the virus to burn through the population, which has no immunity. The end result will be far more deaths than what this model is currently predicting (~82k deaths) because it assumes we get the R0 below ~1 worldwide for an extended period of time. Very likely, it won’t be possible to contain the virus in the United States over the long run because it is so widespread and contagious. It only takes a handful of infected people to ensure the population eventually becomes infected. Even if the United States gets it under control domestically, allowing in just a few infected outsiders would start the process again. It’ll take some time for it to burn through the third world, which should serve as a viral reservoir for several months at least.

In the meantime, look for a lot of the bought-off water carriers for Donald Trump on the "Dissident" Right to act as controlled opposition. They’ll deny there’s a problem in video after YouTube video – all in the faint hope Donald Trump will get reelected. He won't. YouTube will also continue allowing fringe conspiracy theorists like David Icke to post nonsense about how there isn't even a virus in the first place -- just 5G side effects or whatever. They'll tolerate those guys because their rhetoric serves the greater purpose of playing down the virus.

    Years from now, someone will do a study and find that far more people died of Covid-19 in the United States than was initially reported, probably by some multiple like 8 or 9 (or higher). But it was all conveniently swept under the rug, so few people will hear of it.

Afterwards, look for a post-Covid-19 amnesty push along with increased “skills-based” immigration. The WaPo is already hinting at this with their coverage of so-called "Dreamers" in distress. Basically, they’ll let a lot of native-born Americans die and then quickly replace them with foreign workers. “The business of America is business”, after all. "The show must go on." Oh, and Donald Trump will lose the 2020 election as some portion of his base dies off; the GOP additionally won’t be competitive in the future as the former base is replaced by immigrants who overwhelmingly vote Democratic.

    The "Dissident" Right will also be discredited as they hysterically embraced muh economy / muh freedom over protecting the lives of their fellow citizen, all while downplaying the virus -- that after claiming to be ethnic activists interested in their people's well being (that's an obvious fraud now). Exposed as grifters, Wall Street worshipers, and uneducated fringe conspiracy theorists, they'll fade away with Donald Trump's election loss. They were already pushing it with Qanon. But this is probably a step over the line for most. Basically, anyone who's intelligent will stay away from this crowd in the aftermath, at least until a few new figures emerge and rebrand with a new philosophy immune to this nonsense. Deep State wins the 4D chess match -- for now.

Side note: the YouTuber in question has been delisted from Google search results despite being non-violent. Reason: they didn’t like his opinion on feminism and the like. Remember that the next time someone like Samantha Bee claims conservatives aren’t being censored. She’s right. Everyone is being censored. This guy is a progressive atheist. I guess they are trying to get his R0 below 1, the logic being that if they ban his videos from appearing in searches, his base will slowly fade away as he’s not able to pick up new subscribers. They can then deflect criticism by claiming they didn't actually ban anyone. There is some irony in that, considering the circumstances.

    the virus will continue spreading irrespective of what the models say until several hundred thousand or even a few million Americans die.

This is probably also one of the stronger reasons not to have done a full-scale economic shutdown. By underplaying the virus, minimizers created a situation where they got the opposite of what they wanted, when what they really should have said is something like the following:

“Because we don’t have immunity and because this new virus is so contagious, we can’t ever hope to fully contain it — not until we have a vaccine, which is probably 12 – 18 months away. We can’t shut down things until then, so we don’t have a choice. Maybe we can do a limited shutdown in stages as the virus spreads, but we can’t just shut down everything at once because it won’t do any good anyway. It will still spread, it will still hang around in the third world even if we stamp it out here, and the moment a single infected person from the outside gets in we’ll have just maybe a month or two before the whole country gets it.”

I think that would have worked a lot better than what a lot of people ended up doing — telling their Twitter followers to laugh at people wearing face coverings or trying to claim it’s all just the flu or a conspiracy. The government not having enough PPE also scared people pretty badly and made it worse.

  32. One of the hard-hit towns in Italy tested everyone: Not that many antibody-havers were found. So your happy scenario is not looking good.

    Follow Chris Martenson’s video posts. He is zeroed in on this issue.

    BTW, there was some bad news about the vaccine: Corona antibodies are pretty weak when it comes to fighting off re-infection. So the virus is more like HIV – not a great vaccine target.

• Replies: @Justvisiting
33. @dvorak

    I am also a big fan of Martenson–the link is here:

    https://www.youtube.com/channel/UCD2-QVBQi48RRQTD4Jhxu8w

    He predicted the shortages and panic and lockdowns–and got me prepared in the “quiet time” long before anyone was paying attention to CV stuff.

    That does not guarantee that the rest of his analysis of the virus is accurate–but I trust him more than the zillions of Internet posters who have no three month track record of accurate CV predictions to call their own.

  34. @James Bowery
    Ever heard the term "exploratory data analysis"? The original title of my blog post was "Exploratory Data Analysis of COVID-19 Deaths By State" for good reason.

    If you want to lock horns with me on premature "publication" of said EDA, you win. If you want to lock horns with me on inference of causality, you'd better up your game a standard deviation or so.

    Ever hear of Algorithmic Information Theory?

Misrepresenting EDA by trying to conflate it with data-dredging is not going to get you anywhere.

    EDA is mostly about getting some insight into the characteristics of each series in a pre-specified model –
     • descriptive stats (e.g., mean; variance; skew; quantiles);
     • tests of unit roots and stationarity (i.e., constant mean; finite variance);
     • distributional stats (choice of distribution based on the descriptives).

    EDA’s a perfectly normal part of building a model – given some hypothesised behavioural or theoretic relationship between some set of variables and some other set of variables.
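A minimal sketch of that checklist on a single simulated series (the random-walk series and the library choices are my own, purely for illustration):

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.stattools import adfuller

# EDA in the sense described above, on one simulated series (a random walk).
s = pd.Series(np.random.default_rng(2).normal(size=500).cumsum())

print(s.describe())                                  # mean, spread, quantiles
print("skew:", stats.skew(s), "kurtosis:", stats.kurtosis(s))
adf_stat, pvalue, *_ = adfuller(s)                   # unit-root (ADF) test
print("ADF p-value:", pvalue)                        # high p => cannot reject a unit root
```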

    By contrast, data-dredging is the answer to “I have this thing, and data for a shitload of other things. Some of these other things might be [cor]related with my variable of interest… ”

That’s not EDA: it’s ‘CompSci’ (aka “HelloWorld“) “quest for a model“. It’s “suck it and see“, and ought to be considered naughty – or at the very least, subjected to a Bonferroni correction (i.e., requiring that each hypothesis be tested at the α/M significance level, where M is the number of individual pairwise ρᵢⱼ computed).
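For concreteness, a minimal sketch of that correction; N, M and α here are assumed values, not anything from the thread.

```python
import numpy as np
from scipy import stats

# Bonferroni screen for M pairwise correlations over N observations.
N, M, alpha = 51, 200, 0.05     # assumed: states (+DC), candidate variables
alpha_adj = alpha / M           # per-test significance level

# Critical |r| comes from inverting t = r * sqrt(N - 2) / sqrt(1 - r^2).
t_crit = stats.t.ppf(1 - alpha_adj / 2, df=N - 2)
r_crit = t_crit / np.sqrt(t_crit**2 + N - 2)
print(f"per-test alpha = {alpha_adj:.1e}, so require |r| > {r_crit:.2f}")
# With these numbers the dredger needs |r| of roughly 0.5,
# far above the naive .05-level cutoff.
```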

Tukey (who literally ‘wrote the book’ on EDA) was about understanding your data, not just chucking the kitchen sink at a problem, seeing what sticks, and calling the stickiest exogenous variable ‘explanatory’.

There’s been plenty of recent commentary to the effect that this sort of dredging is the fundamental root of the replication crisis (especially in Psych and Pharma): it’s a well-understood problem of using atheoretic associations as the basis for claims about deep causation.

    Quant economics has not been able to get away with this sort of thing for 2 generations – and yet it remains ubiquitous elsewhere… it needs to die in a fire.

    Lucas (1976) correctly pointed out that reduced form evidence is fundamentally weak, and is a poor substitute for a quest for ‘deep’ parameters from a behavioural/structural model.

This is something that econometricians and economic modellers already knew about in 1976 (Ramsey’s RESET test for model specification was published in 1969; Chow’s test was published in 1960 and was being used to test for parameter stability by the mid-60s)… but Lucas was a ‘revelation’ to Economics more broadly.

    <rant>

    It’s absolutely true that by the mid-70s there was abundant, bad, reduced-form quant in economics (including, but not limited to, the Phillips Curve and Okun’s Law).

    A lot of it was done by quant-dilettantes – hand-waving essay-writers who understood that the discipline was a little infatuated with quant, and not very interested in hand-waving any more. Quite a bit was done cynically by people who definitely knew better.

    Quant was a way to easy publications – so there was a rash of publications in quite-prestigious Economics journals (AER, especially) where people applied reduced-form OLS (but not to a standard that would get a pass in a 2nd year Econometrics subject).

    Well before this – half a decade before Lucas wrote this ‘insight’, and before Kydland & Prescott pretended that their ideas were novel – Leontief and his graduate students at Harvard were having a crack at generalising input-output models to develop dynamic, policy-invariant models based on deep structural foundations.

    One of Leontief’s students (Dixon – my PhD supervisor) made clear the requirement for underlying structure in his 1972 PhD work (The Theory of Joint Maximization, which was published in the Amsterdam North Holland “Contributions to Economic Analysis” series in 1975).

The same year as Lucas wrote, two men (the aforementioned Dixon, and Allan Powell) – who were very influential in my training – proposed (along with Vincent) a CRESH/CRETH[1] production structure which was very specifically the result of a quest for ‘deep’ behaviour.

    </rant>

    [1] CRE[S|T]H: constant ratio of elasticities of [substitution|transformation], homothetic

• Replies: @James Bowery
  35. @Mark G.
    The five states that have yet to issue shelter-in-place orders may want to reconsider. A new CDC report says the virus can travel 13 feet:

    https://nypost.com/2020/04/12/the-coronavirus-can-travel-at-least-13-feet-new-study-shows/

    The other states shut pretty much everything down but everyone has been standing six feet away from each other in long grocery store lines for the last month so they can get food and not starve to death. This type of shutdown may not be as effective as previously thought so we may have destroyed the economy to achieve less results in stopping the spread of the disease than we thought.

    Mark, 13 feet? Not 27? not 14 or 12 feet? Is 6 feet safe? Inside or out? With or without the wind? Does a plume of droplets hang in the air (still air? Completely still??) like smoke from a campfire?

    I call BS on all of it. It is a variety of the flu. Kills some people. BFD. Might kill me, I suppose. Everybody dies of something. I never expected the idiots running this nation to shut it down because they gave a s%it if I died, and I don’t think so now. This is a scam being used to depress the people and enable a takeover that would not otherwise be as easy for the psychos working it.

36. @Kratoklastes

EDA is mostly about getting some insight into the characteristics of each series in a pre-specified model

No it’s not. While EDA can be done in the context of a class of models (indeed, informally, it is impossible to do otherwise, as we all have our informal priors), it is not limited to that context.

Tukey (who literally ‘wrote the book’ on EDA) was about understanding your data

Oh good grief. Stop trying to score pedantry points. John Tukey was a mentor of my colleague* Charles Sinclair Smith, who went on to co-found the Energy Information Administration under Carter and was responsible for setting up its initial data analysis procedures. He then went on to direct the Systems Development Foundation, where he funded the neural network PDP books, Hinton and Werbos, among others, thereby ending the first neural network winter. I’m perfectly aware of the limitations of “naughty” EDA. If you want to be helpful, how about looking at my spreadsheet and providing some additional EDA?

You could start with Bonferroni corrections, since you brought that up.

* Charlie and I go back to the mid-90s, but more recently (2007) I moved to rural Iowa, the next county over from where he was semi-retired, and we entered into a decade-long back and forth about causal inference. This was fortuitous, as in 2006 I had convinced Marcus Hutter to fund a prize for lossless compression of data that was recently in the news again. My position was then, and remains today, that lossless compression of data yields the best information criterion for model selection, and that encompasses causal models. I finally got Charlie to come around to my point of view. Your entire world of “statistics” is being up-ended by guys like me, and you’re treating me like a kid. Cut it out and apply yourself to the problem at hand.

37. As for causal inference, I suggest you familiarize yourself with Hector Zenil’s work recently published in Nature Machine Intelligence:

“Causal deconvolution by algorithmic generative models”.

    Perhaps this little video will help you understand the difference between this approach to big data and “statistics”, but it’s not the video I would have made since it doesn’t describe the relationship with statistical methods.

    The way lossless compression dispenses with “spurious correlations” is not unlike the way path analysis does: A spurious correlation is exposed when the graph complexity can be reduced by eliminating the corresponding path without increasing the overall error. The key to understanding the AIT approach is understanding that the units of error are to be brought into the same dimension as the units of model complexity: bits, or more specifically, the bits comprising the algorithm that generates the data being modeled.
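A toy illustration of that idea, under assumptions of my own choosing: a Gaussian residual code and a fixed per-parameter bit cost stand in for a real compressor, and the variables are simulated.

```python
import numpy as np

# Two-part-code (MDL-style) comparison in the spirit described above:
# keep an edge in the model only if it pays for its own description length.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = x + rng.normal(scale=0.1, size=n)   # z is a near-copy of x
y = 2 * x + rng.normal(scale=0.5, size=n)

def description_length(y, X, bits_per_param=32):
    """Model bits + data bits under a Gaussian residual code (an assumption)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid.var() + 1e-12
    data_bits = 0.5 * len(y) * np.log2(2 * np.pi * np.e * sigma2)
    model_bits = bits_per_param * X.shape[1]
    return model_bits + data_bits

X_full  = np.column_stack([x, z])   # graph keeping both edges x->y and z->y
X_prune = x[:, None]                # graph with the z->y edge removed
print(description_length(y, X_full), description_length(y, X_prune))
# The pruned graph codes y in (almost) as few data bits with fewer model
# bits, so total description length drops: the z->y edge is exposed as
# spurious, just as eliminating the path would expose it in path analysis.
```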

  38. Anything else come to mind as a potentially meaningful correlate, or is looking at the state level hopelessly imprecise?

    Probably just too much noise at this stage. Too many confounds for a signal to show. Maybe if the disease progresses more towards saturation.

    I agree with Doctor Hess, and expect latitude (vitamin D) to eventually show, especially with those of African heritage.

39. In recognition that the statistically naive need to keep these correlations in perspective, I’ve updated the intro to the table:

    Sunday, April 12, 2020
    Exploratory Data Analysis of COVID-19 Deaths By State
    These correlations are with (the log of) COVID-19 deaths per capita, sorted by their coefficients of determination (r^2). The spreadsheet is at this link for those interested.

    A couple of caveats for the statistically naive:

First, the top correlations are not reliable as evidence of a real relationship. (However, a weak correlation at the bottom of the table is reasonably taken as evidence for the absence of a relationship.) This is because when randomly searching a large number of variables for correlations, some high correlations are bound to appear just by chance. While the chance of this decreases with increasing N, there are only 50 states (plus DC), and there are many more than 50 variables being compared.

The main value of this kind of “data dredging” is to bring to the surface correlations that might not be “fool’s gold” but must be put through additional screening. An example of additional screening might consist of going to county-level data to increase N to the thousands, which can help screen out spurious correlations. There are many other things that can be done. If you have the time and inclination to do so, more power to you.

    Second, variables that are absolute counts don’t really belong in this table. They should be divided by population to yield per capita variables. For example “Italian1990”, “ItalianPercentOfWhites” and “ItalianPercapita1990” all appear in high correlations. “Italian1990” should probably not be included in this table as it is an absolute count.
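To put a number on the first caveat, here is a small pure-noise simulation; the 200-variable count is an assumption, and the .28 threshold is the conventional two-sided .05 cutoff for N around 50.

```python
import numpy as np

# Pure-noise dredge: M random variables vs. a random outcome across
# 51 "states". How many nominally significant correlations appear?
rng = np.random.default_rng(1)
N, M, trials = 51, 200, 100
r_crit = 0.28                   # two-sided .05 cutoff for N around 50

hits = []
for _ in range(trials):
    y = rng.normal(size=N)
    X = rng.normal(size=(M, N))
    r = np.abs([np.corrcoef(row, y)[0, 1] for row in X])
    hits.append(int((r > r_crit).sum()))
print(np.mean(hits))            # about 10 per dredge -- all of them noise
```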

40. BTW, AE, this unpleasantness involving the likes of @Kratoklastes and @Intelligent Dasein relates to what I’ve been telling you (and others of the numerate right) about lossless compression for some time now. If the realist right doesn’t start getting real about lossless compression as a selection criterion for unified models of society, it is going to miss a fantastic opportunity to escape the rhetorical sewage of “social science” discourse, quite possibly missing an opportunity to prevent a thirty years war.

How long is it going to take y’all to come to grips with the problem so we can nuke the social pseudosciences?

• Replies: @Mr. Rational

  41. Question about the meaning of statistical significance for ecological correlations.

    When we are looking at the 50 states, N = 50 (51, if DC is included). An online calculator gives .28 as the minimum value of Pearson’s r for significance at the .05 level (for both N = 50 and N = 51), which I still recall from some work I did on state-level correlations about a decade ago.

    At the time, I recall reading somewhere that significance levels do not apply in this case, because they have to do with inferences from a sample to a population. However, in this case there is no such inference, because the sample *is* the population.

    Another interpretation is that there is such an inference, the states being a sample from a hypothetical population, so significance levels are relevant.

    Which of these two interpretations do the statistically astute in here favor?
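On the first part of the question, the .28 figure checks out; here is a quick derivation using the standard t-transform of r (nothing below is specific to this dataset):

```python
import numpy as np
from scipy import stats

# Smallest |r| significant at the two-sided .05 level for N = 50 and 51,
# via t = r * sqrt(n - 2) / sqrt(1 - r^2) inverted for r.
for n in (50, 51):
    t_crit = stats.t.ppf(0.975, df=n - 2)
    r_crit = t_crit / np.sqrt(t_crit**2 + n - 2)
    print(n, round(r_crit, 3))   # 0.279 and 0.276 -- i.e., ~.28 either way
```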

42. @James Bowery

    If the realist right doesn’t start getting real about lossless compression as selection criterion for unified models of society, it is going to miss a fantastic opportunity to escape the rhetorical sewage of “social science” discourse, quite possibly missing an opportunity to prevent a thirty years war.

    We’re already past that.  The strongest association with social outcomes is race (ask Pat Boyle about that, if he ever shows up at SBPDL again) and any use of race as a predictor was anathema by the 1970’s.  There will be no meeting of minds on this subject as (((they))) have prohibited it.

    I admit, it’s going to be interesting to watch the armies of black and brown thugs come up against good ol’ boys who’ve been shooting all their lives and see just how fast they are routed as their iffy-on-a-good-day discipline dissolves under fire.  Dunno how much of it I’ll live to see, but the video that gets out is gonna be wild.

• Replies: @James Bowery
43. @Mr. Rational

The front line of the discourse war isn’t with those who continue to spout sophistry but with the students of said sophists: students who are increasingly literate in the principles of machine learning and increasingly motivated to get it right by virtue of the salaries being offered. While it is true that an enormous amount of “dumb money” is flowing out of rent-seeking network-effect monopolies to create 120dB of noise in the field, there is enough happening that some of the money is going to support perceptive minds, just by chance. This is sort of what happened with the DotCon era’s blizzard of dumb money. A little of it landed in the hands of guys like Bezos and Musk, who then took it to shame the NASA technosocialists into following the law my coalition got passed in 1991.

That’s all it will take, because against a deafening roar of noise, every increase in signal is equivalent to a large suppression of noise.

• Replies: @Mr. Rational
  44. There is a ton of stuff to chew on here. I’m going to devote the requisite time to addressing it this weekend.

45. @James Bowery

    there is enough happening that some of the money is going to support perceptive minds — just by chance.

    And they will be cancelled just as quickly as James Damore as soon as they dare to say something un-PC, or release a “racist” product.

• Replies: @James Bowery
46. Coronavirus is a respiratory disease. Colder temperatures will let the virus spread faster. We’re having a rather cold April, so the numbers are not quite at peak. As temperatures go up, the sun will aid in boosting vitamin D levels, which affect the immune system. Taking vitamins like C and D is a good idea to help your immune system, and getting them from food is better than getting them from pills.

Trying to find a magic correlation here is impossible, no matter the data set. This is a new disease and all the parameters of prognosis are still mostly unknown. They say it came from bats, so the bat’s environment is probably a factor to consider as well.

    • Agree: Mark G.
47. @Mr. Rational

Whiz kids won’t have to say anything if there is prize money attached to incremental improvements in the lossless compression of a wide range of longitudinal social measures. They’ll just submit better unified models of society, and enjoy both a bank deposit and bragging rights for their model. The Fortune 500 thought police will have to dig into the code to find out how the transgressive employee “cheated” by using a model that corresponds to reality. For the most part, those thought police are too stupid to do that, but other kids eager to prove their chops will be smart enough. Yes, of course, some of them will talk about what some Google whiz kid used to better model society, but during interrogation the employee can just use some mealy-mouthed race-denialist rhetoric like: “Oh, of course I was just exposing a latent variable some toothless yahoos identify as the folksonomy called ‘race’. But we of The Future are all on board with Ingsoc!”

  48. @res
How about average dew point? Here is data for the states.
    https://www.forbes.com/sites/brianbrettschneider/2018/08/23/oh-the-humidity-why-is-alaska-the-most-humid-state/

    Would probably be better to look at average dew points in February, March, and April, but that data is harder to gather.

    Or temperature. Here are average temperatures and average winter temperatures.
    https://www.currentresults.com/Weather/US/average-annual-state-temperatures.php
    https://www.currentresults.com/Weather/US/average-state-temperatures-in-winter.php

    Or precipitation.
    https://www.currentresults.com/Weather/US/average-annual-state-precipitation.php
    Again, seasonal would probably be better.

    On another note, how about personality traits? This paper has a table of the big 5 by state on the last page. These might be interesting to use for other purposes as well.
    https://www.apa.org/news/press/releases/2013/10/regions-personalities
    https://www.apa.org/pubs/journals/releases/psp-a0034434.pdf

    Shelter in place order as a binary variable and date of initial/strongest order seem like possibly useful variables.

    P.S. Have you (AE) tried any multivariate regressions? It would be interesting to see something like race (4 category variable), obesity, density, dew point, and extroversion. Then some analysis to see how they work together (ANOVA?).

    P.P.S. Agreed that states are rather imprecise. Seems like major cities would be better. Especially since right now they seem to be driving the spread.

    I’ve not. The toolpak I use for multivariate analysis isn’t currently working. I’ll get another one if it’s not fixed by the time the dust has settled but I’m going to wait until then to post anything else. It feels like it could be more obfuscatory than clarifying at this point with the numbers bouncing around like they are.
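In the meantime, a minimal sketch of the multivariate pass res suggests, using statsmodels in place of a spreadsheet toolpak; the file name and column names here are hypothetical stand-ins for whatever a consolidated state-level sheet would actually contain.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical consolidated state-level sheet; substitute real columns.
df = pd.read_csv("state_covid_correlates.csv")

model = smf.ols(
    "log_deaths_per_capita ~ pct_black + obesity_rate + pop_density"
    " + avg_dew_point + extraversion",
    data=df,
).fit()
print(model.summary())                  # coefficients, t-stats, R^2
print(sm.stats.anova_lm(model, typ=2))  # the ANOVA-style follow-up res mentions
```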

  49. @NewTunesForOldLogos
    How about public transportation usage? A quick glance at the chart here looks encouraging:
    https://en.m.wikipedia.org/wiki/List_of_U.S._cities_with_high_transit_ridership

Indeed, it does look like there could be a lot of potential here. There are about 300 cities in the US with populations of 100k or more. This is bookmarked for the future, thanks.

  50. @res
    Thanks! The alcohol variable at the top is much less impressive given that it is only for 11 states though.

    But there are quite a few variables with r^2 of around 50% or above. I am stunned that
    Jew/Italian/Dominican/Russian/Puerto Rican/Jamaican all correlate so well across all 50 states. Abortion is another interesting variable, but only has data for 44 states.

    What I don't understand is how those did not show up more strongly in AE's data. AE saw correlations (r!) of less than 0.2! I would have expected Clinton and population density to be decent proxies for those demographic variables above.

    AE, could you see how this fits in with your results? Is he using similar death data? Could you try consolidating your data set with his or post your data here?

    P.S. DanHessinMD, are you planning on rolling your data into that spreadsheet? Or should I do it if you and AE don't?
