1. Rory

    Rory Senior Member

    Often described as an old wives' tale, the idea of a connection between the phase of the moon and an increase in birth rates still persists today, including among those who work in childbirth, as exemplified in this article at The Huffington Post:
    While some early studies appeared to support the idea, later large scale analyses - one looking at over half a million births - found no correlation between birth dates and any particular phase of the moon.

    The study using the largest amount of data, however - some 50 million births - did propose a correlation, concluding that:
    Despite contradicting other analyses, as well as scientific understanding, given that the source of the study was the Astrophysics Department at Appalachian State University, it seemed worth looking at again. So...

    I downloaded 21 years' worth of daily birth data, totalling some 89.5 million births, and plotted this against the phase of the moon. While I didn't think it strictly necessary, Caton and Wheatley had performed some data cleansing to eradicate anomalies such as weekends and holidays (substantially lower rates) and Tuesdays (higher rates), so I did this as well (as expected, with such a huge supply of data over a large timescale, it made no significant difference).

    These are the results:

    upload_2018-12-5_0-32-4. upload_2018-12-4_21-57-28.
    Key: Day 0=new moon; average births per day excluding weekends, holidays and Tuesdays; 13500/+1 in chart=full moon, 11500/-1=new moon; second chart repeats to better represent the lunar cycle

    Some conclusions:
    • There is no pattern or correlation with the lunar cycle
    • The largest daily variation from average is only 0.74%
    • The majority of points lie less than 0.21% from average
    • There is no significant increase or decrease in any phase of the moon
    • Births on the full moon, and in the three-day period around the full moon, are almost exactly average (-0.03 and +0.14% respectively)
    • There are no 'peak' or 'minimum' birth rates, just very slight random variations, as would be expected
    This, I believe, is the largest analysis to date on the subject, and, I would imagine, pretty conclusive.

    If anything further were needed, though, I did find - after completing everything - that Caton and Wheatley had followed up their earlier paper and run an improved model using data for around 70 million births.

    This time they found no correlation.
     
    Last edited: Dec 4, 2018
  2. deirdre

    deirdre Moderator Staff Member

    why do you have 34 pink points on your chart? and since you still have 29 days in your left hand side, what kind of 'cleansing' did you do? also where did you get your initial data from?
     
  3. Rory

    Rory Senior Member

    Good questions. :)

    1. Because the lunar cycle is a cycle, the chart is slightly repeated, to better represent it (I should probably mention that)
    2. What's been removed are Tuesdays, weekends, and holidays. The way this works is: let's say there are 7670 data points - i.e., individual day records - then once the aforementioned are removed, we're left with about 4000. This equates to about 140 daily records for each day of the lunar cycle. It's not days of the lunar cycle that are removed, but rather occasional daily records. This removal doesn't affect the final results, in the sense of altering the relationship of one day of the cycle to another, given that they're averages. And, as I mentioned above, the same general result is shown whether these are included or not: there are just so many records and data points that each day of the lunar cycle will fall on more or less the same number of weekends, holidays, and Tuesdays.
    3. There's a link above. That says it's from the Centers for Disease Control and Prevention's National Center for Health Statistics.
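
    For anyone wanting to reproduce the method described above, here is a minimal Python sketch of it (not the spreadsheet itself). It bins daily records by approximate day of the lunar cycle and averages them after dropping Tuesdays, weekends and any holidays passed in. The reference new moon, the mean cycle length and the function names are assumptions of this sketch; a mean-cycle approximation can be several hours out from the true new moon, which should matter little when results are binned by whole days.

```python
from datetime import date, datetime, timezone

SYNODIC_MONTH = 29.530588853  # mean length of a lunar cycle, in days
# Assumed reference new moon for this sketch: 6 Jan 2000, 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_day(d: date) -> int:
    """Approximate day of the lunar cycle (0 = new moon) at noon UTC on date d."""
    noon = datetime(d.year, d.month, d.day, 12, tzinfo=timezone.utc)
    days_since = (noon - REFERENCE_NEW_MOON).total_seconds() / 86400
    return int(days_since % SYNODIC_MONTH)


def average_by_lunar_day(records, excluded_weekdays=(1, 5, 6), holidays=frozenset()):
    """Average births per lunar day from an iterable of (date, births) pairs.

    Weekday numbers follow date.weekday(): Monday=0, Tuesday=1, ..., Sunday=6,
    so the default drops Tuesdays, Saturdays and Sundays.
    """
    totals, counts = {}, {}
    for d, births in records:
        if d.weekday() in excluded_weekdays or d in holidays:
            continue  # skip the daily record, not the lunar day itself
        k = lunar_day(d)
        totals[k] = totals.get(k, 0) + births
        counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in sorted(totals)}
```

    Calling average_by_lunar_day with the full list of (date, births) pairs gives one average per lunar day, which is what the charts plot.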

    I'll attach my spreadsheet if anyone wants a peruse. Probably a bit higgledy-piggledy in places, but it should make sense.
     

  4. deirdre

    deirdre Moderator Staff Member

    oh i see. you just mirrored it for some reason.
    upload_2018-12-4_16-10-58.

    then why are there 29 days on the left hand chart?

    ok. you went through both those pages and manually added all the days for each day of every year then averaged them? while simultaneously correlating which is a weekend, sunday and tuesday?
     
  5. deirdre

    deirdre Moderator Staff Member

    what? it's 21 years of data. so there should be 21 daily records for each day of the lunar cycle.
     
  6. Rory

    Rory Senior Member

    I might be misunderstanding your question, but are you thinking that because there are no Tuesdays in the data - forgetting holidays and weekends for the time being - that there should be one (or more) fewer days in the left-hand column?

    If so, the answer is because over the course of 21 years there will be around 260 lunar cycles. Each day of the lunar cycle will occur 260 times and 'collect' 260 points of data. The spread of the week is even. Remove Tuesdays and you remove 260/7 data points. That leaves 223 data points. When holidays and weekends are similarly removed that leaves around 140 data points.

    Even with so many removed, this still generates around 1.83 million live births for each day of the lunar cycle.
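
    The arithmetic behind those figures can be checked in a few lines of Python. This is only a sketch: it assumes an even spread of weekdays across the lunar cycle, and it leaves holidays out, which is why it stops at about 148 rather than the roughly 140 quoted above.

```python
SYNODIC_MONTH = 29.53  # mean lunar cycle length in days

years = 21
total_days = years * 365.25              # individual daily records
cycles = total_days / SYNODIC_MONTH      # records collected by each lunar day
after_tuesdays = cycles * 6 / 7          # one weekday in seven removed
after_weekends = cycles * 4 / 7          # Tuesdays, Saturdays and Sundays removed

print(round(total_days), round(cycles), round(after_tuesdays), round(after_weekends))
# prints: 7670 260 223 148
```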

    Here's a chart to show the difference removing each 'significant variable' makes:

    upload_2018-12-4_22-40-51.

    The reason it doesn't change the overall picture is because all the variables - weekends, holidays, and Tuesdays - will be averaged out over the lunar cycle, so that each day receives its 'fair share'. On the micro level, the picture may change a little, but overall each day remains within touching distance of the average - that is, whichever way we look at it, there are no significant peaks, troughs, or variations.

    Hope that helps to clarify. :)
     
    Last edited: Dec 4, 2018
  7. deirdre

    deirdre Moderator Staff Member

    sorry. i wasn't thinking those were days of the lunar cycle (even though i see now it says 'cycle'). i forgot the moon changes calendar days. airhead moment.
     
  8. deirdre

    deirdre Moderator Staff Member

    that's what i was thinking. why remove or 'normalize' them at all.
     
  9. Rory

    Rory Senior Member

    It makes sense if you only have a small data set, as some entries may correspond with a higher number of 'outliers' than others, and that would throw it off. But on something this massive it doesn't make any significant difference.
     
  10. Rory

    Rory Senior Member

    Been looking at this again, just to make sure I'd done everything right, and I can see a few places for improvement.

    Number one, I've figured a much better way of calculating the lunar cycle, including one way which factors in for 'Day 29', which only 'appears' half the time.

    Mainly, though, it's with regard to the 'normalization' - I was totally wrong about that, and it doesn't 'average out', even over 21 years and 260 lunar cycles. Some 'lunar days', then, fall much more often, or less often, on low birthrate days such as holidays, weekends, Mondays, etc, and it makes a massive difference: Day 21 after a full moon, for example, returns 173,000 more births than Day 22 when Mondays are excluded.

    It was quite striking till I tallied up and found that Day 22 fell on four fewer Mondays than Day 21, while Day 21 landed on four Tuesdays, the most popular day.

    The spreads are as follows:
    • Mondays and Tuesdays - minimum 34, maximum 39
    • Weekend days - 71 to 76
    • 13th of the month - 4 to 12
    • Holidays - 3 to 13
    • Non-weekend holidays - 1 to 9
    • Valentine's Day - 0 to 2 (higher birthrate)
    Caton and Wheatley also suggested factoring in for seasonal variations. Broadly speaking, there are two 'seasons' for birth rates:
    • June-October - with a spread of 108 to 110
    • November-May - 149 to 152
    Seasonal variations seem to be less of a factor than holidays, etc. Tuesdays, also, though the most popular day, are not so much at variance as Mondays - Tuesdays are +2% on Tue-Fri, while Mondays are 8% down. Probably due to more holidays falling on a Monday, as well as 'long weekends'.
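
    A tally like the one above can be sketched in Python as follows. This is an illustration rather than the spreadsheet's own formula: the 1994-2014 date range, the reference new moon and the function names are all assumptions, and the mean-cycle approximation may put a few dates in a neighbouring lunar day, so the counts need not match the spreads listed above exactly.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

SYNODIC_MONTH = 29.530588853  # mean lunar cycle length in days
# Assumed reference new moon for this sketch: 6 Jan 2000, 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_day(d):
    """Approximate day of the lunar cycle (0 = new moon) at noon UTC."""
    noon = datetime(d.year, d.month, d.day, 12, tzinfo=timezone.utc)
    return int(((noon - REFERENCE_NEW_MOON).total_seconds() / 86400) % SYNODIC_MONTH)


def weekday_counts(start, end):
    """Count how often each lunar day lands on each weekday (Mon=0 ... Sun=6)."""
    counts = Counter()
    d = start
    while d <= end:
        counts[(lunar_day(d), d.weekday())] += 1
        d += timedelta(days=1)
    return counts


# Assumed 21-year range; swap in the real range of the data set.
counts = weekday_counts(date(1994, 1, 1), date(2014, 12, 31))

# Lunar days 0-28 only; 'Day 29' turns up roughly half as often.
mondays = [counts[(k, 0)] for k in range(29)]
print("Mondays per lunar day: min", min(mondays), "max", max(mondays))
```

    The same Counter can be queried for any other weekday, or extended with a holiday list, to see how unevenly the low-birthrate days are shared out.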

    'Normalization', then, looks rather tricky. But probably I'll have a go at a big long equation to give it the best shot. And still prove the same thing as was shown in the beginning. ;)

    (Spreadsheet attached)
     

  11. Rory

    Rory Senior Member

    It sure makes me think how important it is to go through stats with a fine-tooth comb. Sneaky anomalies are waiting around every corner. :)
     
  12. Rory

    Rory Senior Member

    Strangely, two studies on a possible correlation between the lunar cycle and birth rates were published within a few months of each other in 2016 - only this time involving cows.

    One study was carried out by Professor Tomohiro Yonezawa of the University of Tokyo, and quite widely reported on, at places like Live Science, Agriland (Ireland), ABC Australia, Asian Scientist, and, of course, The Daily Mail.
    Good university. Smart professor. Very low probability of chance. Looks convincing, right?

    Except...they only studied birth records for 428 deliveries.

    I personally find that kind of shocking, that a presumably reputable establishment such as the University of Tokyo would not only put that out, but think there might be something in it in the first place.

    The other study, meanwhile, published a few months earlier, looked at records involving over two million births and found no correlation with any particular day or phase of the lunar cycle:
    This study, however, seems to have received little or no press.

    Correction: that should read "Day 21 landed on four more Tuesdays."
     
    Last edited: Dec 8, 2018
  13. Rory

    Rory Senior Member

    I'm having a go at the normalization - and it's definitely more intricate than would first appear, what with birth rates changing depending on which day of the week holidays fall on, days surrounding holidays receiving a boost, and even whether particular days happen to randomly coincide more often with fallow or productive years.

    Checking for anomalies revealed one interesting non-holiday that needs normalizing - post-2001 September 11th:

    upload_2018-12-9_22-4-23.

    This chart shows the difference in average daily birthrate for each of the above dates compared to the September weekday average. Before 9/11, the 11th was an almost exactly average day - now, it's about 6% down, and even less popular than Friday the 13th.

    I'm not sure if this has been confirmed before, but a 2013 analysis and anecdotal evidence suggested as much.
     
  14. Rory

    Rory Senior Member

    In finishing off this study I continue to be amazed by the way some people - academics, university professors included - handle statistics.

    Needless to say, I don't know all the Fancy Dan stuff they do - about chi-squares and p numbers, etc - but I do know you need more than 400 people to figure out what the most popular day of the month to be born is (for example).

    To this end, I managed to find some information about the study by Guillon quoted by Caton in the OP - the "next largest study [of 5.9 million births]" - which claimed to show that birth rates were increased during the last quarter to new moon. The information comes from a secondary source, but, assuming that it's been reproduced accurately, is as follows:
    It's quite extraordinary that such a minute, almost non-existent 'difference' - a completely expected one - could be seen by anyone as "statistically significant" - especially given the arbitrary parameters they use. Unless, of course, they had a predetermined agenda.

    I also wrote to the author of the Japanese study on cows, suggesting that 428 wasn't a large enough sample size. He acknowledged this, but also stated that "the data is the data" and suggested he was working on a hypothesis as to why bovine births might be affected by the moon.

    When I pointed him to a study analysing 2 million bovine births which showed zero correlation, he sent me a human study undertaken by Masahiko Fujiwara from the University of Ochanomizu, which also claimed to show a correlation between increased birth rates and the lunar cycle (this time, novelly, -2 and +4 days from both the full and new moons).

    They analysed the probability of their findings occurring by chance, and returned results of p<0.0069 and 0.4%. This was seen as fairly conclusive.

    Problem number one, however, was that their sample size was only 2531 subjects spread across seven years - roughly one birth per day, or 86 births per day of the lunar cycle.

    Problem number two, their probability analysis seemed to be to determine the chances of obtaining their specific set of random results.

    This seems akin to dropping seven letters out of a Scrabble bag; reading the 'word' "KWIJIBO"; calculating the chances (slim); and concluding that you'd just witnessed a miracle.

    And then repeating it, this time getting "AAEIKLU", and concluding the same, since it's such a low probability.

    The key in all these studies is sample size and timescale. Even with 85 million births spread over 21 years, the data has to be looked over extremely carefully to ensure accurate representation of reality. As noted above, it doesn't 'even out', even with such large numbers. Random variations should always be expected. I think we ought to be looking for quite a bit more than a 0.07% deviation before we start declaring findings as 'statistically significant'.
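
    To put rough numbers on the sample-size point, here is a small sketch of how much random scatter to expect in a count of births. It assumes births behave like independent Poisson events, which is a simplification of this sketch rather than anything the studies state, and which ignores weekday and holiday effects that make real scatter larger.

```python
import math


def relative_noise(expected_births):
    """Relative standard deviation of a Poisson count with the given mean."""
    return 1 / math.sqrt(expected_births)


for label, n in [("86 births per lunar day", 86),
                 ("1.83 million births per lunar day", 1_830_000)]:
    print(f"{label}: about {relative_noise(n):.2%} expected random scatter")
```

    Under that assumption, 86 births per lunar day would scatter by roughly 10.8% through chance alone, while 1.83 million would scatter by about 0.07%.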
     
    Last edited: Dec 14, 2018
  15. Mechanik

    Mechanik Member

    Thanks for this @Rory. It seems so unlikely that births correlate with lunar cycles so it’s great to see someone put some effort into a proof.

    It would be interesting to find a dataset that reports scheduled (planned Caesarian and induced) births and subtract that from the raw birth data. That would leave only the “natural” (scheduled by nature instead of people) births which should be somewhat immune to holidays and weekends. I found statistics on those numbers, but not actual daily data. I’m not sure if that data even exists but a search of the ICD-9/10 codes should tell me that.

    The fact that you dropped holidays and weekends and it had no real effect on the results suggests that the above may make no difference, either.
     
  16. Mechanik

    Mechanik Member

    So I looked up the ICD codes for Caesarian births and it would be possible, but challenging, to work up births by this method. However, that assumes that one can find the data that contains that information at the daily level. While I found aggregate data in a variety of studies (annual numbers only), the ICD-level data that matches @Rory’s data, and that lists the diagnostic code for every birth, may not be publicly available. The ICD9 codes are available here, if anyone is interested. http://www.icd9data.com/2012/Volume1/630-679/660-669/669/669.7.htm.

    I’m going to do some more searches for the data, but I’m not convinced I’ll find it, or that it would make a difference in the random birth distribution.
     
  17. Rory

    Rory Senior Member

    I shall check that out. On a quick note, I have looked at birth methods (2007-2017) split by day of the week and it comes out (iirc) at about 36% c-sections on a weekday, and 24% on a weekend.

    I also looked at gestation ages and there's an incredible range either side of 39 weeks.

    It's all so variable I'm not sure it matters too much to go that far into it. Though ideally I suppose non-induced natural births would be the best. But then if it's always a standard rate on top of that, it shouldn't have any bearing on the results.

    I got my recent data from the Wonder portal at the CDC website; it's very useful. Then there's the main CDC database that has truly enormous zip files in .pub format; I'm guessing that's where my data set comes from, via fiveeighty.com, though I haven't looked at those original files.

    Currently working up my findings into a proper(ish) paper - and going way overboard on it, of course. But why not? Do it right once maybe no one'll ever have to do it again. ;)

    Let me know if you want any stats and I'll see what I've got mañana. :)
     
  18. Rory

    Rory Senior Member

    *fivethirtyeight.com - was on t'phone last night and writing from memory.
    Does this help?

    upload_2018-12-16_9-34-27.

    That's vaginal vs caesarean for each day of the week by gestational age. It shows the marked difference in caesareans between weekends and weekdays - though it's not as big a difference as I might have expected, given the drop in overall figures.

    I'm not quite sure how best to present this, as far as a chart goes, but I think there might be some interesting points to pull out. For example, at first glance I notice that Monday is the highest day for 39-week c-sections, even though it's the lowest weekday overall (and lowest for 38 and 40 weeks). And Friday is the highest day for all pre-39 week c-sections, but generally the second lowest weekday for 39 weeks and longer.

    Spreadsheet attached with the raw data included if you want to go at it (obtained from the CDC database).
     

  19. Rory

    Rory Senior Member

    I wonder if this is more representative (data from 2007-2017)?

    upload_2018-12-16_14-24-26.

    I can't remove holidays for the data showing c-sections, or calculate by a daily average, but I can for the 'all births' figure:

    upload_2018-12-16_15-42-32.

    I guess from that I can arrive at a rough daily ratio for a non-holiday, non-caesarean, 38-40 week pregnancy.
     
    Last edited: Dec 16, 2018
  20. Mechanik

    Mechanik Member

    I downloaded your spreadsheet(s) and just started looking at ways to visualize your summary table, with an emphasis on getting a feel for the data. Here are my first thoughts. This graph is your summary of all births by day of week:
    upload_2018-12-16_7-43-46.
    It's a little cluttered, but there is a clear pattern in number of births by day.
    After I did that, I went back and reviewed the ICD9 instructions and the rule for the ICD9 code for Caesarian is that "planned" Caesarians only occur during the 38th week. All the others are coded differently with what amounts to some kind of medical necessity.

    Here is the chart with all vaginal births removed and only week 38 showing.
    upload_2018-12-16_7-52-5.
    In theory, these are the planned Caesarians and can be subtracted from the daily totals. This assumes the source data counts only "planned" (ICD9-649.8) births on week 38, with all other weeks being medical necessities. This is probably a bad assumption.

    If you do that, you get this:
    upload_2018-12-16_7-57-18.
    The entire weekly curve is flattened and the peaks, Tuesday and Friday, move around a bit.

    Just before posting, I had another thought, so here is the graph of ONLY vaginal births:
    upload_2018-12-16_8-15-33.
    This one is flatter still (flatter curve, I suppose). I'm a little surprised that this is skewed so much into the work week. With two children of my own, day-of-week didn't seem like an option at the time. I wonder if there are other codes in the CDC data that reflect planned vaginal births....

    So, on to the raw data.
     
  21. Mechanik

    Mechanik Member

    After looking at the data, and picking up an additional data source from the CDC, here are some additional thoughts.

    When looking for more detailed data, I found an article that you may find interesting. It has detailed graphics on births, by minute, hour, and day of week. It's short on details, but also mentions that 50% of all births are induced. Assuming that's correct, then half the data is not part of a natural pattern. We have to figure out which half.

    I found and downloaded the 2017 Natality Public Use File from the CDC Vital Statistics Online Data Portal, along with the user guide and it appears to have what I'm looking for, including type of birth, vaginal attempt before cesarean, and various induction methods. The data is fairly large (over 5 GB for 2017 alone), and requires extensive reformatting before use. That's probably going to take a few hours so I'm going to have to look at converting it another day.

    If I were to make a prediction though, halving the number of data points by removing any form of induced birth is going to do nothing but flatten out the daily variances until they look like your initial correlation graphs. i.e. more evidence that this is debunked.
     
  22. Rory

    Rory Senior Member

    Sounds like excellent work! :)
     
  23. Mechanik

    Mechanik Member

    Looks like a few folks are following this so I thought I’d post an update.

    First, let me say that I now understand why studies like the 538 and others are conducted by organizations with money. Converting this data to something useful has been challenging. I need a bunch of grad students or interns....

    I’ve managed to extract about a million records into Excel, which should give us three months of 2017 birth records. There are actually 1.4m records but that load cuts off part of April so my next task is to pare that back to 3 months, deconstruct the relevant flags (non-induced, natural), and produce data aggregations that can be merged with @Rory’s data to see if it confirms or refutes the hypothesis.

    As a side project, I’m trying to use some industrial-strength data management software to deconstruct the whole data set but I had some compatibility issues between the various components and had to uninstall/reinstall current versions after crashing my (work) server earlier this week. While I manage data scientists, it’s obvious to me that there are valid reasons why I manage them instead of contributing to their efforts. :)

    I’m working this in parallel with Excel, but the Excel proof will be first.
     
  24. Mechanik

    Mechanik Member

    This is becoming personal.... TLDR is that I can't use the 1.4 million Excel records because the month data is not complete. :(

    Turns out that the source data is not sorted by month. The Excel conversion grabbed the first 1.4 million records. When I looked at the first thousand or so records, they were all January. When I looked at the last thousand, they were April. I mistakenly thought they were in date order. When I sorted them by month, however, I got records for all 12 months! At that point, about 25% of the total records were in Excel (the total being 3,864,754 births in the continental US, plus 29,851 in the territories) and the data seemed OK, but with all 12 months represented it is safe to assume that we do not have all the records for even a single month in Excel. I can upload this sheet (or maybe not, as it's 403MB) if anyone is interested in a failed Excel experiment.

    So, back to square one. I was able to read the data into an ETL (extract-transform-load) tool, but there are challenges in deconstructing the huge records into usable data elements. In other words, I can't generate a graph until I can figure out how to break out the data fields from the 1300-byte records. The format the CDC used appears to be an old mainframe COBOL fixed record data format and they did not include the header (FD) record which would allow the read to be automated. I have at least three ideas to try over the holidays, so I haven't given up yet.
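
    One of the ideas, sketched in Python: slice each fixed-width record by character position and tally as you go, so the file never has to fit in Excel or in memory. The field names and offsets below are placeholders invented for illustration, not the CDC layout; the real positions would have to come from the user guide. It also assumes one record per line.

```python
from collections import Counter

# HYPOTHETICAL layout: 0-based (start, end) offsets, end exclusive.
# Replace with the positions given in the CDC user guide.
FIELDS = {
    "birth_month": (12, 14),
    "day_of_week": (22, 23),
    "delivery_method": (401, 402),
}


def parse_record(line, fields=FIELDS):
    """Slice one fixed-width record into a dict of stripped string fields."""
    return {name: line[start:end].strip() for name, (start, end) in fields.items()}


def count_by(path, field, fields=FIELDS):
    """Tally one field across the file, reading a single record at a time."""
    counts = Counter()
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            counts[parse_record(line, fields)[field]] += 1
    return counts
```

    With the right offsets filled in, something like count_by(path, "birth_month") would give the monthly totals without any sorting or loading step.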

    Edit: The "TL":"DR" in the first line gets turned into a smiley!
     
  25. Mechanik

    Mechanik Member

    Good news everyone! (insert Farnsworth image here). I have managed to waterboard the raw data into confessing its secrets. I was so excited with the initial data reads that I had to post something. The first is 2017 total births by month in the US, and I believe that includes US territories.
    upload_2018-12-28_21-7-21.
    Looks like folks get friendly in December but not so much in June.

    This one is more interesting. It totals and breaks out births by "final" method, by day of week.

    upload_2018-12-28_21-10-2.

    The legend should read:
    0 = Missing (zero in raw data)
    1 = Spontaneous (natural)
    2 = Forceps (assisted)
    3 = Vacuum (assisted)
    4 = Cesarean (red bar)
    9 = Unknown or not stated (not visible, tiny percentage)

    In line with expectations, if you eliminate the cesareans (red) the graph is more level across days. Much more work to do here to go back and address the initial hypothesis. There is a lot of data about incoming status and pre-conditions that may be relevant. I'd also like to try to match this to Rory's data so that he can apply the lunar cycles to the outcomes.

    I think I have the data portion of this under control and can generally produce whatever data anyone wants on this topic.
     
  26. Mechanik

    Mechanik Member

    This is probably my final post on this topic. First, the good news. The following chart shows all births in the US, 2017, by day of week. Induced births are green and non-induced (natural) are blue.
    upload_2019-1-2_17-33-10.


    This covers a total of 2,303,235 natural births, which are surprisingly flat across the M-F timeframe, suggesting a consistent random distribution across the day of the week. The induced births total 872,879 across the 7 days of the week. There are also about 350 births on W-F that are marked unknown.

    The data are very consistent across the workweek (M-F). The surprise to me is that the weekends are so different. If the blue is really "natural", then how can this be? While I can't say for certain, there are choices that one can make with regards to labor that are not random. Many have been mentioned earlier, so I'll just add that at 9+ months, my wife decided she was tired of pregnancy and, having heard that jogging can induce labor, decided to chase our loose dog at a run one afternoon. She went into labor several hours later. Millions of choices like this and the ones mentioned above can skew the results away from the weekend, but I'm still surprised at the magnitude.

    The births in the graph represent roughly 3.2m of the reported 3.864m births that year. Why are we missing 664,000 births? It appears that the discrepancy comes from the roughly 1.2m cesarean births. Roughly half of these were induced naturally, but ultimately resulted in cesarean. In other words, attempted natural, labor induced, but cesarean was the final delivery method, which is what is illustrated in the graph.
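
    As a quick check on the gap, here is the same subtraction using the exact totals quoted in this thread rather than the rounded ones. It is only a sketch: it takes the 3,864,754 figure as the reported total and leaves out the roughly 350 births marked unknown.

```python
natural = 2_303_235         # non-induced births in the chart
induced = 872_879           # induced births in the chart
reported_total = 3_864_754  # total births quoted earlier in the thread

charted = natural + induced
missing = reported_total - charted
print(f"charted: {charted:,}  missing: {missing:,}")
# prints: charted: 3,176,114  missing: 688,640
```

    That lands in the same ballpark as the 664,000 obtained from the rounded 3.2m and 3.864m figures.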

    The bad news is that the source data does not contain complete birth dates. Day of month is missing. I have no idea where 538 found the birth dates for the original source of @Rory's data, but this data source is not it. This is disappointing to me as I could not separate out natural births by day-of-month so as to correlate to lunar cycles, as that was the original intent of this exercise.

    On the other hand, my hypothesis that separating out natural from induced births would create a nearly flat day-of-week birth distribution is proved, and so I'll take comfort in that. I also have a strong data set which can be used for a lot of birth statistics and I'm willing to provide extracts or reports, if anyone is interested. If you want this, please make sure it's on topic for this thread, otherwise, just PM me.
     
  27. Rory

    Rory Senior Member

    This is really beautiful work - I'm well impressed, and will no doubt find it super useful when I get back to my own project on this. Bit busy with other things at the mo but I'm planning to get stuck in again once that passes.

    Thanks so much for taking up the baton with such aplomb! :)
     