submitted by /u/Saltedline to r/space [link] [comments] |

Article URL: https://www.ca5.uscourts.gov/opinions/pub/20/20-61007-CV0.pdf

Comments URL: https://news.ycombinator.com/item?id=31429091

Points: 81

# Comments: 136

Article URL: https://nhigham.com/2022/05/18/the-big-six-matrix-factorizations/

Comments URL: https://news.ycombinator.com/item?id=31421745

Points: 253

# Comments: 73

This article is an excerpt from my book Probably Overthinking It, to be published by the University of Chicago Press in early 2023. This book is intended for a general audience, so I explain some things that might be familiar to readers of this blog – and I leave out the Python code. After the book is published, I will post the Jupyter notebooks with all of the details!

If you would like to receive infrequent notifications about the book (and possibly a discount), please sign up for this mailing list.

Suppose you are the ruler of a small country where the population is growing quickly. Your advisers warn you that unless growth slows down, the population will exceed the capacity of the farms and the peasants will starve.

The Royal Demographer informs you that the average family size is currently 3; that is, each woman in the kingdom bears three children, on average. He explains that the replacement level is close to 2, so if family sizes decrease by one, the population will stabilize at a sustainable size.

One of your advisors asks: “What if we make a new law that says every woman has to have fewer children than her mother?”

It sounds promising. As a benevolent despot, you are reluctant to restrict your subjects’ reproductive freedom, but it seems like such a policy could be effective at reducing family size with minimal imposition.

“Make it so,” you say.

Twenty-five years later, you summon the Royal Demographer to find out how things are going.

“Your Highness,” they say, “I have good news and bad news. The good news is that adherence to the new law has been perfect. Since it was put into effect, every woman in the kingdom has had fewer children than her mother.”

“That’s amazing,” you say. “What’s the bad news?”

“The bad news is that the average family size has increased from 3.0 to 3.3, so the population is growing faster than before, and we are running out of food.”

“How is that possible?” you ask. “If every woman has fewer children than her mother, family sizes have to get smaller, and population growth has to slow down.”

Actually, that’s not true.

In 1976, Samuel Preston, a demographer at the University of Washington, published a description of this apparent paradox: “A major intergenerational change at the individual level is required in order to maintain intergenerational stability at the aggregate level.”

To make sense of this, I’ll use the distribution of fertility in the United States (rather than your imaginary island). Every other year, the Census Bureau surveys a representative sample women in the United States and asks, among other things, how many children they have ever born. To measure completed family sizes, they select women aged 40-44 (of course, some women have children in their forties, so these estimates might be a little low).

I used the data from 1979 to estimate the distribution of family sizes at the time of Preston’s paper. Here’s what it looks like.

The average total fertility was close to 3. Starting from this distribution, what would happen if every woman had the same number of children as her mother? A woman with 1 child would have only one grandchild; a woman with 2 children would have 4 grandchildren; a woman with 3 children would have 9 grandchildren, and so on. In the next generation, there would be more big families and fewer small families.

Here’s what the distribution would look like in this “Same as mother” scenario.

The net effect is to shift the distribution to the right. If this continues, family sizes would increase quickly and population growth would explode.

So what happens in the “One child fewer” scenario? Here’s what the distribution looks like:

Bigger families are still overrepresented, but not as much as in the “Same as mother” scenario. The net effect is an *increase* is average fertility from 3 to 3.3.

As Preston explained, “under current [1976] patterns a woman would have to bear an average of almost *two* children fewer than … her mother merely to keep population fertility rates constant from generation to generation”. One child fewer was not enough!

As it turned out, the next generation in the U.S. had 2.3 fewer children than their mothers, on average, which caused a steep decline in average fertility:

Average fertility in the U.S. has been close to 2 since about 1990, although it might have increased in the last few years (keeping in mind that the women interviewed in 2018 had most of their children 10-20 years earlier).

Preston concludes, “Those who exhibit the most traditional behavior with respect to marriage and women’s roles will always be overrepresented as parents of the next generation, and a perpetual disaffiliation from their model by offspring is required in order to avert an increase in traditionalism for the population as a whole.”

So, if you have fewer children than your parents, don’t let anyone say you are selfish; you are doing your part to avert population explosion!

*My thanks to Prof. Preston for comments on a previous version of this article.*

If you would like to get infrequent email announcements about my book, please sign up below. I’ll let you know about milestones, promotions, and other news, but not more than one email per month. I will not share your email or use this list for any other purpose.

This is a draft of a chapter from my in-progress book, *Practical Math for Programmers: A Tour of Mathematics in Production Software*.

**Tip: **Determine an aggregate statistic about a sensitive question, when survey respondents do not trust that their responses will be kept secret.

**Solution:**

```
import random
def respond_privately(true_answer: bool) -> bool:
'''Respond to a survey with plausible deniability about your answer.'''
be_honest = random.random() < 0.5
random_answer = random.random() < 0.5
return true_answer if be_honest else random_answer
def aggregate_responses(responses: List[bool]) -> Tuple[float, float]:
'''Return the estimated fraction of survey respondents that have a truthful
Yes answer to the survey question.
'''
yes_response_count = sum(responses)
n = len(responses)
mean = 2 * yes_response_count / n - 0.5
# Use n-1 when estimating variance, as per Bessel's correction.
variance = 3 / (4 * (n - 1))
return (mean, variance)
```

In the late 1960’s, most abortions were illegal in the United States. Daniel G. Horvitz, a statistician at The Research Triangle Institute in North Carolina and a leader in survey design for social sciences, was tasked with estimating how many women in North Carolina were receiving illegal abortions. The goal was to inform state and federal policymakers about the statistics around abortions, many of which were unreported, even when done legally.

The obstacles were obvious. As Horvitz put it, “a prudent woman would not divulge to a stranger the fact that she was party to a crime for which she could be prosecuted.” [Abernathy70] This resulted in a strong bias in survey responses. Similar issues had plagued surveys of illegal activity of all kinds, including drug abuse and violent crime. Lack of awareness into basic statistics about illegal behavior led to a variety of misconceptions, such as that abortions were not frequently sought out.

Horvitz worked with biostatisticians James Abernathy and Bernard Greenberg to test out a new method to overcome this obstacle, without violating the respondent’s privacy or ability to plausibly deny illegal behavior. The method, called *randomized response*, was invented by Stanley Warner in 1965, just a few years earlier. [Warner65] Warner’s method was a bit different from what we present in this Tip, but both Warner’s method and the code sample above use the same strategy of adding randomization to the survey.

The mechanism, as presented in the code above, requires respondents to start by flipping a coin. If heads, they answer the sensitive question truthfully. If tails, they flip a second coin to determine how to answer the question—heads resulting in a “yes” answer, tails in a “no” answer. Naturally, the coin flips are private and controlled by the respondent. And so if a respondent answers “Yes” to the question, they may plausibly claim the “Yes” was determined by the coin, preserving their privacy. The figure below describes this process as a diagram.

Another way to describe the outcome is to say that each respondent’s answer is a single bit of information that is flipped with probability 1/4. This is half way between two extremes on the privacy/accuracy tradeoff curve. The first extreme is a “perfectly honest” response, where the bit is never flipped and all information is preserved. The second extreme has the bit flipped with probability 1/2, which is equivalent to ignoring the question and choosing your answer completely at random, losing all information in the aggregate responses. In this perspective, the aggregate survey responses can be thought of as a digital signal, and the privacy mechanism adds noise to that signal.

It remains to determine how to recover the aggregate signal from these noisy responses. In other words, the surveyor cannot know any individual’s true answer, but they can, with some extra work, estimate statistics about the underlying population by correcting for the statistical bias. This is possible because the randomization is well understood. The expected fraction of “Yes” answers can be written as a function of the true fraction of “Yes” answers, and hence the true fraction can be solved for. In this case, where the random coin is fair, that formula is as follows (where stands for “the probability of”).

And so we solve for

We can replace the true probability above with our fraction of “Yes” responses from the survey, and the result is an estimate of . This estimate is unbiased, but has additional variance—beyond the usual variance caused by picking a finite random sample from the population of interest—introduced by the randomization mechanism.

With a bit of effort, one can calculate that the variance of the estimate is

And via Chebyshev’s inequality, which bounds the likelihood that an estimator is far away from its expectation, we can craft a confidence interval and determine the needed sample sizes. Specifically, the estimate has additive error at most with probability at most . This implies that for a confidence of , one requires at least samples. For example, to achieve error 0.01 with 90 percent confidence (), one requires 7,500 responses.

Horvitz’s randomization mechanism didn’t use coin flips. Instead they used an opaque box with red or blue colored balls which the respondent, who was in the same room as the surveyor, would shake and privately reveal a random color through a small window facing away from the surveyor. The statistical principle is the same. Horvitz and his associates surveyed the women about their opinions of the privacy protections of this mechanism. When asked whether their friends would answer a direct question about abortion honestly, over 80% either believed their friends would lie, or were unsure. *[footnote: A common trick in survey methodology when asking someone if they would be dishonest is to instead ask if their friends would be dishonest. This tends to elicit more honesty, because people are less likely to uphold a false perception of the moral integrity of others, and people also don’t realize that their opinion of their friends correlates with their own personal behavior and attitudes. In other words, liars don’t admit to lying, but they think lying is much more common than it really is.]* But 60% were convinced there was no trick involved in the randomization, while 20% were unsure and 20% thought there was a trick. This suggests many people were convinced that Horvitz’s randomization mechanism provided the needed safety guarantees to answer honestly.

Horvitz’s survey was a resounding success, both for randomized response as a method and for measuring abortion prevalence. [Abernathy70] They estimated the abortion rate at about 22 per 100 conceptions, with a distinct racial bias—minorities were twice as likely as whites to receive an abortion. Comparing their findings to a prior nationwide study from 1955—the so-called Arden House estimate—which gave a range of between 200,000 and 1.2 million abortions per year, Horvitz’s team estimated more precisely that there were 699,000 abortions in 1955 in the United States, with a reported standard deviation of about 6,000, less than one percent. For 1967, the year of their study, they estimated 829,000.

Their estimate was referenced widely in the flurry of abortion law and court cases that followed due to a surging public interest in the topic. For example, it is cited in the 1970 California Supreme Court opinion for the case *Ballard v. Anderson*, which concerned whether a minor needs parental consent to receive an otherwise legal abortion. [Ballard71, Roemer71] It was also cited in *amici curiae* briefs submitted to the United States Supreme Court in 1971 for *Roe v. Wade*, the famous case that invalidated most U.S. laws making abortion illegal. One such brief was filed jointly by the country’s leading women’s rights organizations like the National Organization for Women. Citing Horvitz for this paragraph, it wrote, [Womens71]

While the realities of law enforcement, social and public health problems posed by abortion laws have been openly discussed […] only within a period of not more than the last ten years, one fact appears undeniable, although unverifiable statistically. There are at least one million illegal abortions in the United States each year. Indeed, studies indicate that, if the local law still has qualifying requirements, the relaxation in the law has not diminished to any substantial extent the numbers in which women procure illegal abortions.

It’s unclear how the authors got this one million number (Horvitz’s estimate was 20% less for 1967), nor what they meant by “unverifiable statistically.” It may have been a misinterpretation of the randomized response technique. In any event, randomized response played a crucial role in providing a foundation for political debate.

Despite Horvitz’s success, and decades of additional research on crime, drug use, and other sensitive topics, randomized response mechanisms have been applied poorly. In some cases, the desired randomization is inextricably complex, such as when requiring a continuous random number. In these cases, a manual randomization mechanism is too complex for a respondent to use accurately. Trying to use software-assisted devices can help, but can also produce mistrust in the interviewee. See [Rueda16] for additional discussion of these pitfalls and what software packages exist for assisting in using randomized response. See [Fox16] for an analysis of the statistical differences between the variety of methods used between 1970 and 2010.

In other contexts, analogues to randomized response may not elicit the intended effect. In the 1950’s, Utah used death by firing squad as capital punishment. To avoid a guilty conscience of the shooters, one of five marksmen was randomly given a blank, providing him some plausible deniability that he knew he had delivered the killing shot. However, this approach failed on two counts. First, once a shot was fired the marksman could tell whether the bullet was real based on the recoil. Second, a 20% chance of a blank was not enough to dissuade a guilty marksman from purposely missing. In the 1951 execution of Elisio Mares, all four real bullets missed the condemned man’s heart, hitting his chest, stomach, and hip. He died, but it was neither painless nor instant.

Of many lessons one might draw from the botched execution, one is that randomization mechanisms must take into account both the psychology of the participants as well as the severity of a failed outcome.

```
@book{Fox16,
title = {{Randomized Response and Related Methods: Surveying Sensitive Data}},
author = {James Alan Fox},
edition = {2nd},
year = {2016},
doi = {10.4135/9781506300122},
}
@article{Abernathy70,
author = {Abernathy, James R. and Greenberg, Bernard G. and Horvitz, Daniel G.
},
title = {{Estimates of induced abortion in urban North Carolina}},
journal = {Demography},
volume = {7},
number = {1},
pages = {19-29},
year = {1970},
month = {02},
issn = {0070-3370},
doi = {10.2307/2060019},
url = {https://doi.org/10.2307/2060019},
}
@article{Warner65,
author = {Stanley L. Warner},
journal = {Journal of the American Statistical Association},
number = {309},
pages = {63--69},
publisher = {{American Statistical Association, Taylor \& Francis, Ltd.}},
title = {Randomized Response: A Survey Technique for Eliminating Evasive
Answer Bias},
volume = {60},
year = {1965},
}
@article{Ballard71,
title = {{Ballard v. Anderson}},
journal = {California Supreme Court L.A. 29834},
year = {1971},
url = {https://caselaw.findlaw.com/ca-supreme-court/1826726.html},
}
@misc{Womens71,
title = {{Motion for Leave to File Brief Amici Curiae on Behalf of Women’s
Organizations and Named Women in Support of Appellants in Each Case,
and Brief Amici Curiae.}},
booktitle = {{Appellate Briefs for the case of Roe v. Wade}},
number = {WL 128048},
year = {1971},
publisher = {Supreme Court of the United States},
}
@article{Roemer71,
author = {R. Roemer},
journal = {Am J Public Health},
pages = {500--509},
title = {Abortion law reform and repeal: legislative and judicial developments
},
volume = {61},
number = {3},
year = {1971},
}
@incollection{Rueda16,
title = {Chapter 10 - Software for Randomized Response Techniques},
editor = {Arijit Chaudhuri and Tasos C. Christofides and C.R. Rao},
series = {Handbook of Statistics},
publisher = {Elsevier},
volume = {34},
pages = {155-167},
year = {2016},
booktitle = {Data Gathering, Analysis and Protection of Privacy Through
Randomized Response Techniques: Qualitative and Quantitative Human
Traits},
doi = {https://doi.org/10.1016/bs.host.2016.01.009},
author = {M. Rueda and B. Cobo and A. Arcos and R. Arnab},
}
```

Next Page of Stories