2012 tax rates: Understanding Life Expectancy

Demographics are destiny. As Earth reaches the 7 billion human population milestone it is more important than ever that we understand how to model the future: how fast is our population growing? What, if any, resource constraints will we face in the future as world populations continue to grow? How will the shifting balance between young and old people play out? What will be the economic, social, political and military consequences of shifting demographic patterns?

Predicting the future is hard. Nevertheless, in studying demographic trends, we have one big advantage over most social sciences: what physicists would call a conservation law. Put simply: if you are age 40 today, then in ten years, either you will be age 50, or you will be dead.

We are going to combine this fact with real data about the US from the Centers for Disease Control and Prevention (CDC) to make some graphs showing life expectancy as a function of age. Since some people will live more or fewer years than expected, we will also look at the distribution of mortality outcomes. Finally, we will look at how life expectancy could change if medical technology reduces mortality rates by 1% each year.

If we try to model the impact of advertising on shoppers, or the future direction of the stock market, or who will win the next election, we face the difficulty of predicting human behavior in the face of all our irrationality and eccentricity. But when it comes to demographics, we have an inescapable mathematical law:

population in future = population now + births - deaths

At least, that works for the total population of the Earth. If we want to track population by country, we have to add a net migration term as well, since not everyone stays permanently in the country of their birth. But we won't bother about migration today.
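As a minimal sketch of the balance equation above, the following R function steps a population forward one year at a time. The function name and the birth and death rates are hypothetical, chosen only for illustration; they are not estimates from any data source.

```r
# Project a population forward using the balance equation:
# next = now + births - deaths (+ net migration, if tracked).
# All rates here are hypothetical illustration values.
project_population <- function(p0, birth_rate, death_rate, years,
                               net_migration = 0) {
  p <- numeric(years + 1)
  p[1] <- p0
  for (t in seq_len(years)) {
    births <- birth_rate * p[t]
    deaths <- death_rate * p[t]
    p[t + 1] <- p[t] + births - deaths + net_migration
  }
  p
}

pop <- project_population(p0 = 7e9, birth_rate = 0.019,
                          death_rate = 0.008, years = 10)
print(round(pop / 1e9, 2))  # billions of people, year by year
```

Setting net_migration to a nonzero value would give the per-country version of the same law.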

This fundamental law means that we know the correct mathematical equations for modeling demographics; all that is needed is to measure the parameters (rates of fertility and mortality). Of course, we still don't know the future: will people choose to have more children, or fewer, as time goes on? Will a cure for cancer result in people living longer, or will an epidemic kill a lot of people? But at least we can have an informed discussion around the probability and magnitude of these kinds of scenarios, since the underlying math gives us the framework within which to work.

To do the calculations, we have to be a little careful about definitions. If you want, you can skip forward to the graphs, but they will be easier to interpret if you read through the next few paragraphs.

We are interested in predicting the future, so we need to define metrics that are likely to be relatively constant over time. Suppose we track the total number of deaths in the US each year. This number will probably grow from year to year, even though people are living longer. Why? Simple: the US population is also growing. Suppose a constant fraction of the population dies each year (e.g. 1%). As the total population P(t) grows with time, so does the total number of deaths D(t) = 0.01*P(t). So the ratio D(t)/P(t) is a better metric than plain D(t), since it controls for the size of the population.

However, even D/P is not that useful for forecasting, because it is not going to stay constant over time. Humans typically live for many decades (how many varies by country), because death rates are low for young people (except for infant mortality) and high for old people. So, if the mix of old and young changes, D/P will also change.

To illustrate, suppose the death rate is zero for people under 80, and is 10% per year for people over 80. Suppose 10% of our population is over 80. Then D/P will be 0.9*0 + 0.1*0.1 = 0.01, or one percent of the total population per year. Now suppose after a few decades, the percent of people in the population who are over 80 changes to 20%. Then D/P will be 0.8*0 + 0.2*0.1 = 0.02, or two percent of the population per year.
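The arithmetic in this example is just a population-weighted average of the age-specific death rates. Here is a small sketch in R using the same numbers; the function name crude_rate is my own label, not standard terminology from the CDC.

```r
# Crude death rate D/P as a weighted average of age-specific
# death rates, weighted by each group's share of the population.
crude_rate <- function(share, rate) sum(share * rate)

rate      <- c(under80 = 0,   over80 = 0.10)  # deaths per person per year
mix_now   <- c(under80 = 0.9, over80 = 0.1)   # population shares today
mix_later <- c(under80 = 0.8, over80 = 0.2)   # shares a few decades on

print(crude_rate(mix_now, rate))    # one percent per year
print(crude_rate(mix_later, rate))  # two percent per year
```

The age-specific rates never changed, yet D/P doubled, purely because the mix shifted.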

Since D/P depends on the age distribution, a better model will track the D/P ratio separately for each age group; that way we can also control for shifts in mix. For instance, we will keep track of D, P, and the D/P ratio for 30-year-old persons, and separately also for 31-year-old persons, and so forth.

There is a further complication. The death rate for a given age actually changes over time, as medical technology improves. A person born in 1900 was 30 years old in 1930, before antibiotics were common, so their chance of dying during their 30th year was higher than that of a person born in 1950, whose 30th year was 1980.

Demographers report death rates by age in what are called mortality tables. Since death rates vary as technology improves, we have to make clear whether the mortality rates we collect in the table refer to one cohort (e.g. the people born in 1900, followed over their lifetimes) or to one period (e.g. the deaths that occurred during the year 2007, which impacted people from a whole spectrum of birth years).

Constructing a cohort table requires taking data over a hundred-year period, and you have to wait until all of the members are dead before you are finished. That makes period tables much more practical.

Notice, though, that period tables come with the implicit assumption that their rates stay constant from here on out. This assumption can be unrealistic. Imagine a baby born in 2007. They experience the infant mortality conditions of the table accurately enough, but as they age, they ought to experience less mortality than the table suggests. That's because when they are 30, for example, the year will be 2037, and (hopefully) improved medical care will reduce the death rates for 30-year-old people compared with the value in the table, which reflects the experience of people who were 30 in 2007. Similarly, when this 2007 baby is 80, the year will be 2087, not 2007, and so the mortality rate ought to reflect another 80 years of progress. Thus, the period table shows the life expectancy that the baby of 2007 would experience if medical care never improves beyond the 2007 level.

The bottom line is that the period life table gives us a practical way to compare death rates by age at one moment in time, even though it cannot accurately predict what will happen to your cohort (let alone you personally) in some distant future year. Comparing period tables constructed in different calendar years (or for different countries or races or genders) also gives us a standardized way to compare and track progress.

Demographers measure mortality rates (and also fertility rates) subdivided by country, age, gender, and sometimes additional factors like race. They also have to track immigration, because people can change countries over time. For today though, we are going to look just at mortality, just in the United States, and without subdividing by race or gender. We will leave fertility and other elements of a complete model to a future article.

The Centers for Disease Control and Prevention (CDC) publishes "Vital Statistics" each year, including a so-called LEWK3 table; we will work with the most recent one, from 2007. This is a period table. From it, we can calculate all sorts of interesting metrics, such as life expectancy.

Here are the first few rows in the table - the spreadsheet, which you can download from the CDC, actually goes all the way to age 100.

Let's be sure we are clear about what each column means. From the CDC documentation file nvsr58_10.pdf page 2:

The row labeled Age 4-5 refers to what happens to people during the year between their 4th and 5th birthdays, i.e. while they are 4 years old.
The probability of dying column (q) is the probability of the person dying during that year of age. Observe that it is much higher (more than 10 times) for age 0-1 than for subsequent years: this is why infant mortality is such a big concern.

The documentation is a bit ambiguous: is q[A] the probability of death at age A for all people in the original cohort, or just for people who have attained age A? It must be the latter, since column (q) does not sum to one, as it would need to if these probabilities applied to the whole cohort. But since all the other columns in the spreadsheet are calculated from column (q), we can verify this interpretation by replicating the calculations ourselves.

The number surviving column (l) starts at 100,000 people and goes down over time as the people age. Here is where we blur the distinction between period and cohort. The probabilities (q) are period death rates, e.g. in 2007 in the US, 0.0460% of children age 1-2 died, and 0.0176% of children age 4-5 died. To calculate column (l), we pretend that these rates will not change in the future. Hence, our hypothetical cohort of 100,000 people will experience the infant mortality of the 2007 cohort, the age 1-2 mortality of the 2006 cohort, the age 2-3 mortality of the 2005 cohort, and so on, since all of those rates were observed in 2007 and hence recorded in our table. As noted above, if medical technology continues to improve, then for the real 2007 cohort, by the time they reach any given age, their actual mortality should be lower than given in the table, so more will survive longer than predicted in the table.

But assuming medical care no longer improves, we can follow our hypothetical cohort forward and observe what happens. Multiplying 100,000 by 0.006761 gives 676 infant deaths, leaving 99,324. Multiplying 99,324 by 0.000460 gives 46 deaths during age 1-2, leaving 99,278 survivors. And so forth. If you download the spreadsheet, you can work with exact numbers and avoid round-off error. Following this pattern you can verify the (l) and (d) numbers from (q).
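The same two steps can be sketched in R using only the two q values quoted above. This is a stripped-down illustration of the recurrence, not the full table.

```r
# Follow a hypothetical cohort of 100,000 through the first two
# rows of the table, using the q values quoted in the text.
q <- c(0.006761, 0.000460)   # ages 0-1 and 1-2
l <- numeric(length(q) + 1)  # survivors at the start of each age
d <- numeric(length(q))      # deaths during each age
l[1] <- 100000
for (i in seq_along(q)) {
  d[i] <- q[i] * l[i]        # deaths = probability * survivors
  l[i + 1] <- l[i] - d[i]    # survivors entering the next age
}
print(round(d))  # about 676 and 46 deaths
print(round(l))  # 100000, then about 99324 and 99278 survivors
```

Because the unrounded values are carried forward, this avoids the round-off error mentioned above.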

Since there are one hundred independent numbers in the table (the contents of the (q) column), the best way to understand the table is via a graph. However, people often prefer having a single number that summarizes a lot of the information in the table. There are various numbers you could compute for this purpose, such as the average death rate, but the one most commonly reported is called life expectancy at birth. This is the number 77.9 found in the age 0-1 row of column (e).

Life expectancy means the expected, or average, number of additional years a person in our hypothetical cohort will live. How shall we calculate it? Among people who reached age x, some die in the next year, during their x-to-(x+1) year. Some die the following year. Some keep living a long time further. We have to weight each of these outcomes by the corresponding probability, and then add them up to find the expected value.

The two columns labeled (L) and (T) help with this calculation. Except for the first and last rows in the table, the (L) column shows the average population during that year, assuming people who die do so at a uniform rate throughout the year. For example, the (L) value for age 3-4 is 99,239, which is halfway between the surviving population (l) at the start of the year (99,250) and at the end of the year (99,228). The first and last values have been modified based on information not shown to us, namely the age in months at which infants died, or the age beyond 100 that people survived to. Apparently they assumed people live exactly two years past 100. At any rate, (L) represents the total number of person-years lived during this age range: one year for everyone who survives the year, plus on average half a year for those who die during the year.
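As a quick check of the age 3-4 example, here is the midpoint rule in R, computed both ways described above. The helper name person_years is hypothetical.

```r
# Person-years lived in one age interval, assuming deaths are
# spread uniformly over the year: the midpoint of (l) at the
# start and end of the interval.
person_years <- function(l_start, l_end) (l_start + l_end) / 2

l_start <- 99250  # survivors at exact age 3 (from the text)
l_end   <- 99228  # survivors at exact age 4 (from the text)

L_mid <- person_years(l_start, l_end)
# Equivalent view: a full year for each survivor, plus half a
# year on average for each person who died during the year.
L_alt <- l_end + 0.5 * (l_start - l_end)
print(c(L_mid, L_alt))  # both give 99239
```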

The (T) column is calculated from the bottom end of the table backward to the top, by summing the (L) column. For instance, the value of (T) in the 0-1 age row is the sum of the entire (L) column. Since (L) represents people-years lived during one age row, (T) represents the total people-years lived during and after one age row. Since those total people-years belong to the survivors listed in column (l), if we divide (T) by (l) we get the life expectancy beyond the current age.
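To make the bottom-up summation concrete, here is a sketch on a tiny made-up cohort. The four (l) values are hypothetical, not CDC data, and every row uses the simple midpoint rule for (L). The variable is called T_col because T is shorthand for TRUE in R.

```r
# Toy cohort (hypothetical): survivors at exact ages 0, 1, 2, 3,
# with everyone dead by age 4.
l <- c(1000, 900, 600, 200)

L <- (l + c(l[-1], 0)) / 2    # person-years lived in each age row
T_col <- rev(cumsum(rev(L)))  # sum of L from each row to the bottom
e <- T_col / l                # remaining life expectancy at each age

print(data.frame(age = 0:3, l = l, L = L, T = T_col, e = round(e, 3)))
```

Reversing, taking a cumulative sum, and reversing again is simply a compact way of summing from the bottom of the table upward.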

Personally, I find it easier to think about this definition the other way around: take the probability-weighted sum of the extra number of years people live. The tricky part is that the probabilities are not given by the (q) column, but rather, by the (d) column from age x onward, divided by the (l) column at age x. To see this, note that the people in cell l[x] die according to the values in d[x], d[x+1], and so forth, which therefore sum to l[x]. So, for example, starting at age x=10 (meaning row 10-11), if we multiply column (d) by (age-9.5), sum, and divide by l[x], we get 68.6, as expected from the spreadsheet. The spreadsheet formula is =SUMPRODUCT(H15:H105,D15:D105)/C15 after first putting =0.5+ROW()-15 in column H. A little algebra should convince you the two approaches give the same answer.
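If you would rather check the equivalence numerically than algebraically, here is a sketch on a small made-up cohort. The (l) values are hypothetical, and every row uses the midpoint rule, so this ignores the special handling of the first and last rows in the real table.

```r
# Toy cohort (hypothetical): survivors at exact ages 0..3,
# with everyone dead by age 4.
age <- 0:3
l <- c(1000, 900, 600, 200)
d <- l - c(l[-1], 0)  # deaths during each age

# Approach 1: probability-weighted extra years, using d / l[x].
e_weighted <- function(x) {
  later <- age >= x
  sum(d[later] * (age[later] + 0.5 - x)) / l[age == x]
}

# Approach 2: person-years from age x onward (T) divided by l[x].
e_from_T <- function(x) {
  L <- (l + c(l[-1], 0)) / 2
  sum(L[age >= x]) / l[age == x]
}

for (x in age) {
  cat(sprintf("age %d: weighted = %.4f, T/l = %.4f\n",
              x, e_weighted(x), e_from_T(x)))
}
```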

So far, all we have done is verify the definition of life expectancy. Now for the much more interesting part. Let's draw some graphs, and then let's ask how these concepts may change as time goes forward. In other words, if medical technology does continue to improve, how much longer might we expect to live? And, rather than just look at averages, let's look at the distribution of outcomes. There is so much more information here than can be summarized in a single number!

If you like, you can do the calculations and make the graphs yourself using a spreadsheet. However, I will show how to do it using the free, high-quality, open-source statistical programming language R. You can follow along by downloading your own completely free copy of R from The Comprehensive R Archive Network. Look in my earlier post on Koch Snowflakes for some background on why R is a good choice, or look in System Dynamics: Feedback Models for more on population modeling using R, or look in How Do We Know? for an even simpler introduction to R.

The first thing we need to do is use our spreadsheet program to convert the CDC table, which was designed for human readers, into a plain tab-delimited text file, with numbers in the age column, like this:

Save this in a file called "us-period2007-life-table.tab". Now we can close the spreadsheet and do the rest with R.

Here's the R code we will use. The first line reads the table and turns it into a "dataframe", which allows us to refer to the columns by name. The second line prints a summary of each column so we can check that it read the file correctly. Next, we define a function for calculating the (l), (d) and (e) columns from the (q) column; this will let us experiment with modifying the mortality rates. The rest of the code makes the graphs that we will discuss below.

    mort <- read.table('us-period2007-life-table.tab',
                       header=TRUE, sep='\t')
    print(summary(mort))
    N <- length(mort$q)

    ## life expectancy at each age, from deaths by age 'd'
    ## (index i holds the row for age i-1)
    lifeExp <- function(d) {
      y <- 1:N
      p <- 0
      np <- d[N]  # gives the open-ended last row one extra year
      for(i in N:1) {
        p <- p + d[i]              # people dying at this age or later
        np <- np + (i-0.5)*d[i]    # their ages at death, summed
        y[i] <- np/p-i+1           # mean age at death minus current age
      }
      y
    }

    ## rebuild 'l', 'd', and 'e' from death probabilities 'q'
    calc <- function(q) {
      l <- 0*q + 100000 # initial cohort size
      d <- 0*q
      e <- 0*q
      for(i in 1:N) {
        d[i] <- q[i]*l[i]
        l[i+1] <- l[i] - d[i]
      }
      list(l=l, d=d, e=lifeExp(d))
    }

    m <- calc(mort$q)

    ## check we can reproduce 'l', 'd', and 'e' from 'q'
    if(m$l[N+1] != 0 ||
       max(abs(m$l[1:N]-mort$l))>0.1 ||
       max(abs(m$d-mort$d))>0.01 ||
       max(abs(m$e-mort$e))>0.8)
      stop('does not match data')

    ## now make some graphics helper functions:
    graph <- function(name, ylabel, y, col='black', extra=0) {
      png(paste(name,'.png',sep=''), 800, 500)
      par(mar=c(5, 5, 1, 1), cex=1.5, lwd=2)
      n <- length(y)
      yy <- y[1:(n-1)] # drop last point
      plot(c(0,n-2), c(0,max(yy)+extra),
           type='n', xlab='Age', ylab=ylabel)
      add(y, col)
    }

    add <- function(y, col='black') {
      n <- length(y)
      lines(0:(n-2), y[1:(n-1)], col=col)
    }

    ## now make the plots:
    graph('le', 'Life Expectancy', lifeExp(mort$d))
    dev.off()
    graph('mr', 'Mortality Rate (%/year)',
          100*mort$q[1:(N-1)])
    dev.off()
    graph('sv', 'Survivors (% of cohort)', mort$l/1e3)
    dev.off()

    ## modify mortality rates up or down 50%:
    m <- calc(mort$q * 0.5)
    m2 <- calc(mort$q * 1.5)
    graph('le1', 'Life Expectancy', lifeExp(m$d), 'blue')
    add(lifeExp(mort$d))
    add(lifeExp(m2$d), 'red')
    dev.off()
    graph('sv1', 'Survivors (% of cohort)', m$l/1e3, 'blue')
    add(mort$l/1e3)
    add(m2$l/1e3, 'red')
    dev.off()

    ## make a cohort graph following 2007 or 1957 people,
    ## assuming a 1%/year improvement after 2007
    m <- calc(mort$q * (0.99^(1:N)))
    m2 <- calc(mort$q * (0.99^((1:N)-50)))
    graph('le2', 'Life Expectancy', lifeExp(m$d), 'green')
    add(lifeExp(mort$d))
    L <- lifeExp(m2$d)
    lines(50:99, L[50:99], col='blue')
    dev.off()

    ## age of death probabilities for the 1957 cohort,
    ## again assuming a 1%/year improvement after 2007
    graph('death', 'Probability of Death by Age',
          m$d/m$l[1], col="black", 0.003)
    lines(50:99, m2$d[50:99]/m2$l[50], col="blue")
    dev.off()

First, the life expectancy graph, column (e) in the dataset. This shows that at birth, life expectancy is about 78, falling roughly linearly with age until around 60, after which it starts to tail off. Since not everyone dies at age 78, the graph has to tail off: it cannot go to zero, let alone be negative. For example, at 80, life expectancy is about 9 more years. This means that having survived all the way to 80, you will, on average, live 9 more years - even though 80 is already past the original life expectancy at birth of 78. Of course, "you" means a member of the hypothetical cohort born in 2007 experiencing no further improvement in medical care beyond 2007 levels, not the real "you". If you are already 80 today when you read this, then these numbers are representative for your generation, but if you are 20, you can hope that 60 years of progress will make the numbers for your generation better, assuming you actually make it to age 80 yourself.

Next, we plot the individual mortality rates for each age, column (q) in the dataset. The rate for age 100 is 100%, but that is an artifact of ending the table there, so I have suppressed it to enlarge the vertical scale. You can see the blip at zero for "infant mortality", followed by almost zero death rates until people reach age 60 or so.

Finally, we look at the percent of the cohort surviving to a given age, column (l) in the dataset. Aside from the infant mortality blip at the start, this decays very slowly until around age 60, after which it accelerates for a while, and then tails off, since like life expectancy, it can never go negative.

Visually, it looks like the median age of death (the age where the survival curve is at 50% of the population) is around 80, which makes sense: the median will be fairly close to the mean (life expectancy), though not identical.
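Rather than reading the median off the graph, you can compute it from the survival column. The sketch below uses a made-up mortality curve, an exponentially rising rate with hypothetical parameters 0.00005 and 0.09, as a stand-in for the CDC data, so its numbers will not match the real table. With the real data you would use mort$l and mort$d instead.

```r
# Hypothetical mortality curve (not CDC data): rates start tiny
# and grow exponentially with age, capped at 1.
age <- 0:110
q <- pmin(1, 0.00005 * exp(0.09 * age))
q[length(q)] <- 1  # everyone still alive dies in the last row

l <- 100000 * cumprod(c(1, 1 - q[-length(q)]))  # survivors at each age
d <- l * q                                       # deaths at each age

# Median: first age at which half the cohort or fewer remain.
median_age <- age[which(l <= 50000)[1]]
# Mean: life expectancy at birth, assuming mid-year deaths.
mean_age <- sum(d * (age + 0.5)) / sum(d)

cat("median age at death:", median_age, "\n")
cat("mean age at death:  ", round(mean_age, 1), "\n")
```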

Why did we bother figuring out how to calculate these columns? Why not just graph them directly in our spreadsheet? Well, now that we know how to do it, we can change things. In particular, we can experiment with reducing the death rates to see how much of a difference medical improvements might make to life expectancy.

Since we do not know the future, we will have to present scenarios. The next two plots show life expectancy and survival curves under the following scenarios:

black: the base case shown above
blue: mortality rates are cut in half across the board, meaning we multiply column (q) by 0.5
red: mortality rates rise by 50% across the board, meaning we multiply column (q) by 1.5

Can you guess what will happen?

A fifty-percent change in mortality rates seems like it should have a pronounced impact. Intuitively, we expect something like a fifty-percent change in life expectancy! In fact, life expectancy at birth changes by only 5 years up or down, with progressively less impact for older people. What is going on?

Think of it this way: until age 60, the mortality rate is almost zero, so whether we multiply it by 0.5 or by 1.5, it is still almost zero. Only when people are 70 or 80 or 90 are the rates high enough that 50% up or down is a big change. As a result, life expectancy at birth still reaches into the 70's, even for the red line, and does not get far into the 80's, even for the blue line. That means that changing life expectancy by even one year is hard. That suggests that the differences in life expectancy between industrialized countries and developing countries will be difficult to reduce without massive improvements in health care; see Gapminder World Map (2010) for a well-drawn diagram showing the correlation between life expectancy and economic progress.
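You can see the same effect without the CDC file. The sketch below scales a made-up, exponentially rising mortality curve (hypothetical parameters, not real data) by 0.5 and 1.5 and compares life expectancy at birth. The helper e0 is my own, and it forces the last rate to 1 so that the whole cohort dies within the table.

```r
# Hypothetical mortality curve (not CDC data), ages 0 to 110.
age <- 0:110
base_q <- pmin(1, 0.00005 * exp(0.09 * age))

# Life expectancy at birth for a vector of death probabilities.
e0 <- function(q) {
  q <- pmin(q, 1)      # a probability cannot exceed 1
  q[length(q)] <- 1    # close the table at the last age
  l <- cumprod(c(1, 1 - q[-length(q)]))  # fraction alive at each age
  d <- l * q                             # fraction dying at each age
  sum(d * (age + 0.5))                   # mean age at death
}

scenarios <- c(half = 0.5, base = 1, plus50 = 1.5)
result <- sapply(scenarios, function(k) e0(base_q * k))
print(round(result, 1))                               # years
print(round(100 * (result / result[["base"]] - 1), 1))  # percent change
```

Even though the rates move by fifty percent, life expectancy at birth in this toy model moves by far less, for the reason given above.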

Similarly, if we look at the survival graph, most people live to their 70's or 80's under all three scenarios. For ages below 60, the differences are small. "Almost zero" mortality rates do add up over time, but it takes half a century or so to see it.

So far, these graphs reflect the period life table - they show what would happen to the cohort of babies born in 2007 if medical progress freezes at 2007 levels. Let's try to make some cohort graphs, predicting what will actually happen for those babies, as well as for the cohort born in 1957, which is always 50 years older than the 2007 cohort. To draw these graphs, we have to make an assumption about how fast mortality rates will improve in the future. For simplicity - since I have no real data on this - let us assume that the (q) values will fall by one percent (multiply by 0.99) with each passing year. In the next graph:

The black line is the usual 2007 cohort life expectancy as in the previous graphs,
The green line is the "actual" average number of remaining years of life for people from the 2007 cohort who have attained a given age, based on our 1%/year improvement assumption, and
The blue line is the "actual" for people from the 1957 cohort.

Can you guess what it will look like?

To make the green line, we multiply the mortality rate for age A by 0.99^A, reflecting A years of progress since 2007.

However, to make the blue line, we multiply the mortality rate for age A by 0.99^(A-50), since the blue cohort was already 50 years old in 2007.

The blue line begins at age 50, since the 1957 cohort has already reached 50 as of 2007 - all these graphs are drawn from the perspective of 2007, not 2011, since there is apparently a 3-year lag in publishing the table.
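To get a feel for the size of these multipliers, here is a small sketch that evaluates them at age 80 for both cohorts. The function name improvement is hypothetical; the 1% per year figure is the assumption stated above, not a measured value.

```r
# Multiplier applied to the 2007 period rate at a given age,
# assuming a 1% per year improvement after 2007.
improvement <- function(age, age_in_2007 = 0, rate = 0.01) {
  years_after_2007 <- age - age_in_2007
  (1 - rate)^years_after_2007
}

f_2007 <- improvement(80)                    # 2007 cohort at 80 (year 2087)
f_1957 <- improvement(80, age_in_2007 = 50)  # 1957 cohort at 80 (year 2037)
print(round(c(cohort2007 = f_2007, cohort1957 = f_1957), 3))
```

Under this assumption the 2007 cohort faces well under half of the tabulated age-80 rate, while the 1957 cohort faces roughly three quarters of it.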

We see that life expectancy at birth for the 2007 babies is really 83, up 5 from 78 years, provided medical care improves as we have assumed. For those who survive to age 50, their expected additional years of life after 50 also rise by 5 years, from 32 to 37 (i.e. they can expect to live to 87, rather than 82).

For people born in 1957, though, their expected additional years of life as of age 50 increase by only 2 years, from 32 to 34, i.e. they can expect to live to 84, rather than 82. That's because their old age will happen relatively soon - in just 30 years - so there will not have been time to improve medical care as much as for the 2007 cohort, for whom old age (i.e. years with large mortality rates) is still 70 years away.

All the graphs so far looked at averages. For any individual person, though, it is also interesting to know the distribution of additional years of life. After all, some people die young, while others live past 100. Not everyone lives exactly to the average. What does the "bell curve" look like?

Turns out there is an easy answer. We just need to look at column (d), which holds the number of people from the original cohort that died at each age. Starting at a particular age A, we divide by the total number of people that reached A, namely l[A], to get the probability distribution.
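Here is that recipe as a sketch on a tiny made-up cohort. The numbers are hypothetical, and the helper death_dist is my own name. With the real data you would use mort$d and mort$l in the same way.

```r
# Toy cohort (hypothetical, not CDC data): survivors at exact
# ages 0, 1, 2, 3, with everyone dead by age 4.
age <- 0:3
l <- c(1000, 900, 600, 200)
d <- l - c(l[-1], 0)  # deaths during each age: 100 300 400 200

# Probability of dying at each age, given survival to age A.
death_dist <- function(A) {
  keep <- age >= A
  setNames(d[keep] / l[age == A], age[keep])
}

print(death_dist(0))            # distribution as seen from birth
print(round(death_dist(2), 3))  # distribution as seen from age 2
```

Each distribution sums to one, because everyone who reaches age A eventually dies at some age from A onward.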

In the final chart, the black line shows the probability of death at a given age for the 2007 cohort, and the blue line for the 1957 cohort. In both cases, the calculations are as of 2007, and assume a 1% improvement in mortality rates per year throughout the 21st century.

This picture is quite interesting. First of all, on the left side, the infant mortality piece looks much more noticeable than it did back when we plotted the mortality rates. That's because those rates leave out the size of the population they apply to. The population is largest at birth, and declines over time, so the high rates at ages near 100 do not actually translate into very many people, since so many died along the way. This graph makes it more clear that infant mortality is still a very big problem even in the US.

Looking at the right-hand side, we see that the peak for the 2007 cohort comes at a later age than for the 1957 cohort - again because the 2007 cohort has an extra 50 years of medical improvements to help them over the 1957 cohort. We calculated earlier that the 1957 cohort could expect to live to 84 on average; now we see that there is considerable spread around that number, with significant numbers of people dying at every age between 50 and 100.

In fact, we see that cutting off the life table at age 100 is a bit premature: since it is all computerized nowadays, there is no reason not to extend it out to 110 or even 120 in order to provide more insight into how medical improvements affect the oldest people.

I hope this example encourages you to experiment: in whatever country you live, download the latest life tables and see what the forecasts are for you. You could also download several life tables from different years and compare them in order to estimate just how much progress improved medical care is making each year. Of course, past trends need not continue in the future, but they are at least a starting point for discussion.
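As a starting point for that comparison, here is one possible way to turn two period tables into an implied annual improvement rate for each age. The q values below are placeholders I made up to show the mechanics, not real table entries, and the function name is hypothetical.

```r
# Implied annual improvement between two period tables, by age:
# solve q_late = q_early * (1 - r)^years_apart for r.
annual_improvement <- function(q_early, q_late, years_apart) {
  1 - (q_late / q_early)^(1 / years_apart)
}

# Placeholder death probabilities (hypothetical, not real data)
# from two tables published ten years apart.
q_early <- c(age30 = 0.00120, age60 = 0.01100, age80 = 0.0650)
q_late  <- c(age30 = 0.00108, age60 = 0.00950, age80 = 0.0580)

rate <- annual_improvement(q_early, q_late, years_apart = 10)
print(round(100 * rate, 2))  # percent improvement per year, by age
```

With real tables, the rate will probably differ by age, which would be a reason to replace the flat 1% assumption with an age-specific one.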

I hope you found this interesting. See you next time!


Sunday, 8 July 2012

