Review "The Bell Curve"

by J. Laurie Snell

The Bell Curve

Intelligence and Class Structure in American Life
by Richard J. Herrnstein and Charles Murray
Free Press, 845 pp. $30.00

Part I

I promised last time to try to say what is in this book, but my review is getting as long as the book, so I will give only Part I this time and continue next time.


Tests to measure intelligence (called cognitive ability here) play a central role in this book. Thus, in their introduction, the authors discuss the history and the controversies surrounding attempts to measure intelligence. Modern theory traces its beginnings to Spearman, who noticed that performances on tests attempting to measure intelligence were positively correlated. To explain this, he postulated the existence of a single variable, which he called g, representing a person's general intelligence. It is a quantity, like height or weight, that a person has and that varies from person to person. When you take a test to measure intelligence, your score is a weighted sum ag + bs + e of the factors g, s, and e, with g your general intelligence, s a measure of your intelligence relating to this particular test, and e a random error. If you take several different tests, g is common to all of them and causes the positive correlation. The magnitude of a tells you how heavily the test is "loaded" with general intelligence -- the more the better. This simple model is consistent only with a very special class of correlation matrices (those that have rank 1 once the test-specific and error variance on the diagonal is set aside) and so had to be generalized to include more than one kind of g. This led to the development of factor analysis as a mathematical model for what is going on. It also led to the development of IQ tests to measure intelligence.
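The restriction that a single g places on the correlations can be checked numerically. The sketch below assumes standardized test scores and assumes that the specific factors and errors are uncorrelated across tests; under those assumptions the correlation between two tests is just the product of their g-loadings, and Spearman's "tetrad differences" all vanish. The loadings are made-up illustrative values, not figures from the book.

```python
from itertools import combinations

# Hypothetical g-loadings (the "a" in ag + bs + e) for five standardized tests.
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]

def implied_correlation(i, j):
    """Correlation between tests i and j when g is the only shared factor."""
    return 1.0 if i == j else loadings[i] * loadings[j]

def tetrad_differences():
    """Return r_ij * r_kl - r_ik * r_jl for every choice of four distinct tests."""
    diffs = []
    for i, j, k, l in combinations(range(len(loadings)), 4):
        diffs.append(
            implied_correlation(i, j) * implied_correlation(k, l)
            - implied_correlation(i, k) * implied_correlation(j, l)
        )
    return diffs

tetrads = tetrad_differences()
print(implied_correlation(0, 1))          # correlation implied for the first two tests
print(max(abs(t) for t in tetrads))       # zero up to rounding error
```

Observed correlation matrices rarely satisfy these constraints, which is one way to see why the model had to be extended to several factors.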

The controversies over the use of IQ tests began when it was proposed that they be used to justify sterilization laws in an attempt to eliminate mental retardation, and immigration laws to favor the Nordic stock. They continued when Arthur Jensen suggested that remedial education programs (begun in the War on Poverty) did not work because they were aimed at children with relatively low IQ, largely inherited and therefore difficult to change. Then followed debates over whether differences in IQ were due mostly to genetic differences or to differences in environment, culminating in Stephen Jay Gould's best seller "The Mismeasure of Man". Gould concluded that "deterministic arguments for ranking people according to a single scale of intelligence, no matter how numerically sophisticated, have recorded little more than social prejudice." While the authors admit that Gould's ideas still reflect a strong public sentiment about IQ tests, they feel that this sentiment bears very little relation to the current state of knowledge among scholars in the field.

Finally, the authors discuss current attempts to understand intelligence, describing three different schools.

THE CLASSICISTS: intelligence as a structure.

This school continues to extend the work of Spearman using factor analysis and assuming, as Spearman did, that some kind of general intelligence is associated with each individual. Workers in this school continue to try to understand the physiological basis for the variables identified by factor analysis and to improve methods of measuring general intelligence.

THE REVISIONISTS: intelligence as information processing.

This school tries to figure out what a person is doing when exercising intelligence, rather than what elements of intelligence are being put together. A leading worker in this field, Robert Sternberg, writes: "Of course a tester can always average over multiple scores. But are such averages revealing, or do they camouflage more than they reveal? If a person is a wonderful visualizer but can barely compose a sentence, and another person can write glowing prose but cannot begin to visualize the simplest spatial images, what do you really learn about those two people if they are reported to have the same IQ?"

THE RADICALS: the theory of multiple intelligences.

This school, led by Howard Gardner, rejects the notion of a general g and argues instead for seven distinct intelligences: linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic, and two forms of "personal" intelligence. Gardner feels that there is no justification for calling musical ability a talent and language and logical thinking intelligence. He would be happy calling them all talents. He claims that the correlations that lead to the concept of g come precisely because the tests are limited to questions that call on these two special aspects of intelligence.

Herrnstein and Murray consider themselves classicists and state that, despite all the apparent controversies, most workers in the field of psychometrics would agree with the following six conclusions that they feel are consequences of classical theory.
The authors stress that IQ tests are useful in studying social phenomena but are "a limited tool for deciding what to make of any individual."


Throughout the book the authors make use of data from the National Longitudinal Survey of Youth (NLSY) started in 1979. This was a representative sample of 12,686 persons ages 14 to 21 in 1979. This group has been interviewed annually and the authors use the data collected through 1990.


While this study was meant to follow labor trends, a number of other groups used the subjects for their studies. One of these provided the IQ data necessary for this book. The army had been using a test called the Armed Services Vocational Aptitude Battery (ASVAB) since 1950 to help in the selection of recruits and for special assignments. It had been suggested that the volunteer army was selecting a group less well qualified than the one obtained by the draft. To check this, the Department of Defense decided to administer the ASVAB to the sample chosen for the NLSY. This study was called the Youth Profile and was administered by the National Opinion Research Center. The results showed that the volunteer army was getting higher-quality recruits, as measured by these tests, than the draft had, but at the same time they revealed significant differences between the performance of various ethnic groups. A study of these differences and a summary of explanations for them are provided in "The profile of American youth: demographic influences on ASVAB test performance" by R. Darrell Bock and Elsie G.J. Moore. It is interesting to compare their analysis with that of the authors of this book.

The ASVAB has ten subtests, which range from tests you might find on an IQ test to vocational tests, such as automobile repair and electronics. The Armed Forces Qualification Test (AFQT) is made up of four subtests of the ASVAB: word knowledge, paragraph comprehension, arithmetic reasoning, and mathematical knowledge. The authors show in an appendix that this test has the properties of a very good IQ test. In particular, they found that over 70% of the variance on the AFQT could be accounted for by a single factor, g, which they identify with general intelligence.


In this part the authors provide a number of graphs that show there is a cognitive sorting process going on in education. While many more students are going to college, a higher and higher proportion of the really bright students are going to a few select schools. We see graphs that exhibit the following:

Since these graphs display standardized scores, the authors spend some time explaining the concepts of mean and standard deviation in the text and provide a more complete discussion in an appendix.

The authors express the fear that the clustering of the high IQ students in a small number of colleges will make them isolated from and unaware of the real world.


The point of this chapter is that jobs sort people by cognitive ability in much the same way that colleges do. A group of twelve professions (accountants, architects, chemists, college teachers, dentists, engineers, lawyers, physicians, computer scientists, mathematicians, natural scientists, and social scientists) is considered to make up the "high-IQ professions". The mean IQ of people entering these professions is said to be about 120, which is roughly the cutoff point for the top decile by IQ.
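The statement that an IQ of about 120 marks the top decile is easy to check if one assumes, as the standardization used later in the book does, that IQ is normally distributed with mean 100 and standard deviation 15. The following is a minimal sketch under that normality assumption.

```python
from statistics import NormalDist

# Assumed IQ distribution: normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Score above which the top 10% of the population falls.
top_decile_cutoff = iq.inv_cdf(0.90)

# Fraction of the population scoring above 120.
share_above_120 = 1 - iq.cdf(120)

print(round(top_decile_cutoff, 1))
print(round(share_above_120, 3))
```

Under these assumptions the cutoff comes out just below 120, and a little over 9% of the population scores above 120, so "about 120" is a fair rounding.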

The authors provide a graph showing that, until 1950, about 12% of the top IQ decile were in these jobs. Then the percentage significantly increased, reaching about 38% in 1990. They link this to education with another graph showing that the proportion of CEOs with graduate training remained around 10% until 1950, when the proportion increased dramatically to about 60% in 1976. Combining these observations, the authors conclude that, at mid-century, the bright people were scattered throughout the labor force but, as the century draws to a close, a very high proportion of these people are concentrated within a few occupations, paralleling the cognitive partitioning by education.


This chapter is devoted to showing that IQ is a good predictor of job performance. It is the first use of correlation, and an appendix devoted to explaining the concept of correlation is available for the reader not familiar with this concept.

The authors discuss a number of studies showing that the correlation between IQ and job performance is typically at least .4 and often more. They point out that the military offers huge data sets for these studies, since everyone in the military must take the ASVAB tests (and hence also the AFQT IQ test), and members of the military attend training schools where they are measured for "training success" at the end of their schooling, based on measures that amount to assessments of job skills and knowledge. In these studies, the average correlation between IQ and job performance is about .6. By looking at the high correlation between the g factor for the IQ test and job performance, they conclude that the g factor is the key to success in these jobs.

Modern studies in the civilian population are typically done by meta-analysis of small studies, leading to results similar to those found in the military studies. An exception was a report of the National Academy of Sciences, "Fairness in Employment Testing", which reported a correlation of only about .25. The authors suggest that this is because researchers for this study did not apply corrections for restricted range which they feel was appropriate for the purposes of their study. (Restricted range means that your sample did not include reasonable numbers from the entire range of possible scores.) When these corrections are made, they say that the correlation would increase to around .4, consistent with other studies.
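To see how a correction for restricted range can move a correlation from .25 toward .4, here is a sketch using one standard textbook correction (often called Thorndike's Case 2). It is not necessarily the correction the authors have in mind, and the ratio of standard deviations used below (1.69) is a hypothetical value chosen for illustration, not a figure from the book or the report.

```python
import math

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case 2 correction for direct range restriction.

    r_restricted: correlation observed in the restricted sample
    sd_ratio:     SD of the predictor in the full population divided by
                  its SD in the restricted sample (1.0 means no restriction)
    """
    r, u = r_restricted, sd_ratio
    return r * u / math.sqrt(1 - r * r + r * r * u * u)

# Hypothetical example: test scores are 1.69 times as spread out in the full
# applicant population as among the people actually hired and studied.
corrected = correct_for_range_restriction(0.25, 1.69)
print(round(corrected, 2))
```

The smaller the spread of scores among those studied relative to the whole population, the larger the upward adjustment, which is why the choice to correct or not matters so much here.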

The authors also compare various predictors for job performance and report the results of a study that showed that the highest correlation between a predictor and job performance rating was the cognitive test score (.53) followed by biographical data (.37), reference checks (.26), education (.22), interview (.14), college grades (.11) and interest (.10).

The chapter concludes by remarking that the Supreme Court decision in Griggs v. Duke Power Co. in 1971, which severely limited the use of IQ tests for job selection, is costing the American economy billions of dollars.


The main issue raised by the Griggs v. Duke Power decision is the possibility of so-called "disparate-impact" lawsuits. These are lawsuits that challenge employment practices that unintentionally but disproportionately affect people of a particular race, color, religion, sex or national origin.

The Supreme Court has twice changed the ground rules set up in the Griggs v. Duke Power decision. The current rules related to these suits are governed by the Civil Rights Act of 1991. According to this law, if a plaintiff shows that a specific part of the employment practice disproportionately affects a particular group, then the employer must be able to demonstrate that the employment practice or criterion in question is consistent with a business necessity (whatever that means).

In order to prove disparate impact using statistical comparisons, the comparison must be with the racial composition of the qualified people in the work force, not the racial composition of the entire work force.

When multiple employment criteria are required and it can be argued that they cannot be separated, then the entire employment process may be challenged if it can be shown to have a disproportionate effect on a particular group.

For a more detailed discussion of this legislation see "New Act Clarifies Disparate Impact Law", Casey and Montgomery, The National Law Journal, March 9, 1992.


To illustrate the concept of "steeper ladder, narrower gate" we are provided a graph showing that the salary of engineers was rather constant from 1930 to 1953, at about $30,000, and then increased dramatically to about $70,000 in 1960. During the same period, the salaries for manufacturing employees showed only a gradual increase from $10,000 to $20,000.

The authors point out that the labor statistics have pointed to a mysterious "residual" in trying to account for the increased spread that occurred in real wages between 1963 and 1987, even after taking into account education, experience, and gender. Not surprisingly they suggest a case for IQ being this residual.

The authors also, in this chapter, deal with the issue of heritability of IQ. They describe the various kinds of studies which have been used to study this problem. They remark that the technical issues in measuring heritability are too formidable to get into, so they only explain how heritability is measured in one important special case, namely in studies of identical twins reared apart. Here, since the twins have identical genes and different environments it seems reasonable to define the heritability to be the correlation between the twins' IQ scores, which is found to be around .75. They say that other studies typically provide lower values for heritability but seldom under .4.
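The reasoning behind using the twin correlation directly as the heritability can be illustrated with a small simulation. The sketch assumes a deliberately simple additive model that is not spelled out in the review: each twin's standardized score is a shared genetic component plus an independent environmental component, with the genetic share of the variance set to a hypothetical .75.

```python
import random

def simulate_twin_correlation(h2=0.75, pairs=50_000, seed=1):
    """Correlation of scores for simulated identical twins reared apart.

    Assumed model: score = G + E, where G (variance h2) is shared by both
    twins and E (variance 1 - h2) is drawn independently for each twin.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(pairs):
        g = rng.gauss(0, h2 ** 0.5)                      # shared genes
        first.append(g + rng.gauss(0, (1 - h2) ** 0.5))  # twin 1 environment
        second.append(g + rng.gauss(0, (1 - h2) ** 0.5)) # twin 2 environment
    # Pearson correlation computed by hand to stay within the standard library.
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    v1 = sum((a - m1) ** 2 for a in first)
    v2 = sum((b - m2) ** 2 for b in second)
    return cov / (v1 * v2) ** 0.5

twin_r = simulate_twin_correlation()
print(round(twin_r, 2))
```

In this model the correlation between twins estimates the genetic share of the variance itself, not its square root. The conclusion depends on the assumption that the twins' environments are unrelated, which is one of the points critics dispute.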

Part II: Cognitive Classes and Social Behavior

We continue our attempt to describe what is in this lengthy book. This has become less necessary with the appearance of a review by someone who has read the book. This review is by Stephen Jay Gould and appeared in the November 28 issue of the New Yorker Magazine (Page 139). However, we shall not give up quite yet.

Part II is almost entirely based upon the National Longitudinal Survey of Youth (NLSY). Recall that this study began in 1979 and follows a representative sample of about 12,000 youths aged 14 to 22. It provides information about parental socioeconomic status and subsequent work, education, and family history. It also had IQ information because, in 1980, the Department of Defense gave the participants their battery of enlistment tests to see how this civilian sample compared with those in the volunteer army.

In Part II the authors seek to see how IQ is related to social behavior. They limit themselves to non-Latino whites to avoid the additional variation of race, which they treat in Part III. Their method is to carry out a multiple regression analysis with the independent variables being cognitive ability and the parents' socioeconomic status (SES) (based on education, income, and occupational prestige) and a dependent variable which, in the first chapter, is poverty. The next seven chapters replace poverty successively with education, unemployment, illegitimacy, welfare dependency, parenting, crime, and civil behavior.

IQ scores are standardized with mean 100 and standard deviation 15, and NLSY youths are divided into five groups corresponding to intervals determined by the 5th, 25th, 75th, and 95th percentiles. Those in these five groups are labeled very dull, dull, normal, bright, and very bright. In the same way, the NLSY youths are also divided into five groups by the 5th, 25th, 75th, and 95th percentiles of the SES index and labeled very low, low, average, high, and very high.
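The four percentile cut points translate into IQ scores as follows, assuming a normal distribution with mean 100 and standard deviation 15. The helper function and its name are illustrative only; the class labels are the ones quoted above.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # assumed normal IQ distribution

percentiles = [0.05, 0.25, 0.75, 0.95]
cut_points = [iq.inv_cdf(p) for p in percentiles]
labels = ["very dull", "dull", "normal", "bright", "very bright"]

def cognitive_class(score):
    """Return the label of the interval that contains the given IQ score."""
    for cut, label in zip(cut_points, labels):
        if score < cut:
            return label
    return labels[-1]

print([round(c) for c in cut_points])
print(cognitive_class(100), "|", cognitive_class(130))
```

Four cut points produce five intervals, containing 5%, 20%, 50%, 20%, and 5% of the population respectively.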

The authors find that the percentages in poverty for each of the five socioeconomic groups, going from very low to very high, are 24, 12, 7, 3, 3. The percentages in poverty for each of the five IQ classes, going from very dull to very bright, are 30, 16, 6, 3, 2. They observe the similarity of these percentages and turn to multiple regression to attempt to see which is more directly related to poverty.

For this they carry out a logistic regression with the independent variables age, IQ, and SES, and dependent variable poverty. Given IQ and SES, age has little effect, so they concentrate on IQ versus SES. To do this they plot two curves on the same set of axes, an IQ curve and an SES curve. The IQ curve considers a person of average age and SES and plots the probability of poverty as IQ goes from low to high. The SES curve considers a person of average age and IQ and plots the probability of poverty as SES goes from low to high. The IQ curve shows about 26% probability of poverty for very low IQ, decreasing to about 2% probability of poverty for very high IQ. The SES curve indicates about 11% probability of poverty for very low SES score and decreases to about 5% probability of poverty for very high SES score. Thus fixing SES and varying IQ has a significant effect on poverty, but fixing IQ and varying SES does not have much effect on poverty. It is argued that this shows that IQ is more directly related to poverty than is socioeconomic status.
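The construction of the two curves can be sketched as follows. The coefficients are hypothetical values picked only so that the endpoints roughly match the percentages quoted above; they are not the authors' fitted values. The sketch also assumes that IQ and SES are measured in standard deviation units and that the curves run from 2 standard deviations below the mean to 2 above.

```python
import math

# Hypothetical coefficients (NOT the authors' estimates), on the log-odds scale.
INTERCEPT = -2.49   # log-odds of poverty at average age, IQ, and SES
B_IQ = -0.71        # change in log-odds per standard deviation of IQ
B_SES = -0.21       # change in log-odds per standard deviation of SES

def poverty_probability(iq_z=0.0, ses_z=0.0):
    """Logistic model with age held at its mean, so the age term drops out."""
    log_odds = INTERCEPT + B_IQ * iq_z + B_SES * ses_z
    return 1.0 / (1.0 + math.exp(-log_odds))

grid = [-2, -1, 0, 1, 2]                                   # assumed plotting range
iq_curve = [poverty_probability(iq_z=z) for z in grid]     # SES held at average
ses_curve = [poverty_probability(ses_z=z) for z in grid]   # IQ held at average

for z, p_iq, p_ses in zip(grid, iq_curve, ses_curve):
    print(f"{z:+d} SD   IQ curve: {p_iq:.3f}   SES curve: {p_ses:.3f}")
```

Both curves pass through the same point at the mean; the comparison the authors draw amounts to saying that the IQ coefficient is much larger in magnitude than the SES coefficient.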

This same procedure is carried out in the subsequent chapters to show that being smart is more important than being privileged in predicting if a person will get a college degree, be unemployed, be on welfare, have an illegitimate child, etc. There are some exceptions to this, but the general theme of Part II is that it is IQ and not socioeconomic status that is important in predicting these social variables.

Little is said about variation while discussing these examples, but in the introduction to Part II the authors remark that "cognitive ability will almost always explain less than 20 percent of the variation among people, usually less than 10 percent and often less than 5 percent." (They give all the regression details in an appendix). "Which means that you cannot predict what a given person will do from his IQ score. On the other hand, despite the low association at the individual level, large differences in social behavior separate groups of people when the groups differ intellectually on the average."
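How a small share of explained variance can coexist with large differences between groups can be illustrated with a short calculation. It assumes a simplified setting that is not the authors' model: IQ and a standardized outcome are jointly normal with a hypothetical correlation of .3, and the groups compared are the top and bottom 5% by IQ.

```python
from statistics import NormalDist

std_normal = NormalDist()        # standard normal, for z-scores

r = 0.3                          # hypothetical IQ-outcome correlation
variance_explained = r ** 2      # share of outcome variance explained by IQ

tail = 0.05                      # compare the top and bottom 5% by IQ
z_cut = std_normal.inv_cdf(1 - tail)

# Mean z-score within the top 5% of a normal distribution.
mean_z_top = std_normal.pdf(z_cut) / tail

# Expected outcome gap between the two extreme groups, in outcome SD units.
gap_in_outcome_sd = r * (mean_z_top - (-mean_z_top))

print(round(variance_explained, 2), round(gap_in_outcome_sd, 2))
```

With these numbers IQ explains 9% of the variation among individuals, yet the average outcomes of the two extreme groups differ by more than one standard deviation.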

Discussion questions.

(1) What is the basis for the authors' argument that, "even though cognitive ability explains only a small percentage of the variation among people, large differences in social behavior separate groups of people when the groups differ intellectually on the average"?

(2) In the introduction to Part II the authors state that "We will argue that intelligence itself, not just its correlation with socioeconomic status, is responsible for group differences." What statistical evidence would allow the authors to conclude this?

Part III: The National Context

Part III discusses differences in performance on intelligence tests within ethnic groups and between ethnic groups. The authors start by pointing out they have already shown that differences in cognitive abilities within a group (in particular within the white group analyzed in Part II) can be very large and that this fact has political repercussions. They remark that the differences within ethnic classes are much larger than between classes, so any problems that these differences cause would not go away in a homogeneous population.

In Chapter 13 they describe the differences between groups as they see them. They say that studies suggest that Asians have a slightly higher IQ on average than whites but these results are not conclusive. On the other hand, studies have consistently shown that blacks have an average IQ of about one standard deviation less than for whites. The difference in the NLSY data (that the authors used throughout the book) was 1.21 standard deviations. The authors remark that a difference of one standard deviation allows for a lot of overlap in the distributions of IQ scores between blacks and whites. In particular, there should be about 100,000 blacks with IQ scores 125 or above. On the other hand, since there are six times as many whites as blacks in the United States, the disproportions between whites and blacks at the higher levels become very large.
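The arithmetic behind the figure of about 100,000 can be reproduced under explicit assumptions. The sketch takes both distributions to be normal with standard deviation 15 and means one standard deviation apart, as in the text. The population figure of 30 million is an assumption added here (roughly the 1990 count) and does not come from the review; the ratio of six is the one quoted above.

```python
from statistics import NormalDist

SD = 15
white = NormalDist(mu=100, sigma=SD)
black = NormalDist(mu=85, sigma=SD)    # one standard deviation lower, as in the text

BLACK_POPULATION = 30_000_000          # assumption: rough 1990 figure, not from the review
WHITE_TO_BLACK_RATIO = 6               # population ratio quoted in the text

THRESHOLD = 125
share_black = 1 - black.cdf(THRESHOLD)   # fraction of the lower-mean curve at 125 or above
share_white = 1 - white.cdf(THRESHOLD)   # fraction of the higher-mean curve at 125 or above

count_black = share_black * BLACK_POPULATION
count_ratio = WHITE_TO_BLACK_RATIO * share_white / share_black

print(round(count_black, -3), round(count_ratio))
```

These are properties of two idealized normal curves, not of the NLSY data, and the results are sensitive to the assumed threshold, standard deviations, and population size.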

The authors ask if these differences are authentic. They ask first if they could be due to cultural bias or other artifacts of the test. Studies that conclude that this is not the case are discussed briefly here and in detail in an appendix. They next ask if the differences are due to socioeconomic status. Looking at the NLSY data and controlling for socioeconomic difference, they find that socioeconomic status explains 37 percent of the difference. They remark that "controlled" is hard to interpret here since socioeconomic status can also be a result of IQ.

They suggest that if the differences were socioeconomic then the gap should decrease as their measure of socioeconomic status increases. They present a graph for their data showing this is not the case. They remark that the difference between black and white IQ does appear to be decreasing in time and attribute this to environmental changes.

They next turn to the question of whether the difference is due to genetics or environment or both. They point out that there is no consensus on this question. They present the arguments for and against a genetic explanation. The arguments presented make it clear why it is difficult to claim a solution to this problem. For all the obvious explanations they present studies to show that these explanations don't hold up. They conclude this discussion with an uncharacteristically bold statement: "In sum: If tomorrow you knew beyond a shadow of a doubt that all cognitive differences between races were 100 percent genetic in origin, nothing of any significance should change...The impulse to think that environmental sources of difference are less threatening than genetic ones is natural but illusory."

The arguments presented in this chapter are well known and, I feel, better presented in the book "Intelligence" by Nathan Brody, Academic Press 1992.

Chapter 14 looks at what happens when you control for IQ. Here the authors find that this removes the difference for some variables and not for others. For example, after controlling for IQ, the probability of graduating from college is higher for blacks as is the probability of being in a high-IQ occupation and wage differentials shrink to a few hundred dollars. Controlling for IQ does not change significantly the difference in black-white marriage rates, or welfare recipiency. It does reduce significantly the difference for the proportion of children living in poverty and for those who are incarcerated.

Chapter 15 is entitled "The Demography of Intelligence". The fact that women with low IQ have more children than those with high IQ and changes in the immigrant population suggest to the authors that demographic trends are exerting downward pressure on the distribution of cognitive ability in the United States. They point out that this has been a difficult matter to settle. The "Flynn effect" says that in some sense IQ scores increase worldwide with time. However, the authors conclude that there are worrisome trends in the demographic effects even though there may well be improvements in the cognitive abilities by improved health and education.

Chapter 16 considers the relation between low cognitive ability and social problems. The authors confess that causal relations are complex and hard to establish definitely. This leads them to simply ask if persons with serious social problems tend to be in the lower IQ groups. Looking at the NLSY data, they present graphs with the ten IQ deciles on the x-axis and the proportion of the people with a specific problem on the y-axis. They start with poverty. The bar chart starts with about 30 percent poverty among the lowest IQ decile and decreases to about 7% in the highest IQ decile. When the dependent variable is high school dropouts, men interviewed in jail, or women who had received welfare, the result is the same. After these and many more indications that low IQ is associated with trouble, the authors conclude with their Middle Class Values Index. To qualify for a "yes" answer an NLSY person had to be married to his or her first spouse, in the labor force (if a man), bearing children within wedlock (if a woman), and never have been interviewed in jail. Here there are respectable proportions of those saying "yes" in all IQ deciles, which the authors remark should remind us that most people in the lower half of the cognitive distribution are behaving themselves.


(1) In "The Bell Curve" we find the following statement: "The most modern study of identical twins reared in separate homes suggests a heritability for general intelligence between .75 and .80". This apparently accounts for their upper bound when they say later that the heritability of IQ falls in the range of .4 to .8. On the other hand, the .75 to .8 is actually a correlation. What seems wrong here?

(2) What do you think of the "Middle Class Values Index" as defined by Herrnstein and Murray?

Part IV: Living Together

Part IV of "The Bell Curve" begins with Chapter 17, which discusses attempts to improve IQ scores. Some early studies suggested that better nutrition did not improve IQ. However, while these were large studies, they were not controlled studies. Two more recent controlled studies, one in Great Britain and another in California, showed a significant difference for the group given vitamin and mineral supplements compared to the group given a placebo. In the California study, the average benefit for providing the recommended daily allowances was about four points in nonverbal intelligence. The authors feel that improved nutrition is effective but suggest that there are still questions about how effective.

Next, the role of improved education in raising IQ scores is considered. The authors cite studies suggesting that the worldwide increase in average IQ can be attributed to increased schooling. They conclude that variation in the amount of schooling accounts for part of the observed variation of IQ scores between groups.

The authors discuss the various studies to see how improving educational opportunities might increase IQ. They start with the negative results obtained by the Coleman study, a large national survey of 645,000 students. This survey did not find any significant benefit to IQ scores that could be credited to better school quality. (A discussion of this report is on the video series "Against All Odds").

Studies on the Head Start Program showed that this program increased IQ significantly during the period of the program, but that these differences disappeared over time. The authors mention some other more positive results but conclude that, overall, what we know about this approach is not terribly encouraging.

More positive results are cited for the hypothesis that IQ scores can significantly be improved by adoption from a poor environment to a good environment. One meta-study concluded that the increase in IQ would be about 6 points. Two small studies in France suggested that a change in environment from low socio-economic status to high socio-economic status could result in as much as a 12-point increase in IQ.

Chapter 18 is titled "The Leveling of American Education". The authors begin with a look at what test scores say about the changes in students' abilities from the 50's to the present. They present a graph of the composite score of Iowa 9th-graders on the Iowa Test of Basic Skills. The graph shows a steep improvement from the 50's to the 60's, followed by a significant decline until the 70's, followed by steady improvement to a new high by the 90's. Graphs of national SAT scores show that these scores remained about the same from the 50's to the 60's and then declined significantly (about 1/2 standard deviation on verbal and 1/3 standard deviation on math) from the 60's to the 80's and then remained about the same from the 80's to the 90's.

The authors argue that the familiar explanation which claims that the great decline in SAT scores was caused by the "democratization" during the 60's and 70's is not correct. They point out that the SAT pool expanded dramatically during the 50's and 60's while average scores remained constant. In addition, throughout most of the white SAT score decline the white SAT pool was shrinking, not expanding.

They next look at what has happened to the most gifted students. They provide a graph showing the percentage of 17-year-olds who scored 700 or higher on the SAT. The percentage for math scores decreased from 1970 to 1983 and then increased to its highest ever in 1990. The percentage for verbal scores decreased during this first period and remained steady after that.

They give the following explanation for the changes illustrated by these graphs. The decline in both the Iowa scores and the SAT scores of the 60's is attributed to what they call the "dumbing down". This period was characterized by simplified textbooks -- fewer difficult words, easier exercises -- along with fewer core requirements, grade inflation, etc.

They suggest that the "dumbed down" books would actually help the lower end of the spectrum of students and so would account for the increase in overall preparation indicated by the Iowa scores from the 80's to the 90's. The verbal SAT scores did not increase because of the use of the dumbed down books, the increased use of television, and decrease in writing generally, including letter writing. The math SAT scores did not decrease during this period because algebra and calculus are more constant subjects and harder to dumb down.

In their discussion of policy implications, they are not very optimistic about new government policies being able to solve general education problems. They point out that surveys have shown most American parents do not support drastic increases in their children's work load and, in fact, that the average American has little incentive to work harder. They argue that educators should return to the idea that one of the chief purposes of education is to educate the gifted and "foster wisdom and virtue through the ideal of the educated man".

Chapter 19 is on affirmative action in higher education. The authors present statistics on the differences in SAT scores between various groups. Evidently these statistics are more easily obtained from private schools than from public schools. Their first graph shows how the average SAT scores of blacks and Asians differ from whites for entering students at a group of selective schools. The median total SAT score for blacks was 180 points less than for the whites; the median for Asians was 30 points higher than for the whites. The range of difference for blacks went from 95 (Harvard) to 288 (Berkeley). Data for students admitted to medical schools and law schools also show significant differences. In all cases average test scores for those admitted tend to follow the differences in the scores nationally.

The authors give three reasons for academic institutions to give an edge to black students: institutional benefit, social utility, and just deserts. Accepting these, they propose a way to determine a reasonable advantage by trying to decide between two students differing only as to minority or white and privileged or underprivileged.

They show that black enrollments in college increased dramatically after the 60's, when affirmative action was introduced. They dropped off in the late 70's and have pretty well leveled off with a slight increase since then. Thus, they say that affirmative action has been successful in getting more minority students into colleges. However, they feel that the differences in performance and dropout rates, and the way that students, black and white, view these differences, are harmful. The authors feel that such would not be the case if the admission policies were changed to continue to make a serious effort to attract minority applications but adopt an admission policy that would not make such large differences between the SAT distributions. The result would be a more consistent performance among the various groups and more harmony among the student body.

Chapter 20 considers, in a similar way, affirmative action in the workplace. As in the case of education, the authors argue that affirmative action has had the desired effect of removing disparities in job opportunities and wages that were obviously due to discrimination. However, they look at data that suggest the results of affirmative action have gone beyond that to give a significant advantage to blacks in clerical jobs and even more so in professional or technical jobs, at least in terms of groups with comparable IQ scores.

Their previous research on the relation of IQ to job performance leads them to conclude that this has serious economic implications. They feel it leads to increased racial tensions. They conclude that anti-discrimination laws should be replaced by vigorous enforcement of equal treatment of all under the law.

Chapter 21 is entitled "The Way We Are Headed". The authors return to their earlier concerns that we are moving in the direction of (a) an increasingly isolated cognitive elite, (b) a merging of the cognitive elite with the affluent, and (c) a deteriorating quality of life for people at the bottom end of the cognitive ability distribution. This leads them to some pretty gloomy predictions. In the final chapter, called "A Place for Everyone", they give their ideas on how to prevent this. A somewhat simplified version of the authors' views is: we should accept that there are differences, cognitive and others, between people, and figure out ways to make life interesting and valued for all, in terms of the abilities that they do have.


In the California nutrition study, some of those in the treated group had a large increase, about 15 points, in their verbal scores and some had no increase at all. Why might some not have had any increase?

The authors review the evidence that coaching increases SAT scores. They cite a recent survey of the studies that suggested that about 60 hours of studying and coaching will increase combined math and verbal scores by an average of about 40 points. Does this seem consistent with what you have experienced or know about coaching for SAT scores?