Published as Chapter 5 in Janet L. Abu-Lughod (Ed.), Sociology for the Twenty-first Century: Continuities and Cutting Edges, International Sociological Association, University of Chicago Press, 1999, pp. 83-93.
We Can Count, But What do the Numbers Mean?
Joel H. Levine
Professor of Mathematical Social Sciences
Dartmouth College
Hanover, New Hampshire 03755
joel.levine@dartmouth.edu
Prof. Abu-Lughod's suggested title carries a double meaning. It is, on the surface, a question about the direction in which quantitative work in sociology will move at the beginning of the new millennium. It is also an invitation to discuss the tension that characterizes sociology at the end of the present millennium, the tension between numbers and ..., and what? Like C.P. Snow in The Two Cultures and the Scientific Revolution, I'm not sure what label to use for the other side. There are the numbers people, and there are those who are defined by their contrast to the numbers people.
I raise the matter of this tension here, at the beginning, in order to make the key point clear: meaning lies in numbers. More precisely, meaning lies in the equations when those equations are closely tied to data. In sociology, more than in some of the natural sciences, meaning will lie in the equations.
Would that sociology could be easier. Would that pure mind, rhetoric, and verbal exchange could steer a true course through obstacles that are difficult to penetrate by data, by numbers, and by method. But it is not so. In the words of Emile Durkheim's The Rules of Sociological Method, written at the beginning of the century now closing:
In the present state of the discipline we do not really know the nature of the principal social institutions, such as the state or the family ... Yet it suffices to glance through the works of sociology to see how it is believed that one is capable, in a few pages or a few sentences, of penetrating to the inmost essence of the most complex phenomena. This means that such theories express, not the facts, which could not be so swiftly fathomed, but the preconceptions of the author before he began his research.
Emile Durkheim in the preface to the second edition of The Rules of Sociological Method, 1901, Halls translation, Free Press, 1982, page 38.
In our era, at the end of the century, there is a balance between the quantitative and verbal communities. Much of the quantitative work is concerned with description, significance testing, and extrapolation. But the balance is shifting as both concept formation and theory become quantitative. That which is grandly and vaguely called "sociological meaning" is shifting into the quantitative domain, and that which is even more grandly and more vaguely called "sociological theory" is shifting there as well. The shifts are being driven by two things: serious advances in what are called mathematical models but should be called theory, and a rapid decrease in the cost of data describing individuals and events.
Mathematical Models
To understand the advances in models it is necessary, first, to understand that the quantitative community recognizes, within itself, at least three different strategies, each of which appropriates the word "model" to a different purpose. First, and best known, there is the statistical strategy embedded in the canon of our statistical methods classes. Second, there is the deductive strategy, strongly represented in game theory, decision theory, and aspects of rational actor theory. And third, there is the inductive strategy, inductive modeling, strongly represented in network analysis and stratification models.
All three strategies use the word "model." Within the first, the statistical strategy, "model" refers to equations like the linear equation and the Gaussian distribution. In Carl Sagan's words, science is skepticism, and in our discipline the statistical doubting of description, with significance testing, is the strongest barrier among the fragile defenses that separate sociology from pre-scientific speculation and from absorption by ideological agendas. For detecting the presence of relations and for defining criteria of reasonable doubt, the statistical disciplines are an intellectual and practical triumph.
By contrast, in both inductive and deductive models, a model is theory. The distinction between statistical models and theoretical models is a matter of degree: it lies in the extreme detail with which a theoretical model is compared to data. Outside of certain areas of biology, it was never intended that the line and the Gaussian distribution of the statistical strategy be interpreted precisely, in all their detail, as theoretical models. As theory, however, every detail of a model has meaning.
Let me illustrate by translating the linear equation and the Gaussian distribution into words as if they were theoretical models. To see the difference between one use of a model and the other, begin simply, as if you were once again a bright undergraduate. Take a group of undergraduates, preferably with experience in natural science, and show them the scatter diagram supporting one of sociology's big, reliable correlations, something like the correlation between the social economic status of a son and the social economic status of his father. Then talk about these data using the word "line" in your description and watch the reaction: they see no line in the scatter diagram and, in detail, they are right. Even with strong correlations like r = .4, which are strong by professional standards, and even with extremely strong correlations like r = .8, which are rare, there is no line in these scatter diagrams. It takes statistics to find the correlation in such data and to verify its presence. There is a positive correlation between the status of the father and the status of the son, absolutely. But a line? No.
This level of scrutiny becomes appropriate only when you use statistical models as if they were theories. Perhaps, 50 years ago, sociologists might have believed that they could hack their way through the underbrush of large data bases, piece by piece, accumulating variance explained. They would assemble more complicated models, piece by piece: 15% of the variance explained by one independent variable, 10% more by another, 10% more by a third, gradually eliminating uncertainty. And then, presumably, the predictions of sociology would approach certainty as explanation approached 100%. That hasn't happened, of course. It was a misreading of these models that led some to expect that it might. And it would be a mistake to become discouraged because it has not.
What would it mean if the line and the bivariate normal distribution were interpreted as theoretical models? What would it mean if social mobility, the status of the son compared to the status of the father, were normally distributed? As theory, the details of these models are alive with meaning; every symbol has meaning for the scientist. Even the x in a linear equation is a hypothesis: to write x is to say that there is a space in which people move. Usually, and surely with the bivariate normal, to write x is to hypothesize that the space in which people move is one-dimensional. That is meaning, probably not valid meaning for social mobility, but meaning expressed in an equation.
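The undergraduates' reaction can be put in numbers. A minimal sketch, assuming standardized variables and a linear relation (an assumption made here only for illustration): the share of variance a regression line explains is r squared, and the spread of the points about the line, as a fraction of their original spread, is the square root of 1 - r squared.

```python
import math

def line_summary(r: float) -> tuple[float, float]:
    """For standardized variables with correlation r, return
    (share of variance explained by the regression line,
     residual standard deviation about the line as a share of the total)."""
    explained = r ** 2
    residual_sd = math.sqrt(1.0 - r ** 2)
    return explained, residual_sd

for r in (0.4, 0.8):
    explained, residual_sd = line_summary(r)
    print(f"r = {r}: variance explained = {explained:.2f}, "
          f"scatter about the line = {residual_sd:.2f} of the original spread")
```

At r = .4 the points still scatter about the line with roughly 92% of the spread they had without it, which is why the students see no line.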
The detail of the Gaussian equation, interpreted as theory, says more. In detail, the equation is a hypothesis about which events are most frequent (those for which the status of the son is equal to the status of the father) and about the rate at which frequencies decrease as a function of the distance between father and son. It says, specifically, that the frequencies decrease as a negative exponential function of the square of the distance between them.
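The reading of the Gaussian as a squared-distance law can be checked with a short derivation. Assume, for illustration, that father's status x and son's status y are standardized and bivariate normal with correlation rho > 0. Using the identity x^2 - 2 rho x y + y^2 = rho (y - x)^2 + (1 - rho)(x^2 + y^2), the density factors into a term that depends only on each person's own position and a term that depends only on the distance y - x:

```latex
f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}
          \exp\!\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right)
        = \frac{1}{2\pi\sqrt{1-\rho^2}}
          \exp\!\left(-\frac{x^2 + y^2}{2(1+\rho)}\right)
          \exp\!\left(-\frac{\rho\,(y - x)^2}{2(1-\rho^2)}\right)
```

The last factor is the hypothesis described above: with the positional terms held fixed, frequency falls off as a negative exponential of the squared distance between father and son, the exponent 2 sitting on (y - x).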
Those are equations and numbers, the exponential and the value of the exponent, that carry meaning about nothing less than the nature of social space, or would carry that meaning if the statements of the Gaussian equation were valid as theory. To see the meaning invested in the value of the exponent, 2, consider the inverse square laws of physical space. In physical space the 2 of the inverse square laws goes hand in hand with the nature of the space itself. The easiest physical example is the attenuation of light (Figure 1).
Figure 1
The attenuation of illumination per unit of area as a function of the distance from the source
The light illuminating a unit square one unit from the source diffuses to illuminate four unit squares at a distance of two units from the source. Therefore it is reasonable to expect that the illumination on a single square will diminish in proportion to the inverse square of the distance. Reasonable is not enough, of course; it has to be checked. But the point is that the number 2 in the inverse square laws is rich with meaning. It is a statement about the dimensionality of the space, three dimensions, and about the process of diffusion. The Gaussian equation, with its negative exponential squared-distance law, would be a theoretical statement about social space if it were correct.
That's the translation if you use the normal distribution literally, as a social model. Social space, it says, exists. Social space, it says, is one-dimensional. And social mobility, it says, scatters or diffuses in some sort of dependence on the square of the distance. If you mean these things literally, as theory, then the whole package (the equation, the meaning, the interpretations, and the theory) is in jeopardy when the equation is compared to the data.
Fortunately, it fails: fortunately, because if it fit well we would have to live with the meaning. It fails not because it is poor at explaining variance. The line and the Gaussian are very good at explaining variance, but they make systematic errors that show that some part or all of the package is wrong. It fails because it systematically predicts that too few people will move, fewer than are actually found to move when the model is compared to the data. Therefore all or part of the package (social space, one-dimensionality, squared distance) is wrong (Levine, 1972, 1978, 1993).
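The diffusion argument of Figure 1 can be written out. Assuming a point source radiating total power P uniformly in three-dimensional space, the light at distance d is spread over a sphere of area 4 pi d^2; more generally, in a space of n dimensions that surface grows as d^(n-1), so the exponent is one less than the dimensionality:

```latex
I(d) = \frac{P}{4\pi d^{2}}, \qquad \frac{I(2)}{I(1)} = \frac{1}{4},
\qquad \text{and in } n \text{ dimensions} \quad I(d) \propto d^{-(n-1)}.
```

The 2 of the inverse square law is, in this sense, the 3 of physical space, less one.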
That is what makes a model a theory. But, as I suggested, this is not the end of the story. The current chapter in the saga of quantitative methods in sociology is that some of the models have begun to work. The disputes among those of us who work with log-linear models, crossings models, and special interaction effects continue. The meanings of the equations we write have changed. And the models have begun to work in some cases, for some data: some of the current theoretical models work well enough to exhaust the information in the data.
In some cases, with some occupational data and with some social network data, here is what we now know, tentatively:
Fix number 1: The Gaussian model, if it were correct, would mean that social space was one-dimensional, at least for the occupational data and the network data to which it applied. That is unlikely. It is reasonable to expect one dimension, status, to dominate the pattern of social distance, but it would be surprising if that one dimension were sufficient. Changing the symbol from x, a single coordinate, to a vector of coordinates in two or more dimensions works better.
Fix number 2: The Gaussian model, if it were correct in two or more dimensions, would include a hypothesis about geometry. Never mind the meaning of Euclidean geometry as geometry is taught in mathematics. For us, in our science, the phrase "Euclidean geometry" is a hypothesis about the combined effect of differences in two or more dimensions. It predicts the strength of the combination of two differences by, first, squaring each difference, second, adding the two squares together, and third, taking the square root of the result. (That is the Pythagorean theorem, translated as theory.) The Euclidean model is a truly bizarre model of the combined effect of differences in two or more dimensions. It is all the more bizarre because it actually works in physical space. Imagine the experience of intellectual triumph that must have accompanied that discovery, 2,500 years ago.
It would also be bizarre if differences in two or more dimensions combined this way in our own science. Imagine comparing my social economic status to someone else's by computing the square of the difference between our years of education and the square of the difference between our incomes, adding the two squares together, and then taking the square root of the result. Possible, but not obvious. An alternative would be simply to add the differences in the separate claims to status, predicting that the strength of the combination will be found by adding the individual components: no squares, no square roots. Translated back into mathematics, that means hypothesizing a city-block metric, not a Euclidean metric, for social space (Torgerson, 1958, pp. 251-254, referring to Attneave, 1950; Coombs, 1964, pp. 202-206, referring to Landahl, 1945). And that model works better.
Fix number 3: The Gaussian model uses squared distance in the equation, and it predicts too few people staying in the status of origin. Changing the exponent from 2 to 1, from squared distance to simple distance, creates a different prediction. It changes the shape of the curves from the shape in Figure 2a to the shape in Figure 2b.
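The two hypotheses about how differences combine can be stated side by side. A minimal sketch: the status profiles below, with education and income in hypothetical standardized units, are invented for illustration and are not data from the studies cited.

```python
import math

def euclidean(a: list[float], b: list[float]) -> float:
    """Square each difference, add the squares, take the square root."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def city_block(a: list[float], b: list[float]) -> float:
    """Add the absolute differences: no squares, no square roots."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Hypothetical status profiles: (education, income), standardized units.
me = [1.0, 2.0]
other = [4.0, 6.0]

print(euclidean(me, other))   # 5.0
print(city_block(me, other))  # 7.0
```

When two people differ along a single dimension the two metrics agree; they diverge only when a difference is split across dimensions, which is where data can tell them apart.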
Figure 2a and Figure 2b
Two back-to-back exponential decay functions of distance from the center.
Figure 2a shows back-to-back negative exponential functions of the squared distance. Figure 2b shows back-to-back negative exponential functions of the simple distance.
Simple distance predicts greater immobility (at the center) relative to other events (off center). If it works, then the explanation of this power, 1, is a theoretical problem, a real theoretical problem. It suggests, too, that we have to understand something about the diffusion of information in social networks. For mobility it may mean that job searches are somehow one- or two-dimensional even though the occupational system itself has more than two dimensions. Whatever the explanation, the result, that 1, the first power of distance, is a statement about social process that has to be explained, because it works better.
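The contrast between Figures 2a and 2b can be made concrete. A minimal sketch, assuming continuous distance and matching the two curves on their variance (a choice made here for comparability, not taken from the fitted models): the first-power curve puts more density at zero distance than the squared-distance curve, by a factor of the square root of pi.

```python
import math

def gaussian_peak(sigma: float) -> float:
    """Density at zero distance for a normalized exp(-d^2 / (2 sigma^2))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

def laplace_peak(sigma: float) -> float:
    """Density at zero distance for a normalized exp(-|d| / b), with the
    scale b chosen so that the variance, 2 b^2, equals sigma^2."""
    b = sigma / math.sqrt(2.0)
    return 1.0 / (2.0 * b)

sigma = 1.0
ratio = laplace_peak(sigma) / gaussian_peak(sigma)
print(f"{ratio:.3f}")  # 1.772, the square root of pi
```

The ratio does not depend on sigma, so under this matching the first-power law always predicts more immobility at the center.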
Fix number 4: And finally, the clincher: join these fixes in one equation and you have a hypothesis that fits the data well enough, in some cases, to exhaust the information present in the data. The claim about fit is restricted: with some problems in social mobility and with some work in social networks, the models match the data as well as a replication, going back to the source and collecting the data again, would match them (Levine, 1985).
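The combined hypothesis can be sketched as a single expression: the frequency of a move falls off as a negative exponential of the first power of the city-block distance between origin and destination, measured in two or more dimensions. A minimal sketch under stated assumptions: the category names, their coordinates, and the decay parameter alpha are hypothetical, and a model fitted to a real mobility table would typically also include terms for the sizes of the origin and destination categories, which are omitted here.

```python
import math

# Hypothetical coordinates for four occupational categories in a
# two-dimensional social space (invented for illustration only).
coords = {
    "professional": (0.0, 0.0),
    "clerical": (1.0, 0.5),
    "skilled": (1.5, 1.5),
    "unskilled": (3.0, 2.0),
}

def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def destination_probs(origin: str, alpha: float = 1.0) -> dict[str, float]:
    """P(destination | origin) proportional to exp(-alpha * distance):
    several dimensions (fix 1), city-block metric (fix 2),
    first power of distance (fix 3)."""
    weights = {dest: math.exp(-alpha * city_block(coords[origin], xy))
               for dest, xy in coords.items()}
    total = sum(weights.values())
    return {dest: w / total for dest, w in weights.items()}

for origin in coords:
    probs = destination_probs(origin)
    print(origin, {d: round(p, 3) for d, p in probs.items()})
```

Staying put always gets the largest share here because its distance is zero; how much larger depends on alpha and on where the other categories sit.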
_____ _____
So what? Why should sociology at large be changed by advances in stratification and networks? In part, of course, because stratification and networks are central to the discipline. And in part because the mathematics of these models offers simplicity: it is simpler than verbal gymnastics, and no more difficult than the mathematics of the contemporary statistical models used for policy and forecasting.
More important, sociology at large is affected because when one piece undergoes serious change, there are consequences. It is already the case that in some areas of stratification the language of theory is mathematics. Words can attach intuition to the equations. Words can help communicate. But the first language of theory is, in some areas, mathematics: not English, not French, not German, but mathematics. Question: Is social space, is social organization, divided and partitioned into classes, or is it, by contrast, continuous (stratified but undivided)? Question: Is ownership of the means of production critical to social status? The primary language for such research is mathematics. In this area of discussion, debate is wasted unless the arguments are put in falsifiable form, as models, and tested in detail against the data.
Data
At least two pieces of the discipline are changing: models are one piece, the cost of data is another. The cost of data is not just bookkeeping, whether it costs 10, 100, or 1,000 dollars per subject. More than that, cost insinuates itself into the way we think. In principle, at least, sociology is about relations among people and relations among social facts. But sociometric data that describe relations among people in real-world organizations are far more expensive than survey data that describe the personal attributes of a random sample of individuals.
In social psychology and network analysis much of the work uses small groups. Much of the research uses small networks, often face-to-face groups, often set pieces like the venerable bank wiring room data first analyzed in the early 1930s by George Homans. I ask you a question: Why would anyone want to analyze data recording the petty battles of obscure individuals in odd work groups, children's camps, and college dormitories? I don't think anyone cares about these things per se. They were a means to an end, which lay, in part, in a concern for the large events of the 1930s, as fascism grew in Germany. Social psychologists asked: What were the bases of democracy? What were the prerequisites of acceptable human organization? What tendencies led to authoritarianism?
Even in the 1930s it was never clear that small groups were the right experimental animal for questions raised by the larger society, but it was worth a try. Other psychologists and sociologists, responding to the same moral urgency, created work like The Authoritarian Personality (Adorno et al., 1950), looking into the human mind. Other sociologists created work like Union Democracy (Lipset et al., 1956), looking at labor unions as organizations created for laudable purposes which had, nevertheless, in some cases moved toward authoritarianism. And some social psychologists looked at small groups, like summer camps and school classrooms, that seemed capable, in some cases, of ending up like Golding's Lord of the Flies (1962). Perhaps what ailed these small groups was the model for what ailed the larger world.
And, in any case, whether or not small groups were the best experimental animal, it would have been difficult at that time to use anything larger. The data were too expensive. It took the resources of the Harvard Business School and a promising young scholar, George Homans, to collect data on the fourteen men of the bank wiring room. And almost to this day, two generations later, detailed sociometric data have come at a premium: data that record who does what to whom, in detail and in time series, for real-world groups, large or small. Too expensive.
Now, sixty-five years later, our options are different. Models, discussed earlier, are changing the nature of theory, and the Web is changing our access to data. Today on-line data bases present any scholar, indeed anyone, with a report of who does what to whom on a world scale. We have hundreds of paid observers called reporters. That's what they call themselves, but we know they are working for us: toiling in the interest of our science, collecting data on who does what to whom, and on the nature of the relation, daily, on a large scale, with multiple observations. Newspaper abstracts update the daily activities of our subjects. Citation indices give us access to the developing structure of science, literature, and politics. On-line intelligence reports connect events and their players on a world scale. Library indices and search engines connect world problems, international organizations, lobbyists, politicians, intellectuals, and business.
Consider the arithmetic of these new data bases. A master index of world biographies contains about 6 million entries. Six million is an upper bound on the population that "counts," and it includes massive double counting: if the President of the United States has a biography in Who's Who in America and in Who's Who in the East, or in Washington, or in Politics, or in any other register, each of these is counted as a separate entry among the 6 million. That pares the number of people who count from 6 million to a considerably smaller number. And, at the risk of offending a few million poets, baseball players, and movie stars whose lives are included among the 6 million biographies, I'm willing to pare them from the total as well.
This leaves perhaps one million people, perhaps one hundred thousand, as the number of people who appear in these archives: about one hundred thousand to one million people who act either on a world scale or on local stages. This is not a large number by present technical standards.
These data, including both the Web itself and the larger array of electronic archives, are an experimental animal waiting for our discipline to adopt it, if we dare. There is no need for a sociologist to pull someone aside for an interview and ask, "How do you feel about so-and-so?" All we need do is read as the major players signal who is in and who is out. Or, when signals fail, all we need do is read the budgets to see who is supporting whom, or check the movement of armies to see who is the ally of whom, or check the casualty reports and body counts for definitive evidence of negative affect.
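Since sociology is about relations among people, the count that matters for sociometric work is not only the number of people but the number of possible pairs among them. A minimal sketch of that arithmetic, using the two population estimates given above: the function name is illustrative, and any real archive presumably records only a sparse subset of these potential dyads.

```python
def pairs(n: int) -> int:
    """Number of unordered pairs (possible dyadic relations) among n people."""
    return n * (n - 1) // 2

for n in (100_000, 1_000_000):
    print(f"{n:>9,} people -> {pairs(n):,} possible pairs")
```

One hundred thousand people imply about five billion possible pairs, and one million people about half a trillion.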
The Result
I suggest that these two together, better theory combined with data on a new scale, can and must change the discipline. When I say they can change the discipline I mean to suggest that there is an opportunity, now, to do sociology as our predecessors might have wished to do it from the beginning. Parts of our statistical methodology and parts of our mainline methods of data collection were, in their time, brilliant adaptations to the limits of the time. As a practical matter it was difficult to study detail except in small populations. It was difficult to study large populations except by aggregating people into categories, working around methodological problems and data problems that no longer constrain us so heavily.
When I say these changes must change the discipline, the challenge is this: good science requires jeopardy. Sociology can go a long way by collecting data, analyzing it, classifying it, comparing and contrasting it, and presenting the result to professional audiences. But good science must take chances. It must look at people and issues that are important, using names and making predictions. It must say things that matter to people outside the science. It must make statements in full public view and, here is the risky part, it must take the chance of being wrong on matters of significance, also in full public view.
If we were to make a foolish statement about Hasulak and Taylor (two denizens of the bank wiring room), few people outside the profession would even catch the reference, much less the foolishness of the statement. There is little risk with such experimental animals.
By contrast, if we were to make a statement about institution building and coalitions in Russia, or in Central Africa, or in Yugoslavia, there would be jeopardy. If the statement were foolish, then there would be obvious pressure to reformulate the theories behind the statement. If the statement were prescient, then the larger world would take note. Getting our discipline out into the real world is critical to the health of the discipline. Better theory, better data, and the computing power to handle them make it possible to move the discipline closer to the real world. The health of the discipline requires that we seize the opportunity and the jeopardy.
Bibliography
Adorno, T. W., et al., The Authoritarian Personality, in The American Jewish Committee Social Studies Series, Publication No. 3, Harper, New York, 1950.
Attneave, F., Dimensions of similarity, American Journal of Psychology, 63, 516-556, 1950.
Blumen, Isadore, Marvin Kogan, and Philip J. McCarthy, The Industrial Mobility of Labor as a Probability Process, in Cornell Studies in Industrial and Labor Relations, Volume VI, 1955.
Coombs, Clyde, A Theory of Data, Wiley and Sons, New York, 1964.
Durkheim, Emile, The Rules of Sociological Method, 1901, Halls translation, Free Press, 1982.
Golding, William G., Lord of the Flies, Coward-McCann, New York, 1962.
Landahl, H.D., Neural mechanisms for the concepts of difference and similarity, Bulletin of Mathematical Biophysics, 7, 83-88, 1945.
Levine, Joel H.
1972 "A Two Parameter Model of Interaction in Father-Son Status Mobility," Behavioral Science, September.
1978 "Comparing Models of Mobility," American Sociological Review, February.
1985/1989 "Occupational Mobility: A Structural Model of Intragenerational Mobility in Time Series," with John Spadaro, in Structural Sociology, Wellman and Berkowitz (Ed.), Cambridge University Press.
1990 "Measuring Occupational Stratification using Log-Linear Distance Models," in Social Mobility and Social Structure, Ronald L. Breiger, (Ed.), Cambridge University Press
1990 "Friends and Relations: A Comparison of Positive and Negative Sociometric Forms", Connections, 1990.
1993 Exceptions are the Rule: Inquiries on Method in the Social Sciences: A critique of sociological methodology, with structuralist solutions, Westview Press (Tilly and McNall, series editors), May, 1993. (http://www.dartmouth.edu/~jlevine/)
Lipset, Seymour Martin; Martin Trow, and James S. Coleman, Union Democracy; the Internal Politics of the International Typographical Union, Free Press, Glencoe, 1956.
Snow, C.P., The Two Cultures and the Scientific Revolution, the Rede lecture, 1959, Cambridge University Press, Cambridge, 1959.
Torgerson, Warren S., Theory and Methods of Scaling, Wiley and Sons, 1958.
Notes
The effect is similar to the finding in pioneering work by Blumen, Kogan, and McCarthy, 1955, discussed in more detail in Levine, 1985.
This discussion is condensed from the discussion in Levine, 1993, op. cit. The argument with respect to stratification is presented at length in Chapter 10, Real Social Distance. The argument with respect to social networks is presented in Chapter 8, Friends and Relations.
Condensed from the discussion in Levine, 1993, op. cit.