Volume 1, Issue A :: February 1997
Understanding Quantitative Research about Adult Literacy
by Thomas Valentine
Adult educators make hundreds of small decisions every hour they are in the classroom
- decisions about what to say, how to spend time, what materials to use. When they are new
to the job, educators find these decisions difficult, but, through trial and error, they
build up personal, experience-based knowledge about what works and what doesn't work.
Gradually, they begin to recognize cause and effect relationships between what they do and
how students respond. They develop and continually refine personal theories of how
education works, and they use those theories to guide their decision-making when they
approach a novel situation or a new adult learner.
Quantitative researchers engage in much the same task, but in a far more formal way. They attempt to identify and describe patterns of behavior that are clear enough and regular enough to guide educational action. Researchers try to clarify the seeming chaos of activity that surrounds educators by discovering patterns that naturally occur, and they trust that educators will be able to use this information to improve practice.
Although many working adult educators find quantitative research too esoteric to be understood fully, its apparent complexity is offset by the clear and highly patterned logic on which it is based. Educators need surprisingly little knowledge to get the gist of the articles they read. In this article, I will provide working educators with a few basic tools that will strengthen their ability to make sense of quantitative research. Instead of presenting the type of detailed, technical information that appears in statistics books, I'll attempt to provide information that will enable working educators to critically evaluate the quality and logic of quantitative studies that might have a bearing on how they do their jobs.
The Three Most Common Purposes of Quantitative Research on Adult Literacy
Most quantitative research studies on adult literacy attempt to accomplish one of three
broad purposes: description, theory testing, and theory generating. I'll deal with them
one by one.
Description:
Many quantitative studies attempt simply to describe a phenomenon of importance to literacy educators. The many studies examining the extent of illiteracy in the United States fit into this category, as do federally-sponsored studies of program practices in adult education. Such studies vary enormously in sophistication, but, in all cases, they are non-experimental in nature and, in most cases, they are guided by broad research questions rather than by formal hypotheses. Their intent is to describe rather than to "prove." The basic logic of this category of research is best expressed by the question, "What's going on?" or "How much of this thing is going on?"
Theory Testing:
Although "theory" is a ponderous sounding term, I actually mean it in its broader and somewhat looser sense: A provisional understanding of the phenomenon that is being studied. Such "theories" range from well-informed but untested hunches to formal, empirically based theories of the psychological or sociological sort. In all of these cases, researchers approach their work with expectations about the phenomenon under investigation and these expectations are used to shape the design and interpret the results of the research study. This category of research most closely approximates "pure" statistical reasoning, and most experimental research and most research that states formal hypotheses fit into this category. The logic of this category of study can best be expressed as, "I think this is what's going on. Am I right?"
Theory Generating:
This category of research is really a subset of descriptive research, and studies in this area tend to be of a nonexperimental, exploratory nature. This type of research is undertaken when researchers do not have a clear conception of the phenomenon under investigation, yet, for any number of reasons, they believe that the phenomenon is of importance to adult education. In such cases, they collect data which will allow them to formulate models or theories that capture the essence of the phenomenon. An exploratory study attempting to "map out" the nature and impact of adult learners' academic self image on learning progress would fit into this category. The intent of theory generating studies, as the title of this category suggests, is to develop a well articulated understanding of the phenomenon under investigation. The logic of such studies might be expressed as, "What's the best way of thinking about this thing?"
Although quantitative studies can have markedly different purposes, they all use the same basic "tools." The following sections will explore the basic concepts and common analyses necessary for an understanding of quantitative research.
Basic Underlying Concepts
A basic tenet of perceptual psychology is that human perception is based on variation. If you were to look at a pure white wall that had no texture or irregularities, you would see nothing at all. If, in that wall, there was even a tiny crack, your eyes would be drawn immediately to it. In making sense of what you were looking at, your mind would automatically create a concept called "crackedness." You could then talk about any section of the wall in terms of its crackedness, with some sections having crackedness and some not.
All research builds on variations, and in statistical research, it is called variance. Things that vary, like crackedness in the above example, are called variables. Variance is the concept that underpins all statistical research.
The variance contained in variables can be described statistically in many different ways, some of which are quite familiar. Frequencies, expressed as numbers or percentages, are readily understood, because they amount to a simple tallying of the values of a variable. When a group of students is described as 55 percent women and 45 percent men, the variable is gender, the values are women and men, and the frequencies are the numbers themselves.
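The tallying behind frequencies and percentages is easy to show. Here is a minimal Python sketch; the roster and its values are invented purely for illustration:

```python
from collections import Counter

# Hypothetical class roster; the values are invented for illustration.
students = ["woman", "man", "woman", "woman", "man",
            "woman", "man", "woman", "man", "woman"]

counts = Counter(students)                 # raw frequencies for each value
total = len(students)
percentages = {value: 100 * n / total for value, n in counts.items()}

print(counts["woman"], counts["man"])      # frequencies: 6 4
print(percentages["woman"])                # percentage: 60.0
```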
Means, or averages, are another common statistical expression that everyone more or less understands. The mean of a group of scores (or comparable measures) is commonly used as a way of talking about the group with a single number. However, the mean by itself can be a rather poor description of a group, particularly when the scores, taken together, do not arrange themselves into a predictable pattern. Consequently, you will rarely encounter a mean in statistical reports that is not accompanied by a standard deviation. The standard deviation indicates how spread out the scores are for that group, and it is a direct indication of variance.
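The point about means hiding variation can be made concrete in a few lines of Python. The two invented groups of scores below share a mean of 70, but their standard deviations tell very different stories:

```python
import statistics

# Two invented groups of test scores with the same mean but different spread.
group_a = [70, 71, 69, 70, 70]
group_b = [50, 90, 40, 100, 70]

for scores in (group_a, group_b):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation
    print(f"mean = {mean:.1f}, sd = {sd:.1f}")
```

The first group clusters tightly around its mean (sd of about 0.7), while the second group's identical mean conceals wildly varying scores (sd of about 25.5). This is why a mean reported without a standard deviation can mislead.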
Variance itself is rarely the primary focus of statistical research. Most statistical research focuses, instead, on some form of covariation, on whether two or more variables systematically vary together. For example, one would expect hours spent in instruction and learning progress to co-vary quite well, while height and learning progress would not meaningfully co-vary.
Although they appear very different on the surface, most of the statistical tests commonly encountered in adult education research reports represent attempts to establish covariation among variables. In all of these common statistical tests, if the numbers suggest that there is in fact a relationship that can't be attributed to chance, the researcher will conclude that the co-variation is statistically significant. Statistical significance indicates that there is a relationship between variables, but it doesn't necessarily mean that the relationship is strong enough to be important to working educators. Once significance is established, readers must use their non-statistical judgment to decide whether that relationship is strong enough to be considered substantively meaningful. For example, a statistically significant but weak relationship between years of schooling and learner motivation might be considered unimportant for program planning.
The statistical test actually used in any given study depends, to a great extent, on the types of variables being used. There are two distinct types of variables commonly used in statistical research about adult literacy. The first type of variable is called a categorical variable. Categorical variables vary in type or nature, but not in degree; they can't be rank ordered in any meaningful way. Gender and race are common categorical variables. The second common type of variable is called a continuous variable. Continuous variables vary in degrees, and can be expressed as a numerical scale. Test scores, satisfaction, and income all are continuous variables.
The final important concept underpinning statistical research is sampling. In most cases, researchers are attempting to identify patterns of behavior, cognition, or attitudes that apply to large numbers of people, but they only have access to a much smaller number. This small number of people is called a sample, and the sample is supposed to be a representative subsection of the larger group, or population. When the findings based on a sample are applied to a population, it is called statistical inference or generalization, and there are strict rules that allow researchers to generalize with confidence. Most of these rules require that the sample be randomly drawn from the population of interest.
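The logic of sampling can be demonstrated with a small simulation. In the sketch below the "population" of attendance hours is generated artificially (the numbers are invented, not drawn from any real program), but it shows how a random sample of modest size can estimate a population mean:

```python
import random

random.seed(1)  # fix the seed so the draw is reproducible

# Simulated "population" of 10,000 attendance-hour figures (invented data).
population = [random.gauss(mu=120, sigma=30) for _ in range(10_000)]

# A random sample of 200 cases stands in for the whole population.
sample = random.sample(population, k=200)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean = {pop_mean:.1f}, sample mean = {sample_mean:.1f}")
```

The two means land close together because the sample was drawn randomly; a convenience sample offers no such guarantee, which is exactly the compromise discussed below.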
Unfortunately, adult education researchers find it nearly impossible to follow the rules of pure statistics. Drawing true random samples from the population of interest is usually prohibitively expensive, so researchers often rely on convenience samples. In conducting experimental research, researchers quickly find that adult learners are not malleable enough to be randomly assigned to various "treatment" conditions, so researchers attempt to "match" treatment groups on selected variables. Despite these patchwork remedies, more often than not the compromises are severe enough to preclude any legitimate statistical inference whatsoever.
However, it's possible to glean useful information even from studies using highly compromised statistical procedures. By carefully studying the sample used in a study, educators can determine the extent to which that sample lines up with the people with whom they work. If the findings are clear enough, the sample reasonably large, and the characteristics of the sample similar to the people in their educational setting, educators can use logical inference to predict the probable implications of the findings for their own work. Consequently, work done with a nonrandom sample in Boston might have very real implications for educators working in Baltimore but none at all for educators working in rural South Dakota.
Common Statistical Procedures
In preparing this article, I looked over the articles that were recently published in journals of interest to adult literacy educators and found that surprisingly few statistical procedures were used with any frequency. In fact, if readers can understand the logic and statistics of eight basic procedures, they can understand the methodology of more than 90 percent of the quantitative pieces they encounter. I'll attempt to give a quick conceptual overview of these eight procedures, and will briefly discuss the actual meaning of the more important statistics they employ.
Procedure #1: The T-Test
A common task in research is to decide whether or not two groups are different from one another on a given variable. An example of such research might address the question: "Do female students attend more hours than male students?"
In this example, the researcher is examining the relationship between one categorical variable (gender) and one continuous variable (attendance hours). If you were asked to draw a picture of the data obtained from such studies, you might draw a bar graph, with a bar for each group of students and the height of the bar determined by whatever continuous variable you are examining.
In determining whether or not the groups differ on the variable in question, researchers compare group means to see if they are different enough to be considered "truly different." The most common test of mean difference for two-group situations is the t-test. In reporting the results of a t-test, a researcher usually will present the mean for each group, the t statistic, and a p value. Although statisticians interpret all of these figures, you can make sense of the findings by looking only at the means and the p. Understanding the means is easy, but the p can be more troublesome. Although it has a different meaning in pure statistical reasoning, most educational researchers use it to determine whether the difference between the means is big enough to be considered "real." Usually, if the p is less than .05 or .01 (depending on the significance level the researcher has adopted), the researcher will conclude that the means are significantly different from each other and that the findings are statistically significant.
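The arithmetic behind the t statistic is simple enough to sketch. Below is a minimal Python version of the pooled (equal-variance) two-sample t; the attendance figures are invented for the example, and a real study would also report the p value associated with this t:

```python
import statistics
from math import sqrt

def t_statistic(a, b):
    """Pooled two-sample t statistic (equal-variance form)."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled * (1 / na + 1 / nb))

# Invented attendance hours for the two groups in the example question.
women = [110, 125, 118, 130, 122, 115]
men = [100, 108, 95, 112, 104, 101]

t = t_statistic(women, men)
print(f"t = {t:.2f}")  # looked up in a t table with len(women) + len(men) - 2 df
```

The larger the t (in absolute value), the less plausible it is that the two group means differ only by chance.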
Procedure #2: Analysis of Variance
Analysis of variance, or ANOVA, is a logical extension of the t-test. It is used in situations in which the study's categorical variable has more than two possible values. A study asking the question -- Do some minority groups attend more regularly than others? -- would require the researcher to compare the attendance hours of more than two groups (e.g., African Americans, Asians, Persons of Hispanic Origin). The logic is basically the same as that of the t-test, and the data in such a situation also lend themselves to a bar graph display. However, because there are more groups, more statistics are reported. Researchers using ANOVA typically report an F and a p for the F. These statistics answer the broad question, "Is something going on somewhere among these groups?" If this p indicates significance (i.e., if it is less than .05 or .01), then the researcher will also report a series of t's, each with an accompanying p value. These t's and p's test each possible comparison between the means of the groups involved (e.g., African Americans versus Persons of Hispanic Origin, African Americans versus Asians, etc.), and they can be interpreted using just the p's, as in the discussion of the t-test, above.
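For readers curious about where the F comes from, here is a bare-bones one-way ANOVA in Python. The F is the ratio of between-group variation to within-group variation; the three groups of attendance hours are invented for illustration:

```python
import statistics

def f_statistic(*groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented attendance hours for three groups of learners.
group_1 = [100, 110, 105, 95]
group_2 = [120, 130, 125, 118]
group_3 = [104, 99, 108, 101]

F = f_statistic(group_1, group_2, group_3)
print(f"F = {F:.2f}")  # a large F with a small p says the means differ somewhere
```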
Procedure #3: Correlational Analyses
Correlation is a direct test of co-variation that requires the use of two continuous variables. Data from such studies can be graphically represented by a regression line. This line is like the trend lines that newspapers and magazines use to depict economic or population growth. Conceptually, the statistics used in correlational research tell us how well a single line can represent the measured co-variation, and thus, how much we can count on the fact that as one of our variables goes up or down, the other will follow suit. A study attempting to measure the relationship between reading ability and attendance hours (both of which are continuous variables) is an example of correlational research.
Correlational research simply asks whether or not two variables are related to one another, without introducing the notion of causation. Reports of correlational research typically present two statistics: an r and an accompanying p value. The magnitude of the r indicates how strongly the two variables co-vary. The possible values for r range from -1.00 to +1.00, with 0.00 representing no relationship and -1.00 or +1.00 representing a perfect correlation. (A negative correlation coefficient suggests that as one variable goes up, the other goes down, as might be the case with learner motivation and absenteeism.)
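The r itself is computed from the co-variation of the two variables relative to how much each varies on its own. A minimal Python sketch, using invented reading scores and attendance hours:

```python
import statistics
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two continuous variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Invented reading scores and attendance hours for six learners.
reading = [45, 52, 60, 48, 70, 66]
hours = [80, 95, 110, 90, 140, 120]

r = pearson_r(reading, hours)
print(f"r = {r:.2f}")
```

Because the two invented variables rise and fall together almost perfectly, the r comes out close to +1.00.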
There is no magic number that will tell you whether or not a correlation is good. A great deal depends on the extent to which one would expect the relationship to occur. A correlation between an intelligence test score and a reading test score should be high, so a correlation of .85 might be considered uninformative. However, a correlation of .35 between hours spent reading and the number of children in the household might mean that educators need to find a way of helping parents find quiet time for reading.
Procedure #4: Regression
Regression is very much like correlation, except that the researcher is willing to assert that Variable A causes, explains, or predicts Variable B, rather than the other way around or, as is the case with correlation, simply stating that the two are related. The example about parents' reading I used in the above paragraph is more properly thought of as a regression, since it is more plausible to believe that the more children you have, the harder it is to find quiet reading time, than to believe that reading is an effective means of birth control. Studies using regression also report an r and a p, and they are interpreted pretty much the way they are for correlation.
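The regression line itself has a slope and an intercept, fit so that the line passes as close as possible to all the data points (the least-squares criterion). A short Python sketch, using invented figures for the children-and-reading example:

```python
import statistics

def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) /
         sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Invented data: number of children predicting weekly hours spent reading.
children = [0, 1, 2, 3, 4, 5]
reading_hours = [10, 9, 7, 6, 4, 3]

a, b = least_squares_line(children, reading_hours)
print(f"predicted reading hours = {a:.2f} + ({b:.2f}) * children")
```

The negative slope captures the idea in the text: in these invented data, each additional child predicts roughly an hour and a half less of weekly reading time.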
Procedure #5: Multiple Regression
Multiple regression extends the logic of simple regression to allow for the simultaneous use of more than one variable in explaining another variable. A researcher attempting to explain attendance hours in terms of both reading ability and learner motivation would use multiple regression to ask the question, "Taken together, how well do reading ability and motivation explain attendance hours?" The benefit of using multiple regression in such a case, as opposed to doing two simple regressions, is that multiple regression takes into account the fact that reading ability and motivation might be related to one another; in such a case, adding together the results of two simple regressions would overstate the combined impact. Multiple regression research typically reports R-Square and an accompanying p value. R-Square represents the proportion of the variance in the outcome variable (in the example, attendance hours) that is explained by a combination of the predictor variables (e.g., reading ability and motivation). In evaluating the results of multiple regression, R-Square makes real world sense, since it is a direct indicator of the explanatory power of the variables being used to explain the outcome variable.
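R-Square can be computed by fitting the combined prediction equation and then asking what proportion of the outcome's variance the predictions account for. The sketch below solves the normal equations directly for two predictors; this is only an illustration with invented data (real statistical software uses more numerically robust methods):

```python
import statistics

def solve(A, b):
    """Gauss-Jordan elimination for a small linear system A x = b."""
    m = [row[:] + [v] for row, v in zip(A, b)]
    k = len(m)
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(k):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [a - f * c for a, c in zip(m[r], m[i])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_square(y, x1, x2):
    """R-Square for y = b0 + b1*x1 + b2*x2, fit via the normal equations."""
    n = len(y)
    cols = [[1.0] * n, x1, x2]  # design columns: intercept, predictor 1, predictor 2
    A = [[sum(c1[i] * c2[i] for i in range(n)) for c2 in cols] for c1 in cols]
    rhs = [sum(c[i] * y[i] for i in range(n)) for c in cols]
    b0, b1, b2 = solve(A, rhs)
    pred = [b0 + b1 * x1[i] + b2 * x2[i] for i in range(n)]
    my = statistics.mean(y)
    ss_res = sum((y[i] - pred[i]) ** 2 for i in range(n))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Invented data: attendance hours explained by reading score and motivation.
hours = [80, 95, 110, 90, 140, 120, 100, 130]
reading = [45, 52, 60, 48, 70, 66, 50, 68]
motivation = [3, 4, 5, 3, 5, 4, 4, 5]

r2 = r_square(hours, reading, motivation)
print(f"R-Square = {r2:.2f}")
```

An R-Square of, say, .90 would mean that 90 percent of the variance in attendance hours is accounted for by the two predictors taken together.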
Procedure #6: Factor Analysis
Factor analysis employs statistics to sort a large number of variables into a smaller number of conceptually meaningful categories. If researchers used a 60-item questionnaire designed to measure learner motivation, they might wish to get a clearer understanding of the components of motivation contained in that questionnaire. By discovering patterns in the ways in which questionnaire items correlate with one another, factor analysis would allow the researchers to distill the 60 items down into a more manageable number of "factors" that, taken together, increase our understanding of learner motivation. Through factor analysis, the researchers might find that motivation has three major component factors: The intrinsic will to learn, external pressures to "finish school," and the hope of economic advancement. Factor analysis requires the reporting of numerous statistics and the use of many technical terms, but the basic idea is a simple one and the results are easy to understand if you ignore the detail.
Procedure #7: Chi Square Analysis
Sometimes, researchers are faced with the task of examining the relationship between two categorical variables. In adult literacy research, a study examining the effects of gender on dropout rate would require that the researcher discover whether men or women have a disproportionate tendency to drop out. If there is no gender effect, one would expect the percentage of dropouts who are women would be nearly the same as the total percentage of women in the study. The chi-square statistic tests the degree to which such expectations hold true. In addition to the chi-square statistic itself, researchers using this analysis also report a p. As was described in earlier sections, a p that is less than .05 or .01 indicates statistical significance.
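The chi-square statistic simply compares each observed count with the count we would expect if the two categorical variables were unrelated. A minimal Python sketch, with invented counts for the gender-and-dropout example:

```python
def chi_square(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Invented counts: rows are women/men, columns are completed/dropped out.
table = [[40, 10],   # women
         [25, 25]]   # men

c = chi_square(table)
print(f"chi-square = {c:.2f}")
```

The further the observed counts stray from the expected counts, the larger the chi-square grows, and the smaller its accompanying p becomes.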
Procedure #8: Reliability
Reliability is not really a stand-alone statistical procedure, yet it tends to be mentioned briefly in most quantitative pieces. Reliability analysis is used to determine the stability of the instruments used to measure the variables in a study. If you gave your students the same reading test three times, and each time the score was radically and unpredictably different, you would conclude that the test was unreliable. Reliability is reported as a coefficient with a theoretical range of 0.00 to 1.00, with the latter representing perfect reliability. Researchers usually strive for reliability coefficients greater than .80, but occasionally they will settle for coefficients as low as .60.
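The article does not single out a particular reliability coefficient; one of the most commonly reported, Cronbach's alpha, measures how consistently a set of test items track one another. A Python sketch with invented scores:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of score lists, one list per test item."""
    k = len(items)
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each person's total score
    return (k / (k - 1)) * (1 - item_variances / statistics.variance(totals))

# Invented scores on three test items for five learners.
items = [[4, 5, 3, 5, 2],
         [4, 4, 3, 5, 2],
         [5, 5, 2, 4, 1]]

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

Because the three invented items rank the learners in nearly the same order, the alpha comes out above the .80 threshold mentioned above.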
Closing Comments
Statistical research is not as formidable as it appears, but it requires a special type of reasoning. Statistical reasoning involves a tight, detailed, and codified logic that can be especially difficult for people who would rather deal in broad strokes and big ideas than with the making of fine distinctions about extremely well focused concepts.
Some people view statistics with a sense of moral indignation at the fact that statistics reduces things of human importance to numbers, and they associate statistics either with the power it could give a "big brother" type of government or with the bean-counting bureaucrats they scorn. In reality, of course, statistical research reduces an object of study no more than a camera reduces the object of a photograph. Statistical reasoning simply represents a highly patterned and highly public way of looking at the world, and, because its details can be readily scrutinized and evaluated, it is often preferred by funding agencies and program evaluators over more subjective and less public ways of reasoning. Like all research methods, it can be used for good or bad purposes.
Statistics are a part of the everyday life of adult educators. We use them to report
attendance, to evaluate our programs, and to learn about the demographic trends in the
broader society that affect our work. It is in everyone's best interest that working
educators learn how to be critical consumers of quantitative research. Even the best
quantitative research on adult education is ultimately meaningless unless teachers and
administrators put the findings to work.
When Reading Quantitative Research
Ask Yourself:
- What was their question?
- Who and how many did they study?
- Do the population and setting resemble yours?
- What data did they gather?
- What did they find?
- What did they conclude?
- Does this jibe with your experience?
- What else might account for these findings? If these findings are true, what does that suggest for your work?