Dr. Gregg's Research Methods Corner
Educational Research
The impetus in educational research is usually to determine the efficacy of a particular program or intervention. Within the field of education, efficacy is most often judged by one or more measures of academic achievement. A school may wish to institute a new reading program for fourth grade that the program developers claim is superior to existing programs in increasing reading comprehension. Teachers and school administrators are interested in using the best available programs and practices to ensure the maximum level of academic performance from their students. Two questions often arise: are the results of the developer’s research truly unbiased, and was the research based on a fourth grade population similar enough to allow generalization of the results to the particular school or district that is interested in implementing the program? These questions are especially important if implementation of the program is costly in terms of materials and/or teacher time.
Investigating how well a particular program of interest works most often requires research based on group comparisons. In order to obtain valid results, the groups must be comparable on all factors that could conceivably influence the results. There are basically two methods for ensuring comparable groups. The first method involves matching students on all factors that might have an influence on the outcome of interest. The second and preferred method involves random selection and random assignment to two or more groups. Research based on matched groups is typically avoided if at all possible, since it is often impossible to match subjects on all factors that might impact the outcome. In fact, it is usually not even possible to know all the factors that might influence the results. With a sufficiently large group of subjects, randomization tends to produce groups that are comparable, on average, on all factors, including those the researcher has not identified.
To investigate a reading program like the one described above, an investigator would randomly select fourth graders from the pool of fourth graders for which the program will be implemented. Once all fourth graders are selected, they are randomly assigned to one of two or more groups. Typically, one group would serve as a control (no treatment or program; see note 1) and one as a treatment group (receives the program). If additional programs are to be compared, additional treatment groups can be formed in the same manner. Following the conclusion of the program(s), the relative performance of the groups can be assessed. In the present example, a test of reading comprehension would be administered to all students in all groups, and their average (mean) scores by group would be compared. If assignment to groups was done properly, any differences in group average scores can be attributed either to the treatment(s) or to chance. Statistical analysis can then be performed to determine whether the mean differences are significant.
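The selection and assignment steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student identifiers, the pool size of 200, the sample size of 60, the seeds, and the helper names `assign_groups` and `group_means` are all hypothetical and do not come from any particular study.

```python
import random
from statistics import mean


def assign_groups(students, group_names, seed=None):
    """Randomly assign students to the named groups, as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    # Deal the shuffled students out in turn so group sizes differ by at most one.
    for position, student in enumerate(shuffled):
        groups[group_names[position % len(group_names)]].append(student)
    return groups


def group_means(groups, scores):
    """Mean outcome score per group; `scores` maps student -> test score."""
    return {name: mean(scores[s] for s in members)
            for name, members in groups.items()}


# Hypothetical pool of fourth graders, identified only by number.
pool = [f"student_{i:03d}" for i in range(1, 201)]

# Step 1: random selection from the pool.
selected = random.Random(42).sample(pool, 60)

# Step 2: random assignment of the selected students to groups.
groups = assign_groups(selected, ["control", "treatment"], seed=7)
print({name: len(members) for name, members in groups.items()})
```

Adding a second program only requires a longer list of group names, for example `["control", "program_a", "program_b"]`. Once post-test scores are collected, `group_means(groups, scores)` gives the per-group averages to be compared.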
Often a pre-test is administered to all groups prior to the treatment. Pre-tests are typically administered first to measure growth and second as a check on whether all groups started out at approximately the same level of performance. If randomization has been carried out correctly and with a sufficient number of students, pre-testing would not be necessary for the second purpose. If the researcher(s) neither match nor use random selection and assignment, and the pre-test scores turn out to be significantly different across groups, it is likely that no valid conclusions can be drawn. This topic will be discussed again under Experimental Control.

Statistical significance addresses the question: “What are the chances that the results obtained happened by chance alone?” Given knowledge of the sampling distribution, it is possible to determine the probability of obtaining a difference in means at least as large as the one observed when no true difference exists. A probability of 5%, typically denoted p <= .05, tells us that if we were to draw random samples repeatedly from populations that do not actually differ and calculate the difference in means each time, we could expect to find differences as great or greater 5% of the time (based on an infinite number of samples). A “p” value of .05 is often loosely described as being 95% confident that the results did not occur by chance alone; strictly speaking, it is the probability of results this extreme when chance alone is at work. Statistical significance is not solely related to the absolute magnitude of the difference in means. The size of the samples and the variability of scores around the individual group means play a large part in determining statistical significance. In fact, with a large enough sample and very little variability, means differing by less than one point can be statistically significant. Two related factors bear on whether a real difference will reach statistical significance. The first is the power of the statistical test that is employed. Power is defined as the test’s ability to detect “true” differences if they exist. The second is the number of subjects, or students, in the sample (n). A larger sample increases the power to detect differences should they exist.
Practical significance, unlike statistical significance, is directly related to the magnitude of the difference in means. Practical significance is a subjective conclusion, usually based on some type of cost-benefit analysis. For example, imagine our particular reading program produced a statistically significant increase in test scores on the reading comprehension portion of the AIMS. Imagine further that the absolute difference in means between the program and no-program groups was one NCE (normal curve equivalent) point.
The scenario above, while passing the statistical test of significance, would likely fail the test of practical significance, especially if the program was very costly in terms of time and money. It should be noted that a statistical test of significance is still required before practical significance can be judged, since a fairly large difference in means could happen by chance alone given a small sample size and/or a large amount of variability.
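To make the contrast between statistical and practical significance concrete, the sketch below compares the same one-point difference in means at two sample sizes. It assumes a simple two-sample z-test (a normal approximation, not necessarily the test an actual study would use) and adds Cohen's d as one common, sample-size-free measure of magnitude. The means, the standard deviation of 10, and the group sizes are hypothetical.

```python
import math


def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Return (z, two-sided p) for a difference in means, normal approximation."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


# Hypothetical summary statistics: program group 51, no-program group 50, SD 10.
z_small, p_small = two_sample_z(51, 50, 10, 10, 50, 50)       # 50 students per group
z_large, p_large = two_sample_z(51, 50, 10, 10, 5000, 5000)   # 5,000 students per group
effect_size = cohens_d(51, 50, 10, 10, 5000, 5000)

print(f"n = 50 per group:   z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 5000 per group: z = {z_large:.2f}, p = {p_large:.2g}")
print(f"Cohen's d in both cases: {effect_size:.2f}")
```

With 50 students per group the one-point difference is far from significant (p is roughly .62), while with 5,000 per group it is significant well beyond the .05 level. The effect size is 0.10 in both cases, which is the quantity a cost-benefit judgment about practical significance would weigh.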
Variables
There are basically three types of variables that are of interest to researchers. A dependent variable is the measured outcome of interest. In the example above, the outcome of a particular reading program is measured by a test of reading comprehension, so the test of reading comprehension is the dependent variable. An independent variable is a variable that is manipulated in order to determine its effects on the dependent variable. In the example above, the independent variable is the reading program itself. Again, it is possible to have multiple independent variables at one time (e.g., multiple reading programs). Finally, extraneous variables are variables that must be controlled in some way in order to draw valid conclusions from an experiment. The only difference between an extraneous variable and an independent variable is that the researcher is interested in the effects of the independent variable and not those of the extraneous variable. Extraneous variables can invalidate any research or experimental findings if not properly controlled. There are basically two ways to control extraneous variables: by including them in the model or design, which makes them independent variables, or by making sure their effects are evenly distributed across all groups (randomization).
Experimental Control
The research method using group comparisons described above is an example of an experimental design, where variables are experimentally controlled. The use of an experimental design, although preferred, is usually not possible in an educational setting, for a number of reasons. In some cases educators are so anxious to implement a new program, especially if the program is generally regarded as superior, that little if any thought is given to its evaluation. In other cases educators may plan to evaluate the usefulness of the program but either lack sufficient training in research methodology or fail to plan how the program will be evaluated before its implementation. Perhaps the most common reason an experimental design is not used has to do with the control group. As stated earlier, the control group receives no treatment. If the treatment or program does promote academic achievement, then denying it to a group of students is viewed as unfair and even unethical. Regardless of the reason(s), the result is an absence of comparable groups. In fact, students considered to be most needy or at risk are often selected for the treatment group. It is well known that selecting the lowest performers will often result in gains even in the absence of treatment, a phenomenon known as “regression to the mean”. Unless another equally at-risk group can be formed (same school, etc.), a valid comparison and evaluation using an experimental design is not possible.
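Regression to the mean can be demonstrated with a small simulation. The sketch below assumes a deliberately simple model in which each observed score is a stable "true" ability plus independent test-day noise. The distributions, the 10% selection fraction, and the function name `simulate_regression_to_mean` are assumptions made for illustration only. No treatment of any kind is applied between the two tests.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n_students=10_000, fraction_selected=0.10, seed=0):
    """Return (pre-test mean, post-test mean) for the lowest pre-test scorers."""
    rng = random.Random(seed)
    # Assumed model: observed score = stable ability + independent noise.
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    pretest = [a + rng.gauss(0, 10) for a in ability]
    posttest = [a + rng.gauss(0, 10) for a in ability]  # no treatment applied

    # Select the lowest pre-test scorers, as an "at risk" program might.
    n_selected = int(n_students * fraction_selected)
    lowest = sorted(range(n_students), key=lambda i: pretest[i])[:n_selected]

    pre_mean = mean(pretest[i] for i in lowest)
    post_mean = mean(posttest[i] for i in lowest)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group, pre-test mean:  {pre_mean:.1f}")
print(f"selected group, post-test mean: {post_mean:.1f}")
```

The selected group's post-test mean comes out well above its pre-test mean even though nothing was done for these students, because part of what placed them at the bottom on the pre-test was bad luck that does not repeat. A comparison group selected in the same way would show the same spurious gain, which is why an equally at-risk comparison group is needed.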
Statistical Control
Statistical control requires the measurement of all variables that may have an impact on the results of an experiment. Again, this is not the preferred method of control, but it is often necessary in an educational setting when the use of random groups is not possible. If it is determined (through research) that socioeconomic status (SES), gender, and a child’s predominant language have an impact on reading comprehension (sticking with the same example), these variables will need to be measured in addition to the independent variable(s) of interest. If the researcher discovers that, by chance, gender, SES, and primary language are equally represented across the two or more groups, then statistical adjustment may not be necessary. On the other hand, if one or more of the variables are not equally represented (proportionally), then statistical control is necessary. Without going into more advanced statistical procedures, suffice it to say that once the magnitudes of the extraneous variables’ effects are determined, their contribution to the overall results can be statistically neutralized. As a simplified example, if females score on average five points higher than males, then five points could be subtracted from the females’ total scores before comparisons are made. This example is used only for conceptual purposes; such a simple subtraction would not work for multiple variables.
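The simplified five-point adjustment can be written out directly. In the sketch below every number is invented for illustration: the two groups have no real treatment difference at all, but the treatment group happens to contain more females. The names `FEMALE_ADVANTAGE`, `raw_mean`, and `adjusted_mean` are hypothetical, and real analyses would use a procedure such as analysis of covariance in place of a fixed subtraction.

```python
from statistics import mean

# Hypothetical gender gap from the conceptual example: females score 5 points higher.
FEMALE_ADVANTAGE = 5

# Invented (gender, score) records. Scores depend only on gender, not on group,
# but gender is unevenly represented across the two groups.
treatment = [("F", 55)] * 8 + [("M", 50)] * 2
control = [("F", 55)] * 2 + [("M", 50)] * 8


def raw_mean(group):
    """Unadjusted mean score for a group."""
    return mean(score for _, score in group)


def adjusted_mean(group):
    """Mean score after removing the assumed female advantage."""
    return mean(score - FEMALE_ADVANTAGE if gender == "F" else score
                for gender, score in group)


raw_difference = raw_mean(treatment) - raw_mean(control)
adjusted_difference = adjusted_mean(treatment) - adjusted_mean(control)

print(f"raw difference in means:      {raw_difference}")
print(f"adjusted difference in means: {adjusted_difference}")
```

The raw comparison shows a three-point advantage for the treatment group that is entirely a product of its gender composition. After the adjustment the difference is zero, which is the correct reading of these invented data.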
Causality
Causality has been a subject of debate among scientists and philosophers for centuries. It is generally agreed, however, that demonstrating a cause and effect relationship requires more than just demonstrating a consistent relationship between two variables. In addition to demonstrating a consistent relationship, one must also demonstrate a temporal relationship between the two variables. The cause must always precede the effect, and the effect cannot exist without the cause. A cause may produce a permanent effect or an effect that diminishes over time. It is also generally agreed that demonstrating a cause and effect relationship requires an experimental paradigm with multiple measurements and a control group. An experimental paradigm like the one just described is often difficult, if not impossible, outside a controlled setting. Much of behavioral research, and especially educational research, is designed to show a relationship between two or more variables (correlational). Since correlational designs fail to indicate directionality (which variable is the necessary precursor, or “cause”) and whether the cause is necessary for the effect, the results must be interpreted with caution.
As an example, a relationship between minority status and educational achievement has been demonstrated. More specifically, it appears that membership in a minority group is associated with lower achievement test scores. On the face of these results, some may be tempted to say that minority membership “causes” lower test scores. However, it has also been demonstrated that membership in a lower socioeconomic status (SES) group is often associated with lower achievement scores. It is possible through the use of statistics to separate the effects of minority status and SES. It has been demonstrated that once the effects of SES are removed from achievement scores, the effects of minority status all but disappear. This knowledge leads to a different conclusion: it is not minority status per se that is associated with lower test scores, but more likely SES, with minority status being an intervening variable. In other words, the correlation between minority status and achievement scores is more likely a result of minority groups being overrepresented among the low SES group.
1 In addition to a treatment group and a control group, it is often recommended that a third group be added, a group that receives some treatment related to the treatment of interest. This is done for two reasons: (1) to counter the “Hawthorne effect” (see note 2), and (2) to address the possibility that simple practice, devoid of any sophisticated paradigm or model, may produce similar results. For example, some researchers have shown that simply spending additional time reading produces results similar to those produced by more sophisticated (and expensive) reading programs.
2 The Hawthorne effect is defined as changes in behavior resulting from attention participants believe they are getting from researchers, and not the variable(s) manipulated by the researchers.