
Using Multiple Regression to Model Relative Growth to Evaluate Teacher Effects

September 22, 2004

What is Teacher Effectiveness?

Teaching ability (effectiveness) can be difficult to define and even more difficult to measure directly. For that reason, many researchers use statistical methods to remove variance that is considered to be “beyond the teacher’s control”, on the rationale that whatever variance is left must be related to teacher effectiveness. The left-over variance is called a “residual” in statistical terms, and it is used as a Teacher Effectiveness Index (TEI). Having a well-delineated operational definition of teacher effectiveness is a crucial first step to measuring or modeling it. For this investigation, teacher effectiveness is defined as “the teacher’s ability to maintain or accelerate a student’s relative growth”.

What is Relative Growth?

For the purposes of this investigation, growth is defined as “a student’s rank order standing within his or her cohort group over time”. Here the term “cohort” is defined as all students at the same grade level across the district. To maintain their rank order, a student would need to score at the same percentile in two consecutive years. A student scoring at the same percentile has therefore demonstrated one year’s growth: the student is scoring as expected based on past performance. While learning curves tend to be curvilinear over time, relative growth curves tend to be straight (linear) over time. Statistical methods based on the General Linear Model (GLM) therefore produce a good data-model fit, minimizing the average error in prediction.
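The definition above can be illustrated with a minimal sketch. The function names, the mid-rank percentile convention, and the scores are assumptions made for illustration; the report does not say how percentiles were computed.

```python
def percentile_ranks(scores):
    """Map each student to a percentile rank (0-100) within one cohort.

    Mid-rank convention (an assumption): percent of the cohort scoring
    below the student, plus half the percent scoring exactly the same.
    """
    values = list(scores.values())
    n = len(values)
    ranks = {}
    for student, score in scores.items():
        below = sum(1 for v in values if v < score)
        equal = sum(1 for v in values if v == score)
        ranks[student] = 100.0 * (below + 0.5 * equal) / n
    return ranks


def relative_growth(year1, year2):
    """Change in percentile rank for students tested in both years."""
    r1, r2 = percentile_ranks(year1), percentile_ranks(year2)
    return {s: r2[s] - r1[s] for s in r1 if s in r2}


# Hypothetical scaled scores for one grade-level cohort in consecutive years.
scores_2003 = {"a": 510, "b": 540, "c": 580, "d": 620}
scores_2004 = {"a": 560, "b": 570, "c": 640, "d": 650}
growth = relative_growth(scores_2003, scores_2004)
for student, change in growth.items():
    print(student, change)
```

In this made-up cohort every student's raw score rises, yet each keeps the same rank, so the change in percentile is zero for all four. Under the definition above, that is one year's growth for each student.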

Best Predictors of Performance

The best predictor of future performance is past performance. While other variables including demographic variables like SES, mother’s educational level, or ethnicity may predict performance they are not part of our definition of teacher effectiveness. In fact, it will later be suggested that removing demographic variance may inadvertently remove a substantial portion of variance associated with teacher effectiveness. Accurate and reliable prediction is dependent on the reliability of the predictors. As predictors of academic performance we use standardized achievement tests that have demonstrated high reliability. We then maximize reliability by combining all available measures of academic performance for a given student for every year the student was in the district.1  

Academic performance for the most recent year (dependent measure) was predicted based on a model that uses multiple measures per year across years.

Residuals and Error Variance

In accordance with our definition of teacher effectiveness, the only variable modeled was the student’s growth. That is, the only variance removed was that associated with student performance. The residual, or TEI, has two components: systematic variance associated with variables that were not included in the model (e.g., teacher effects), and random variance associated with unreliable measures. For the residual to be reliable, the systematic component (teacher effect) must be maximized while the random component (error) is minimized. Increasing the systematic variance in the residual was made possible under the current model by using an a priori definition of teacher effectiveness, as opposed to an a posteriori definition that treats teacher effectiveness as whatever residual variance is left over after removing all the systematic effects on student performance (e.g., variance related to student demographics). The current model minimizes the random variance component by using reliable measures of student performance. In signal detection terms, the aim is to maximize the signal-to-noise ratio in the residual. Researchers often focus on modeling student performance in order to maximize prediction (i.e., a large R²). Maximizing R² is appropriate if accurate prediction is the goal. On the other hand, if the residual is to be used as a reliable measure of teacher effectiveness (TEI), then maximizing R² produces an unacceptably high ratio of random to systematic variance. Removing all the systematic variance can be referred to as “over modeling”. A more detailed discussion of the residual and the TEI, along with their computation, can be found in Appendix A.
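One way to make the signal-to-noise argument concrete is the variance decomposition below. It is a sketch under assumptions not stated in the text: the symbols \tau_j (effect of classroom j) and \varepsilon_{ij} (random error for student i in that classroom) are introduced here for illustration, and the two components are assumed to be uncorrelated.

```latex
e_{ij} = \tau_j + \varepsilon_{ij},
\qquad
\operatorname{Var}(e_{ij}) = \sigma_\tau^2 + \sigma_\varepsilon^2

\text{reliability of a classroom mean of } n \text{ residuals}
  = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_\varepsilon^2 / n}
```

Under these assumptions, any modeling step that absorbs classroom-level variance shrinks \sigma_\tau^2 and pushes the ratio toward zero, while more reliable measures shrink \sigma_\varepsilon^2 and push it toward one. Averaging over the n students in a classroom also reduces the error term, which is how a classroom mean can carry a systematic component even when individual residuals are mostly noise.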

Multiple Regression (MR) vs. Hierarchical Linear Modeling (HLM)

Two- and three-level HLM models maximize prediction by accounting for additional variance resulting from the interaction between variables at different hierarchical levels (e.g., school and classroom). Since we are only interested in one level of analysis (i.e., the classroom), there was no need for multi-level methods of analysis. Given that, and given that maximizing R² was not our ultimate goal, we chose simple multiple regression as our statistical method of analysis.

1 Students were tested using the Stanford 9, Arizona Instrument to Measure Standards (AIMS), and the District’s Core Curriculum Standards Assessment (CCSA).  Most students have at least two assessments a year, the Stanford 9, and either the AIMS or the CCSA depending on the grade level of the student.

In addition, the two-level HLM tended to over model by removing classroom effects, which were the effects we were interested in (i.e., it tended to remove all the systematic variance from the residual). Again, this result would be desirable if prediction were the goal; however, when the systematic portion of the residual is of interest, the two-level HLM produces residuals with very low year-to-year reliability.

Evaluating the Model

The utility of a teacher effectiveness model must be determined based on its validity and reliability. In order to demonstrate its validity and reliability it must meet four criteria:

1.      Cross validate with other measures of teaching performance

2.      Show little or no relation to demographic variables

3.      Show little or no relation to grade level

4.      Show reliability over time

To demonstrate cross validation, the teacher effectiveness indices were correlated with teacher evaluation ratings given by principals on an instrument specifically designed for this purpose. The results can be seen in Table 1. Most correlations are moderate, ranging from about .30 to about .50. The exception is the set of correlations under “Professional Responsibility”, most of which were significant (p < .05). While these correlations are not as high as one might expect, they do demonstrate shared variance between the two measures. The correlations are more acceptable given that the two methods of evaluation measure different facets of education: the teacher evaluation focuses more on “process”, while teacher effectiveness focuses more on “outcomes”. The Teacher Evaluation Questionnaire can be seen in Appendix B. It should also be pointed out that the correlations were based on a small sample of teachers. Given equivalent variances, a larger sample would have yielded a larger number of significant coefficients.

Table 1

Cross Validation of the Teacher Effectiveness Index (TEI)
with a Specially Constructed Teacher Evaluation Questionnaire

Teacher Evaluation              Teacher Effectiveness Index (TEI)
                                Reading     Math      Writing   Overall

Planning & Preparation
  Question 1                      .47       .43       .42       .47
  Question 2                      .20       .20       .10       .18
  Question 3                      .29       .31       .19       .28
  Question 4                      .41       .40       .31       .39
  Total                           .38       .37       .28       .36

Classroom Environment
  Question 1                      .32       .30       .23       .30
  Question 2                      .39       .24       .23       .30
  Question 3                      .39       .21       .30       .31
  Total                           .43       .30       .29       .36

Instruction
  Question 1                      .43       .40       .35       .42
  Question 2                      .47       .46       .38       .47
  Question 3                      .15       .06       .04       .08
  Question 4                      .25       .16       .11       .18
  Question 5                      .18       .02       .07       .09
  Total                           .36       .28       .23       .30

Professional Responsibility
  Question 1                      .60       .61       .55       .62
  Question 2                      .64       .50       .58       .60
  Question 3                      .55       .65       .48       .60
  Question 4                      .43       .33       .28       .37
  Total                           .65       .61       .55       .64

Overall Questionnaire             .49       .43       .37       .46

N = 16. Values in red are significant @ p < .05

The second criterion, little or no relationship between TEI scores and demographic variables, was investigated by correlating classroom demographic variables with teacher effectiveness ratings. These results are presented in Table 2.

Table 2

Classroom Demographics          Effectiveness Ratings
                                Reading   Mathematics   Writing   Overall

% Minority                       -.05        -.05        -.05      -.07
% Free/Reduced                   -.12        -.10        -.06      -.12
STRESS                           -.10        -.08        -.06      -.10

As can be seen, the teacher effectiveness ratings are unrelated to the demographic makeup of the classroom. Since every student serves as his or her own control (growth over time), there is no need to adjust for demographic variables. The “stress” indicator above is a combination of percent minority and percent low SES as measured by free/reduced lunch. To further investigate the independence of ethnicity and growth, the residuals and the observed scores were compared across ethnic groups (Table 3).

Table 3

Ethnicity               Residuals                        Observed Scores
                        Reading    Math     Writing      Reading    Math     Writing

White/Anglo              .040      .033      .003         .365      .349      .343
African American        -.028     -.039     -.017        -.167     -.247     -.161
Hispanic                -.011     -.027     -.012        -.291     -.245     -.279
Native American         -.035     -.052     -.059        -.473     -.495     -.442
Asian American           .099      .127      .162         .344      .510      .414
All Ethnic groups        .009     -.002     -.004        -.034     -.019     -.033

Data reflects all grade levels across the district

Table 3 demonstrates that while observed scores vary dramatically across ethnic groups, the residuals do not. These results show that teacher evaluation methods relying solely on absolute year-end performance simply reward teachers in low stress classrooms and penalize teachers in high stress classrooms, without taking into account any gains that may have been made.

To address the third criterion, no relation between grade level and effectiveness rating, residuals were correlated with grade at the student level. Demonstrating constant relative growth across grade levels is necessary to ensure unbiased effectiveness ratings. The results of this analysis can be seen in Table 4.

Table 4

Correlations Between Grade and Residuals
used to calculate the Teacher Effectiveness Index (TEI)

Residual        Grade level

Reading            -.04
Math               -.03
Writing            -.01
Overall            -.03

From these results it can be seen that relative growth does not appear to change over time (grade level). As a result, effectiveness ratings will be unrelated to grade level.

Our investigations of teacher effectiveness using a 2-level HLM model removed systematic variance related to school and classroom effects, resulting in an almost complete removal of systematic variance from the error component (residual). As a result of this over modeling, the residuals, or TEIs, were not comparable from year to year (i.e., little or no reliability). Reliability is not expected to be high in any case, given that teachers work with a different group of students every year and teachers who are failing to meet minimum standards are likely to be placed on a plan for improvement. Demonstrating the fourth criterion, reliability over time, is perhaps the most important aspect of a valid teacher effectiveness model. It provides evidence not only of systematic variance in the residual (teacher effectiveness), but also that this systematic variance is consistent over time (reliable). Table 5 shows the results of a reliability analysis of residuals and observed scores at the individual student level. As can be seen, the reliability of the observed scores is fairly high, ranging from .74 to .84 along the diagonal. On the other hand, the reliability of the residuals is zero along the diagonal. This result shows that a student scoring higher or lower than expected one year has nothing to do with whether the same student will score higher or lower than expected the following year.

Table 5

Correlations Between Student Achievement Scores
and Their Residuals Across Two Years (2003 – 2004)

2003 Results           Reading 04   Math 04   Writing 04   Reading       Math          Writing
                                                           Residual 04   Residual 04   Residual 04

Reading 03                .84         .73        .74         -.02           .09           .10
Math 03                   .72         .82        .67          .06          -.01           .08
Writing 03                .79         .73        .74          .05           .08           .01
Reading Residual 03       .01         .01       -.00          .00           .01           .01
Math Residual 03          .01         .03       -.00          .01          -.00           .02
Writing Residual 03       .01         .02        .00          .01           .01           .00

Additionally, Table 5 shows no relationship between a student’s absolute level of performance and their associated residuals, since every student serves as his or her own control. The aggregation of individual student residuals by classroom forms the TEI. We have demonstrated that the year-to-year fluctuations of the student residuals are completely random across all students for the entire district. To the extent that the aggregation of these residuals by classroom forms a systematic component, the classroom a student is placed in has an impact on whether he or she will show gains or losses compared to expectation (teacher component). As can be seen in Table 6, this is exactly what was found. Students showing either year-to-year gains or year-to-year losses tended to come from the same classrooms.

Table 6

Correlations Between Teacher Effectiveness
Ratings (TEI) Across Two Years (2003 – 2004)

Variable             Reading 2004   Math 2004   Writing 2004   Overall TEI 2004

Reading 2003             .37           .35          .35              .43
Math 2003                .29           .46          .32              .44
Writing 2003             .29           .31          .31              .37
Overall TEI 2003         .36           .43          .37              .47

Note: All correlations were significant @ p < .05
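A year-to-year reliability check of the kind summarized in Table 6 can be sketched as a Pearson correlation between each teacher's TEI in two consecutive years. The teacher identifiers and TEI values below are hypothetical, and the function names are introduced here for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def year_to_year_reliability(tei_prev, tei_curr):
    """Correlate TEI across years, using only teachers rated in both years."""
    teachers = sorted(set(tei_prev) & set(tei_curr))
    return pearson_r([tei_prev[t] for t in teachers],
                     [tei_curr[t] for t in teachers])


# Hypothetical overall TEI values keyed by teacher id.
tei_2003 = {"t1": -0.2, "t2": 0.0, "t3": 0.1, "t4": 0.3}
tei_2004 = {"t1": -0.1, "t2": 0.1, "t3": 0.0, "t4": 0.2, "t5": 0.4}
reliability = year_to_year_reliability(tei_2003, tei_2004)
print(round(reliability, 2))
```

Teacher "t5" has no 2003 rating and is dropped, so the coefficient is based on matched pairs only.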

The results of this investigation lead to the conclusion that it is possible to partial out teacher effects on individual student performance by modeling relative student growth using ordinary least squares multiple regression. Moreover, the systematic effects of teachers were shown to be consistent over time. While this procedure demonstrated valid and reliable results, it is based solely on individual student outcomes. A more comprehensive evaluation of teacher performance may require combining this method with one that is more sensitive to the teaching process. The choice of evaluation methods, or the combination of methods and their weighting, will be determined by the needs and emphasis of the institution.

 




Appendix A

Individual’s predicted performance (Y’):

 

$$Y' = b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n$$

 

The values (b1…bn) represent calculated regression weights associated with predictor variables (X1…Xn). The predictor variable X1 would be a measure of a student’s academic performance in the 1999-2000 school year. The variable X2 would be a measure of the same student’s performance in the 2000-2001 school year, and so on. The score being predicted is the student’s performance for the current school year, 2003-2004; Y’ is the model’s prediction of that score.

 

A student’s actual observed score (Obs) is compared to the student’s predicted score based on past performance.

 

The difference between an individual’s observed and predicted scores is defined as a “residual”.

 

(Y’ – Obs) = Residual

 

The residual value becomes the unit (score) of interest.  The residual forms the Teacher Effectiveness Index (TEI).  While the TEI contains random variance it also contains systematic variance produced by variables that were not in the equation (e.g., teacher effects).  By including variables that predict an individual’s performance (i.e., past performance) and excluding variables that are associated with teaching effectiveness the residual reflects the effect of a teacher on student performance.  Since the residual still contains random variance (error) that cannot be further partitioned we must estimate the proportion of the residual that is likely due to random error.
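The prediction and residual steps described above can be sketched with an ordinary least squares fit written in plain Python. Several things here are assumptions rather than the authors' procedure: the function names, the made-up standardized scores, and the added intercept term, which the equation above does not show. The appendix writes the difference as (Y’ – Obs); the sketch uses observed minus predicted so that a positive value means a student scored above expectation, which is an assumption about the intended sign.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the assumed intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, x):
    """Predicted score Y' for one student's prior-year scores."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Hypothetical standardized scores: two prior years per student, then the current year.
prior = [(-1.2, -0.9), (-0.5, -0.7), (0.0, 0.2), (0.4, 0.1), (0.9, 1.1), (1.4, 1.0)]
observed = [-1.0, -0.4, 0.3, 0.1, 1.2, 0.9]

weights = fit_ols(prior, observed)
residuals = [obs - predict(weights, x) for x, obs in zip(prior, observed)]
print([round(r, 3) for r in residuals])
```

With an intercept in the model, the residuals of the fitted group sum to zero, which is the same as saying the average error in prediction is zero.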

 

We know that the average error in prediction over the entire population (district) is zero.  In other words, the population mean of residuals is zero. 

 

$$\frac{1}{N}\sum_{i=1}^{N}\left(Y'_i - \text{Obs}_i\right) = 0$$

 

Where (N) represents the number of all fourth grade students in the district for example.

 

The TEI score for any teacher is the average residual value for individuals in his or her classroom.  The teacher’s effectiveness score (TEI) can be expressed as:

 

$$\frac{1}{n}\sum_{i=1}^{n}\left(Y'_i - \text{Obs}_i\right) = \text{TEI}$$

 

Here (n) represents the number of students in a given classroom.
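The aggregation step can be sketched as a group-by-classroom average. The classroom labels and residual values are hypothetical, and the function name is introduced here for illustration.

```python
from collections import defaultdict
from statistics import mean


def teacher_effectiveness_index(records):
    """Average student residual per classroom.

    records: iterable of (classroom_id, residual) pairs.
    """
    by_classroom = defaultdict(list)
    for classroom, residual in records:
        by_classroom[classroom].append(residual)
    return {room: mean(values) for room, values in by_classroom.items()}


# Hypothetical student residuals for two classrooms.
records = [
    ("room_1", 0.5), ("room_1", 0.25), ("room_1", 0.75),
    ("room_2", -0.5), ("room_2", -0.25), ("room_2", -0.75),
]
tei = teacher_effectiveness_index(records)
for room, value in sorted(tei.items()):
    print(room, value)
```

In this made-up data the residuals average zero across both classrooms combined, while each classroom mean differs from zero.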

 

Since the variance due to teacher effectiveness cannot be partitioned from the variance that we consider error we must make a judgment as to whether the magnitude and the direction of the residual exceeds that of pure chance variation.  To do this we simply calculate the standard error of the mean for any classroom and compare that to the population mean which we know to be zero with a standard deviation of 1. 

 

Thus:

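$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \frac{1}{\sqrt{n}}$$

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \quad\text{or}\quad z = \frac{\bar{x}}{\sigma_{\bar{x}}}$$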

 

 

With the standard error of the mean we can test whether the classroom mean is significantly different from the population mean of zero.  A 95% confidence interval can be constructed around the obtained classroom mean.  If the 95% interval includes the population mean we cannot say with confidence that the effectiveness rating differs from the population mean by more than can be accounted for by chance alone.

 

$$P\left(\bar{x} - 1.96\,\sigma_{\bar{x}} < \mu < \bar{x} + 1.96\,\sigma_{\bar{x}}\right) = .95$$

 

The equation above represents a 2-tailed 95% confidence interval.  If the calculated upper limit is less than zero we could say with 95% confidence that the mean of the residuals for that classroom is a sample from a population with a mean less than zero.  Conversely, if the calculated lower limit is greater than zero we could say with equal confidence that the “true” population mean is greater than zero.
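The test can be sketched for a single classroom as follows. It assumes, as the appendix states, a population mean of zero and a population standard deviation of 1 for the residuals; the function name, the return format, and the example residuals are hypothetical.

```python
from math import sqrt


def classroom_test(residuals, sigma=1.0, z_crit=1.96):
    """Compare a classroom's mean residual (TEI) with a population mean of zero."""
    n = len(residuals)
    tei = sum(residuals) / n
    std_error = sigma / sqrt(n)           # standard error of the mean
    z = tei / std_error                   # population mean is zero
    lower = tei - z_crit * std_error      # 95% confidence limits
    upper = tei + z_crit * std_error
    if lower > 0:
        verdict = "above expectation"
    elif upper < 0:
        verdict = "below expectation"
    else:
        verdict = "not distinguishable from chance"
    return {"tei": tei, "se": std_error, "z": z,
            "interval": (lower, upper), "verdict": verdict}


# Hypothetical classrooms of 25 students each.
strong = classroom_test([0.5] * 25)
typical = classroom_test([0.1] * 25)
print(strong["verdict"], typical["verdict"])
```

Because the standard error shrinks with the square root of class size, the same mean residual is easier to distinguish from chance in a larger classroom than in a smaller one.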

 




Appendix B

 

2003 Teacher Evaluation

 

The following components address Arizona’s Professional Teacher Standards.  Information you provide in this evaluation will be used in conjunction with other measures to establish Teacher Effectiveness Ratings.

 

 

*            Planning and Preparation

 

Demonstrates knowledge of Arizona State Standards

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Demonstrates knowledge of core content and pedagogy

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Develops and prioritizes learning goals and objectives

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Uses appropriate formal and informal assessment to guide instruction

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

*            Classroom Environment

 

Establishes and maintains effective classroom management

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Creates an environment of mutual respect and rapport

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Creates a culture that encourages learning

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

*            Instruction

 

Communicates clearly and accurately

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Engages students in learning

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Uses differentiated instruction that is developmentally appropriate

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Uses effective questioning and discussion techniques

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Provides positive student feedback and reinforcement routinely

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

*            Professional Responsibility

 

Communicates effectively with families

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Collaborates with community to support student learning

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Participates in professional growth opportunities

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation

 

Self evaluates and adapts teaching practices

o        Needs Improvement

o        Meets Expectation

o        Exceeds Expectation


Last updated 2/14/06