September 22, 2004
What is Teacher Effectiveness?
Teaching ability (effectiveness) can be difficult to define and even more difficult to measure directly. For this reason, many researchers use statistical methods to remove variance that is considered to be “beyond the teacher’s control.” The rationale is that whatever variance is left over must be related to teacher effectiveness. In statistical terms this leftover variance is called a “residual,” and it is used as a Teacher Effectiveness Index (TEI). A well-delineated operational definition of teacher effectiveness is a crucial first step in measuring or modeling it. For this investigation, teacher effectiveness is defined as “the teacher’s ability to maintain or accelerate a student’s relative growth.”
What is Relative Growth?
Growth for the purposes of this investigation is defined as “a student’s rank-order standing within his or her cohort group over time.” Here the term “cohort” means all students at the same grade level across the district. To maintain rank order, a student would need to score at the same percentile in two consecutive years. A student scoring at the same percentile would also demonstrate one year’s growth: the student is scoring as expected based on past performance. While learning curves tend to be curvilinear over time, relative growth curves tend to be straight (linear). Statistical methods based on the General Linear Model (GLM) therefore produce a good data-model fit, minimizing the average error in prediction.
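The rank-order definition of growth can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the procedure used in this investigation: it assumes the same students make up the cohort in both years, and it uses a mid-rank convention for percentile ranks (the text does not say how ties were handled). The function names `percentile_ranks` and `relative_growth` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(scores):
    """Percentile rank (0-100) of each score within its cohort.

    Mid-rank convention (an assumption): percent of scores strictly below,
    plus half the percent of scores tied with this one.
    """
    ordered = sorted(scores)
    n = len(ordered)
    ranks = []
    for s in scores:
        below = bisect_left(ordered, s)            # scores strictly lower
        ties = bisect_right(ordered, s) - below    # scores equal, incl. this one
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks


def relative_growth(prior_scores, current_scores):
    """Change in percentile rank for each student between two years.

    Position i in both lists must be the same student, and each list must
    hold the whole cohort for its year (assumed: same students both years).
    A value of 0 means the student kept his or her rank order.
    """
    if len(prior_scores) != len(current_scores):
        raise ValueError("both years must contain the same students")
    before = percentile_ranks(prior_scores)
    after = percentile_ranks(current_scores)
    return [a - b for a, b in zip(after, before)]
```

With hypothetical scale scores, if every student's score rises but the order is unchanged, `relative_growth([610, 640, 655, 700], [650, 672, 690, 720])` returns all zeros (one year's growth for everyone), whereas swapping the second and third students in the second year, `relative_growth([610, 640, 655, 700], [650, 700, 690, 720])`, gives `[0.0, 25.0, -25.0, 0.0]`.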
Best Predictors of Performance
The best predictor of future performance is past performance. Other variables, including demographic variables such as SES, mother’s educational level, or ethnicity, may also predict performance, but they are not part of our definition of teacher effectiveness. In fact, it will later be suggested that removing demographic variance may inadvertently remove a substantial portion of the variance associated with teacher effectiveness. Accurate and reliable prediction depends on the reliability of the predictors. As predictors of academic performance we use standardized achievement tests that have demonstrated high reliability. We then maximize reliability by combining all available measures of academic performance for a given student for every year the student was in the district.¹ Academic performance for the most recent year (the dependent measure) was predicted from a model that uses multiple measures per year across years.
Residuals and Error Variance
In accordance with our definition of teacher effectiveness, the only variable modeled was the student’s growth. That is, the only variance removed was that associated with the student’s own performance. The residual, or TEI, has two components: systematic variance associated with variables that were not included in the model (e.g., teacher effects), and random variance associated with unreliable measures. For the residual to be a reliable TEI, the systematic component (teacher effect) must be maximized while the random component (error) is minimized. The current model increases the systematic variance in the residual by using an a priori definition of teacher effectiveness, as opposed to an a posteriori definition that treats teacher effectiveness as whatever residual variance is left over after removing all the systematic effects on student performance (e.g., variance related to student demographics). It minimizes the random variance component by using reliable measures of student performance. In signal detection terms, the aim is to maximize the signal-to-noise ratio in the residual. Researchers often focus on modeling student performance in order to maximize prediction (i.e., a large R²). Maximizing R² is appropriate if accurate prediction is the goal. If, however, the residual is to be used as a reliable measure of teacher effectiveness (TEI), maximizing R² produces an unacceptably high ratio of random to systematic variance in the residual. Removing all the systematic variance in this way can be referred to as “over modeling.” A more detailed discussion of the residual and the TEI, along with their computation, can be found in Appendix A.
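The signal-to-noise argument can be illustrated with a simple variance decomposition. The sketch below assumes that each student's residual is the sum of a teacher effect and independent student-level error, and that classes are of equal size; under those assumptions the share of a classroom mean's variance that is systematic is teacher_var / (teacher_var + error_var / n). The function name `systematic_share` and the variance figures in the usage note are hypothetical, chosen only to show the direction of the effects.

```python
def systematic_share(teacher_var, error_var, class_size):
    """Share of a classroom-mean residual's variance that is systematic.

    Assumed model: each student's residual = teacher effect + independent
    student-level error. Averaging n students leaves the teacher variance
    intact but divides the error variance by n.
    """
    if class_size < 1:
        raise ValueError("class_size must be at least 1")
    noise = error_var / class_size           # random variance left in the mean
    total = teacher_var + noise
    return teacher_var / total if total > 0 else 0.0
```

With a teacher variance of 0.05 and classes of 25, lowering the error variance from 0.90 to 0.60 (more reliable measures) raises the systematic share from about .58 to about .68, while a model that removes all teacher variance (`teacher_var = 0`, the over-modeling case) leaves a share of zero.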
Multiple Regression (MR) vs. Hierarchical Linear Modeling (HLM)
Two- and three-level HLM models maximize prediction by accounting for additional variance arising from interactions between variables at different hierarchical levels (e.g., school and classroom). Since we are interested in only one level of analysis (i.e., the classroom), there was no need for multilevel methods. Given that a multilevel analysis was unnecessary and that maximizing R² was not our ultimate goal, we chose simple multiple regression as our method of analysis.
¹ Students were tested using the Stanford 9,
In addition, the two-level HLM tended to over model by removing classroom effects, which were the effects we were interested in (i.e., it tended to remove all the systematic variance from the residual). Again, this result would be desirable if prediction were the goal; however, when the systematic portion of the residual is of interest, the two-level HLM produces residuals with very low year-to-year reliability.
Evaluating the Model
The utility of a teacher effectiveness model must be determined based on its validity and reliability. In order to demonstrate its validity and reliability it must meet four criteria:
1. Cross-validate with other measures of teaching performance
2. Show little or no relation to demographic variables
3. Show little or no relation to grade level
4. Show reliability over time
To demonstrate cross validation, the teacher effectiveness indices were correlated with teacher evaluation ratings by principals on an instrument specifically designed for this purpose. The results can be seen in Table 1. Most correlations are moderate, ranging from about .30 to about .50; the exception is the higher correlations under “Professional Responsibility,” most of which were significant (p < .05). While these correlations are not as high as one might expect, they do demonstrate shared variance between the two measures. The correlations are more acceptable given that the two methods of evaluation measure different facets of education: the teacher evaluation focuses more on “process,” while teacher effectiveness focuses more on “outcomes.” The Teacher Evaluation Questionnaire can be seen in Appendix B. It should also be pointed out that the correlations were based on a small sample of teachers (N = 16). Assuming correlations of similar magnitude, a larger sample would have yielded a larger number of significant coefficients.
Table 1
Cross Validation of the Teacher Effectiveness Index (TEI)
with a Specially Constructed Teacher Evaluation Questionnaire

                                Teacher Effectiveness Index (TEI)
Teacher Evaluation              Reading    Math    Writing    Overall
Planning & Preparation
   Question 1                     .47      .43       .42        .47
   Question 2                     .20      .20       .10        .18
   Question 3                     .29      .31       .19        .28
   Question 4                     .41      .40       .31        .39
   Total                          .38      .37       .28        .36
Classroom Environment
   Question 1                     .32      .30       .23        .30
   Question 2                     .39      .24       .23        .30
   Question 3                     .39      .21       .30        .31
   Total                          .43      .30       .29        .36
Instruction
   Question 1                     .43      .40       .35        .42
   Question 2                     .47      .46       .38        .47
   Question 3                     .15      .06       .04        .08
   Question 4                     .25      .16       .11        .18
   Question 5                     .18      .02       .07        .09
   Total                          .36      .28       .23        .30
Professional Responsibility
   Question 1                     .60      .61       .55        .62
   Question 2                     .64      .50       .58        .60
   Question 3                     .55      .65       .48        .60
   Question 4                     .43      .33       .28        .37
   Total                          .65      .61       .55        .64
Overall Questionnaire             .49      .43       .37        .46

N = 16
Values in red are significant at p < .05.
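Because N = 16, a correlation must be fairly large to reach significance. The sketch below converts a correlation to its t statistic, t = r√(n − 2)/√(1 − r²), and inverts that relation to find the smallest significant r. The constant 2.145 is the standard two-tailed t-table value for df = 14 at α = .05; the names `t_for_r`, `critical_r`, and `T_CRIT_DF14` are hypothetical.

```python
import math

# Two-tailed critical t for alpha = .05 with df = 14 (standard t-table value).
T_CRIT_DF14 = 2.145


def t_for_r(r, n):
    """t statistic for testing H0: rho = 0 from a correlation on n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance, given the critical t (df = n - 2)."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)
```

With these inputs the critical r is about .497, so a coefficient of .47 in Table 1 falls short (t ≈ 1.99) while .55 and above are significant. For large samples the critical r shrinks to roughly 1.96/√n, which is why a larger sample with correlations of similar size would be expected to produce more significant coefficients.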
Investigating the second criterion, little or no relationship between TEI scores and demographic variables, involved correlating classroom demographic variables with teacher effectiveness ratings. These results are presented in Table 2.
Table 2
                          Effectiveness Ratings
Classroom Demographics    Reading   Mathematics   Writing   Overall
% Minority                  .05        .05          .05       .07
% Free/Reduced              .12        .10          .06       .12
STRESS                      .10        .08          .06       .10
As can be seen, the teacher effectiveness ratings are essentially unrelated to the demographic makeup of the classroom (all coefficients are .12 or smaller in magnitude). Since every student serves as his or her own control (growth over time), there is no need to adjust for demographic variables. The “stress” indicator above is a combination of percent minority and percent low SES as measured by free/reduced lunch. To further investigate the independence of ethnicity and growth, the residuals and the observed scores were compared across ethnic groups (Table 3).
Table 3
                     Residuals                    Observed Scores
Ethnicity            Reading   Math   Writing     Reading   Math   Writing
White/Anglo           .040     .033    .003        .365     .349    .343
African American      .028     .039    .017        .167     .247    .161
Hispanic              .011     .027    .012        .291     .245    .279
Native American       .035     .052    .059        .473     .495    .442
Asian American        .099     .127    .162        .344     .510    .414
All ethnic groups     .009     .002    .004        .034     .019    .033
Data reflect all grade levels across the district.
Table 3 demonstrates that while observed scores vary dramatically across ethnic groups, the residuals do not. These results suggest that teacher evaluation methods relying solely on absolute year-end performance reward teachers in low-stress classrooms and penalize teachers in high-stress classrooms without taking into account any gains that may have been made.
To address the third criterion, no relation between grade level and effectiveness rating, residuals were correlated with grade level at the student level. Demonstrating constant relative growth across grade levels is necessary to ensure unbiased effectiveness ratings. The results of this analysis can be seen in Table 4.
Table 4
Correlations Between Grade and Residuals
Used to Calculate the Teacher Effectiveness Index (TEI)
Residual    Grade Level
Reading        .04
Math           .03
Writing        .01
Overall        .03
From these results it can be seen that relative growth does not appear to change over time (grade level). As a result, effectiveness ratings should be unrelated to grade level.
Our investigations of teacher effectiveness using a two-level HLM model removed systematic variance related to school and classroom effects, resulting in an almost complete removal of systematic variance from the error component (residual). As a result of this over modeling, the residuals or TEIs were not comparable from year to year (i.e., they showed little or no reliability). Reliability is not expected to be high in any case, given that teachers work with a different group of students every year and that teachers who fail to meet minimum standards are likely to be placed on a plan for improvement.
Demonstrating the fourth criterion, “reliability over time,” is perhaps the most important aspect of a valid teacher effectiveness model. It provides evidence not only of systematic variance in the residual (teacher effectiveness), but also that this systematic variance is consistent over time (reliable). Table 5 shows the results of a reliability analysis of residuals and observed scores at the individual student level. As can be seen, the reliability of the observed scores is fairly high, ranging from .74 to .84 along the diagonal. On the other hand, the reliability of the residuals is essentially zero (.00) along the diagonal. This result shows that whether a student scores higher or lower than expected one year has nothing to do with whether the same student will score higher or lower than expected the following year.
Table 5
Correlations Between Student Achievement Scores
and Their Residuals Across Two Years (2003 – 2004)

2003 Results          Reading 04   Math 04   Writing 04   Reading Residual 04   Math Residual 04   Writing Residual 04
Reading 03               .84         .73        .74               .02                 .09                  .10
Math 03                  .72         .82        .67               .06                 .01                  .08
Writing 03               .79         .73        .74               .05                 .08                  .01
Reading Residual 03      .01         .01        .00               .00                 .01                  .01
Math Residual 03         .01         .03        .00               .01                 .00                  .02
Writing Residual 03      .01         .02        .00               .01                 .01                  .00
Additionally, Table 5 shows no relationship between a student’s absolute level of performance and his or her residuals, since every student serves as his or her own control. The aggregation of individual student residuals by classroom forms the TEI. We have shown that the year-to-year fluctuations of the student residuals are random across all students in the entire district. To the extent that aggregating these residuals by classroom produces a systematic component, the classroom a student is placed in has an impact on whether he or she will show gains or losses compared to expectation (the teacher component). As can be seen in Table 6, this is exactly what was found: students showing either year-to-year gains or year-to-year losses tended to come from the same classrooms.
Table 6
Correlations Between Teacher Effectiveness
Ratings (TEI) Across Two Years (2003 – 2004)
Variable            Reading 2004   Math 2004   Writing 2004   Overall TEI 2004
Reading 2003            .37           .35          .35              .43
Math 2003               .29           .46          .32              .44
Writing 2003            .29           .31          .31              .37
Overall TEI 2003        .36           .43          .37              .47
Note: All correlations were significant at p < .05.
The results of this investigation lead to the conclusion that it is possible to partial out teacher effects on individual student performance by modeling relative student growth with ordinary least squares multiple regression. Moreover, the systematic effects of teachers were shown to be consistent over time. While this procedure produced valid and reliable results, it is based solely on individual student outcomes. A more comprehensive evaluation of teacher performance may require combining this method with one that is more sensitive to the teaching process. The choice of evaluation methods, or the combination of methods and their weighting, will depend on the needs and emphasis of the institution.
Appendix A
Individual’s predicted performance (Y’):
$$Y' = b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_nX_n$$
The values $b_1, \dots, b_n$ represent calculated regression weights associated with the predictor variables $X_1, \dots, X_n$.
The predictor variable $X_1$ would be a measure of a student’s academic performance in the 1999-2000 school year, the variable $X_2$ a measure of the same student’s performance in the 2000-2001 school year, and so on. The score being predicted is the student’s actual performance in the current school year, 2003-2004; Y’ is the model’s prediction of that score.
A student’s actual observed score (Obs) is compared to the student’s predicted score based on past performance. The difference between an individual’s predicted and observed scores is defined as a “residual”:
$$\text{Residual} = Y' - \text{Obs}$$
The residual value becomes the unit (score) of interest. The residual forms the Teacher Effectiveness Index (TEI).
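The prediction and residual steps above can be sketched in plain Python. This is an illustration only: the text does not say which software performed the regression, the sketch solves the ordinary least-squares normal equations directly, and it adds an intercept term b_0 that the equation above omits (an assumption; the intercept is zero when all scores are standardized). Residuals follow this appendix’s sign convention, predicted minus observed. The names `ols_fit` and `residuals` are hypothetical.

```python
def ols_fit(X, y):
    """Least-squares weights [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk.

    X holds one row of prior-year scores per student; y holds current-year
    scores. Solves (A'A) b = A'y by Gauss-Jordan elimination with pivoting.
    """
    A = [[1.0] + list(row) for row in X]        # intercept column (assumption)
    p = len(A[0])
    # Augmented normal-equation matrix [A'A | A'y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] +
         [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; weights not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:                         # clear this column elsewhere
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def residuals(X, y, b):
    """Residual = predicted - observed, following this appendix's convention."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    return [yp - obs for yp, obs in zip(preds, y)]
```

For hypothetical data generated exactly as y = 1 + 2x₁ + 0.5x₂, `ols_fit` recovers weights of approximately [1, 2, 0.5]. Because an intercept is included, the residuals sum to zero over the fitted sample, which is the property the following paragraphs rely on when they treat the population mean residual as zero.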
While the TEI contains random variance, it also contains systematic variance produced by variables that were not in the equation (e.g., teacher effects). Because the model includes variables that predict an individual’s performance (i.e., past performance) and excludes variables associated with teaching effectiveness, the residual reflects the effect of a teacher on student performance.
Since the residual still contains random variance (error) that cannot be
further partitioned we must estimate the proportion of the residual that is
likely due to random error.
We know that the average error in prediction over the entire population (district) is zero. In other words, the population mean of residuals is zero:
$$\frac{1}{N}\sum_{i=1}^{N}\left(Y'_i - \text{Obs}_i\right) = 0$$
where N represents, for example, the number of all fourth-grade students in the district.
The TEI score for any teacher is the average residual value for the individuals in his or her classroom. The teacher’s effectiveness score (TEI) can be expressed as:
$$\text{TEI} = \frac{1}{n}\sum_{i=1}^{n}\left(Y'_i - \text{Obs}_i\right)$$
Here n represents the number of students in a given classroom.
Since the variance due to teacher effectiveness cannot be partitioned from the variance that we consider error, we must make a judgment as to whether the magnitude and direction of the residual exceed those of pure chance variation. To do this we simply calculate the standard error of the mean for any classroom and compare the classroom mean to the population mean, which we know to be zero with a standard deviation of 1. Thus:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{n}}$$
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X}}{\sigma_{\bar{X}}}$$
where $\bar{X}$ is the classroom mean residual (the TEI) and $\mu = 0$ is the population mean.
With the standard error of the mean we can test whether the classroom mean is significantly different from the population mean of zero. A 95% confidence interval can be constructed around the obtained classroom mean. If the 95% interval includes the population mean, we cannot say with confidence that the effectiveness rating differs from the population mean by more than can be accounted for by chance alone.
$$P\left(\bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}}\right) = .95$$
The equation above represents a two-tailed 95% confidence interval. If the calculated upper limit is less than zero, we could say with 95% confidence that the mean of the residuals for that classroom is a sample from a population with a mean less than zero. Conversely, if the calculated lower limit is greater than zero, we could say with equal confidence that the “true” population mean is greater than zero.
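The standard-error, z, and confidence-interval steps can be sketched as one function. As in the text, the sketch assumes residuals are scaled so that the population standard deviation is 1 and the population mean is 0; `classify_tei` and its labels are hypothetical. Because this appendix defines the residual as predicted minus observed, a classroom whose interval lies entirely below zero is one whose students scored above prediction on average.

```python
import math


def classify_tei(tei, n, sigma=1.0, z_crit=1.96):
    """Compare a classroom mean residual (TEI) with the population mean of 0.

    Returns (z, lower, upper, label); with z_crit = 1.96 the bounds form a
    two-tailed 95% confidence interval around the classroom mean.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    se = sigma / math.sqrt(n)          # standard error of the classroom mean
    z = tei / se                       # (TEI - 0) / SE; population mean is 0
    lower, upper = tei - z_crit * se, tei + z_crit * se
    if upper < 0:
        # Residual = predicted - observed, so a negative mean means the
        # students scored above prediction on average.
        label = "below zero: students above prediction"
    elif lower > 0:
        label = "above zero: students below prediction"
    else:
        label = "not distinguishable from zero"
    return z, lower, upper, label
```

For a class of 25 the standard error is 1/√25 = 0.2. A TEI of -0.45 gives z ≈ -2.25 and an interval of about (-0.84, -0.06), which lies entirely below zero, while a TEI of 0.20 gives an interval of about (-0.19, 0.59) that includes zero.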
Appendix B
2003 Teacher Evaluation
The following components address Arizona’s Professional Teacher Standards.
Information you provide in this evaluation will be used in conjunction with other measures to establish Teacher Effectiveness Ratings.

Planning and Preparation
1. Demonstrates knowledge of Arizona State Standards
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
2. Demonstrates knowledge of core content and pedagogy
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
3. Develops and prioritizes learning goals and objectives
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
4. Uses appropriate formal and informal assessment to guide instruction
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation

Classroom Environment
1. Establishes and maintains effective classroom management
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
2. Creates an environment of mutual respect and rapport
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
3. Creates a culture that encourages learning
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation

Instruction
1. Communicates clearly and accurately
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
2. Engages students in learning
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
3. Uses differentiated instruction that is developmentally appropriate
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
4. Uses effective questioning and discussion techniques
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
5. Provides positive student feedback and reinforcement routinely
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation

Professional Responsibility
1. Communicates effectively with families
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
2. Collaborates with community to support student learning
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
3. Participates in professional growth opportunities
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation
4. Self-evaluates and adapts teaching practices
   ( ) Needs Improvement   ( ) Meets Expectation   ( ) Exceeds Expectation