Examination of Witnesses (Questions 180-199)
WEDNESDAY 2 DECEMBER 1998
PROFESSOR JOHN MACBEATH, PROFESSOR PETER MORTIMORE AND PROFESSOR HARVEY GOLDSTEIN
180. The need for more funding, etcetera.
(Professor Mortimore) Absolutely. I am taking that
as read, Chairman! There are a number of studies, but not very many.
There is an emerging study from Huddersfield University, which
has probably been drawn to your attention, by Cullingford,
Daniels and Brown. This does relate, in a very hard statistical
way, to the impact of OFSTED inspections on proportions getting
A to C grades in GCSE. There are my own colleagues at the University
of London, who have been involved in looking at OFSTED for some
years. There is OFSTED's own reflexive study carried out by Peter
Matthews, looking at its own methods of evaluation and its own
reliability. There is the study by the Society of Education Officers,
led by Philip Hunter, which has followed up schools after inspections.
There is the piece that Professor Goldstein has already referred
to, where he and I did look at the study of reading in the London
boroughs. There are a number of studies and we can draw conclusions
from them but at this stage they mostly pose questions. As Professor
Goldstein has suggested: questions about methodology, questions
about interpretations, questions about how that information is
used in the public service.
181. Professor MacBeath, do you want to add
anything at this stage?
(Professor MacBeath) There is not a lot to add to
that except to say that there has been something like 25 years
of school effectiveness research and there are some very clear
and consistent messages which come out from that research. If
you rubbish or dismiss all of that research, as the Chief Inspector
has done on public platforms, saying that basically it is all
a waste of time, then you undermine everything the system has
been trying to achieve over the last ten or 15 years. There is
a significant piece of research, being conducted in Scotland at the
moment by Professor Mortimore and myself, which involves 80
schools. One of the focuses of that research has been to look
at the nature of the external consultant, the critical friend,
if you like. How do you support schools to become more effective?
That has been a very telling piece of research on the role of
the internal and the external evaluation.
Charlotte Atkins
182. It would be useful to know to what extent
schools use self-evaluation methods to improve their performance.
I know it has been very much dismissed by the Chief Inspector
as not being of great use, but I would be interested for you to
tell us what features within this self-evaluation framework are
used in these schools.
(Professor MacBeath) It is very difficult to give
a picture for England because I do not know whether anyone has
that picture, because it has been very much an ad hoc approach
here; but we know from various projects in which schools have
been involved. For example, there is a European project currently
drawing to an end, which is called Evaluating Quality in School
Education. This involves 101 schools in 18 European countries,
including seven schools in England and one in Wales. All of the
schools in that project have followed a similar methodology. This
is to look at the school from the perspectives of the pupils,
the parents, the teachers, the governing body and management team;
to gather evidence in a systematic way over a period of time,
which gives a view of the school from a range of different perspectives
internally; plus the external perspective of the person working
with the schools, the critical friend. So we do know from
small-scale studies that, where schools have taken on self-evaluation,
they have been very keen on it. They have found it enhances
the professionalism of the teachers. It improves the quality of
management. It improves the quality of teaching and learning. Yet
such schools also welcome external inspection because when the external inspection
comes the school is able to say, "We know what we are doing
on a day to day basis. This is built in. It is not bolt-on, something
extra. It is built into the whole quality of the school, of school
life." We are looking to the future, and it is currently
being discussed in Scotland that the role of the inspector might
be to phone up from the car park and say, "We are here and
we are coming upstairs, because we believe we can drop in at any
time and you will be able to show us the quality of the organisation
you are running, the quality of the teaching in the classroom."
To me that seems to be the model way forward.
(Professor Mortimore) To add to what John is saying.
The school self-evaluation seems to me the real key to improvement.
When teachers and all those working in schools really accept that
there is a problem and have the resolve to do something about
it and to put all their energy into it: this is when things really
change. However, this requires there to be confidence: confidence
in their own ability and also confidence that they are going to
get good advice from outside, but advice which is positively slanted
towards helping them change things. If there is fear, a lack of
trust, it all goes wrong. Certainly at the 1997 Toronto UNESCO
meeting there was a feeling from all the nations represented there
that the way forward was through school self-evaluation aimed at
real improvement. The countries that could boast that things
really had changed were those that had used that as their model.
183. So are you saying that the support for
schools to self-evaluate is absolutely vital? That this is the
important issue. Clearly no-one would expect schools to self-evaluate
without that sort of support framework.
(Professor Mortimore) I agree entirely. It has to
have the support framework. Many people in OFSTED, many inspectors,
are doing this, are positive, and can provide that impetus. The
problem is in those cases which are not. Once it becomes common
currency that there is just fear and mistrust about how the information
is being used, then that does the opposite. It actually deskills
the teachers rather than giving them the confidence to ask themselves
the really critical questions, which they know the answers to
better than any inspector on one visit can ever do.
184. So you think that self-evaluation does,
in fact, demonstrate the weaknesses of the school and enables
teachers to take on those lessons?
(Professor Mortimore) In the right framework it can,
more than anything. However, it has to be within the right framework.
Caroline Flint
185. May I ask a supplementary to that. From
some of the evidence we have received, many schools already seem
to be using the OFSTED framework for self-evaluation of schools.
In fact, we have talked quite a lot on the Committee about the
OFSTED orthodoxy in schools. Is that not happening as part of
the process? What do you think about that?
(Professor Mortimore) I think that is a very good
thing. The publication of the OFSTED Handbook was excellent, making
explicit many of the judgments which would be used. That was a
very good move and I fully support that. I think it is absolutely
right that schools should use that as a template from which to
measure themselves. They have information about their own school
which no inspector will ever have. If they can be totally honest
and know that there will not be calamitous results, then you are
going to bring out the information. You begin to unpick the problems
and if you have the institutional resolve to do something about
it you are on your way to school improvement.
Chairman
186. Before I bring in Don Foster on our research
questions, on self-evaluation, can I challenge you a bit on this.
The Annual Report from the Further Education Funding Council's
Chief Inspector found that colleges' self-assessment reports provided
a useful basis for planning (so he or she said), but then went
on to say colleges "generally understated weaknesses in provision"
and that in "28 per cent of cases inspectors considered that
colleges had been over-generous in the grades they had awarded
themselves." Is there not a real problem here, to put it
bluntly, that self-evaluation is a soft option and many of us
in a weak moment might choose self-evaluation rather than outside
inspection?
(Professor MacBeath) For the general public, if you
talk about self-evaluation people would say, "Oh well, that
is obviously pretty soft. The police evaluating themselves. Teachers
evaluating themselves. Of course they are going to be pretty generous."
There are a number of models of self-evaluation. If a management
team evaluates the school or further education college, they will
probably give it a fairly rosy report. That is our own experience
in the projects we are doing. The management team will tend to
over-estimate because they have a fairly limited view of the school.
In a school, if you take into account the teachers' views and
evidence from pupils as well, then you begin to get a triangulation
of different perspectives. We found that if the views of
further education students or school pupils (even in primary
schools) are included in the evaluation, you get
a far more challenging and far more rigorous view, so that you
do not get an opportunity to over-estimate or gloss over what is happening.
But to go back to Peter's point about the framework within which
this occurs, in our experience (and that is a long experience
of many projects in different countries), if you create a
framework in which people feel secure and trusting in what is
happening, they will be very rigorous and self-critical and will,
in fact, sometimes be very painfully honest about their own strengths
and weaknesses. If you do that in a climate of threat and lack
of trust people will obviously put a front on and try to give
you a gloss. Perhaps the best example of this is from the European
Conference last week. A school from London appeared at the European
Union of Teachers, some of whom were quite sceptical about evaluation.
They described their recent OFSTED experience and said, "What
you do during an OFSTED inspection is to bury the bodies. When
you do self-evaluation it helps to find where the bodies are buried."
They said it was not a comfortable experience. It was the most
challenging experience they had ever been through as a management
team and they had had to look again at quality and standards.
They felt they were running a very good school but after their
own self-evaluation they had to go back to take a closer look.
"It was painful, it was hard, but now we think we are running
a much better school because we have evidence from the viewpoint
of the young people and the teachers and the parents, and this
gives a very comprehensive challenging view of the school."
Mr St Aubyn
187. This sounds all very well for a good school
which is well structured and has proper leadership, but one of
the reasons for OFSTED was to try and address the problem schools
which do not have those qualities. Is there not a weakness in
your argument, in that the very schools OFSTED was set up to deal
with, and whose problems it was meant to highlight, are the ones where
self-evaluation will not work very well?
(Professor MacBeath) Yes. This brings us back to the
notion of light-touch inspection. I would want to say, the core
question is "how rigorous and systematic is the school at
evaluating itself?" If you have strong self-evaluation then
you need only a light touch. That is the key criterion for light-touch
inspection. Where you have schools in difficulty, then I think
you need much stronger support and intervention. So I would say
that this spectrum is very important. The school which is
struggling needs a lot of help. What they do not need is a lot of
punitive intervention; they need a lot of support and development,
and that is where the role of the critical friend is so important.
188. Therefore, what you are suggesting is a
natural development for OFSTED. Clearly in the early years it
had to establish which were the failing schools and which were
the ones capable of more self-evaluation. Is this not almost a
progression from where OFSTED started rather than a criticism
from where it did not start?
(Professor MacBeath) Yes, but I will ask Peter Mortimore
to answer this.
(Professor Mortimore) It is a mixture of the two that
you need. In any system there needs to be a balance between the
external and the internal. The more confident that people can
be inside that they are not going to get clobbered, the more confident
they can be in asking themselves the hard questions; and bringing
in people, as Professor MacBeath has said, bringing in fellow
head teachers, bringing in parents, bringing in fellow teachers
from other schools, (rivals, in fact), to look and uncover the
problems, then one can begin to create a culture of self-criticism
that is right. However, I would also say you do need OFSTED. I
am all for inspection. You need a capacity for external inspection
and in some emergency cases you need to pull it in very fast.
As you say, there are problems when schools are no longer able
to help themselves, when things have got too bad. But again, I
would say with Professor MacBeath that what they need is a lot
of help and support, not necessarily the punitive touch at that
stage. It is a fine balance. The skill of orchestrating it all
is supremely difficult.
Mr Marsden
189. Really, on that point, I get the sense
from what you are saying that you are almost suggesting a sliding
scale approach in terms of what OFSTED does: that maybe one school
has 60 per cent self-evaluation and 40 per cent external inspection. I
suppose the analogy I would draw was when I was employed as an
Open University tutor. A number of my assessments were monitored
internally and then, of course, you were moved up and down depending
on whether they were happy with the sort of comments you were
making on student essays or not. Is that the sort of model you
are suggesting for OFSTED?
(Professor Mortimore) Certainly I would support a
mixed model which was an "intelligent model", and where
information on all the indicators seemed to be fine I would not
waste a lot of public money. The State of Victoria in Australia
has a similar scheme, where they begin with a light touch and
if they are at all worried they bring in the heavy cavalry to
spend time and uncover whatever problems there are. When pressed
by English teachers as to why they did not observe in classrooms
their representatives said, "We could do that but in order
to get reliable observations it will cost a lot of money. Do you
want us to do this routinely or do you want us to save the money
and use it for the improvement at the other end?" That is
what you get with a flexible system.
Mr Foster
190. If I may suggest, in terms of some of your
opening remarks (and in different ways from each of you,
both in the written evidence and in your comments today), you
have been somewhat critical about the validity and reliability
of the data gathered by OFSTED. Yet I am very conscious that in the
early summer of this year OFSTED produced a document judging
attainment, in which they assessed the degree of conflict between
inspectors' judgments about attainment and the exam results and
SATs results. They concluded that there was a fairly good correlation
between the two, so does that not suggest that there is clear
reliability and validity in their data?
(Professor Goldstein) First of all, the study of the
reliability of the inspectors' judgments, where they sent in different
inspectors and compared the results, actually indicated not a
very high reliability. Take the crucial grade 4/5 boundary, the failure
boundary: in a third of the cases where one inspector allocated grade 4
to a teacher, the other inspector allocated grade 5, and vice
versa. So on this crucial point, on OFSTED's own data, its judgments
are actually not very reliable. However, that
itself was a very limited case.
191. In that particular piece of research, which
is a different one from the one to which I am referring, was it
not the case that in the OFSTED Report they describe the correlation
as reassuringly high? You seem to be saying they were wrong.
(Professor Goldstein) There was a reasonably high
correlation. There were two things which occurred in that study.
One was that there was not very much spread of judgments. The
extreme grades were simply not used. That is a problem. The second
one is that the crucial issue about making judgments is where
you cross a crucial boundary. The 4/5 boundary is a crucial one.
So I was simply taking that particular statistic as illustrative
of a lack of reliability, which actually matters when you come
to judging teachers and when you come to judging schools. The
other issue you raised about attainment is a very important one.
OFSTED is meant to judge attainment, achievement and progress.
The progress judgment is simply on the basis of observing one
lesson for 20 minutes and, of course, this is quite inadequate.
Yet OFSTED continues to talk about inspectors' judgments of the
progress that children are making. It is simply not the case that
they can judge progress on the basis of a single inspection. The
only way they could make a judgment of progress would
be to look at children's work from several years prior to inspection
and make judgments about how it has changed over time. That is
only one narrow aspect of the way they make judgments. Following
up my earlier response about where OFSTED inspectors'
judgments could be set against other evidence, there are
a number of local authorities where there is now good value-added
data on the progress children make in different schools from
particular starting points, together with OFSTED judgments about
the same schools and the same cohorts of children. It would be
quite feasible to do a study to try to validate OFSTED judgments,
both in terms of reliability and validity, against that objective
information. This is one of the research projects which would
be extremely useful to do.
192. I am grateful for that. I wonder if either
of your two colleagues want to comment on that.
(Professor Mortimore) First of all, may I begin by
saying that any evaluation cannot be a precise science. It is
simply using someone with judgment observing a particular happening
in a school at a particular time. That is fraught with problems.
We have got ourselves into a situation where everyone expects
these things to be totally reliable and they cannot be. It would
be good if we all humbled ourselves by saying that it is an estimate
of what is happening at that one time. Having said that, I do
not think we can answer Mr Foster because I do not think there
has been a sufficiently large independent study of OFSTED judgments
in school classrooms. However, I have to say I am very worried
about the validity issues, having looked especially at the local
authority inspections and the inspections of teacher education,
where the validity is highly questionable.
193. Given that then, are you arguing that some
of the judgments that have been made, based on that data, are
probably going a bit further than the data warrants?
(Professor Mortimore) Absolutely.
(Professor MacBeath) I think that was quite characteristic
of at least the Chief Inspector's pronouncements, to use research
very carelessly to make judgments, and to make inferences from
them and to deny people access to their own data. The validity
question, as Professor Mortimore has said, is really one of the
key questions. Can you, on the basis of a 15 or 20-minute visit
to a classroom in very artificial and fraught circumstances, make
a valid judgment about the quality of learning and teaching in
that classroom? All our research evidence says: "No, you
can't do that." If you want a reliable judgment on the quality
of learning and teaching in the classroom, you will get a far
better and more reliable judgment from the pupils who are in that
classroom day to day and week to week. Teachers' estimates of
pupil attainment will be far more reliable than an OFSTED inspector
coming in on a very brief visit.
194. Can we relate anything in those remarks
to the fact that on the latest round, 88 per cent of inspectors'
judgments were grading attainments at 3 and 4 out of the 7-point
scale. That degree of bunching struck me as very large.
(Professor Goldstein) They are clearly not using the
scale as it was meant to be used, so there is presumably some
problem with the training of inspectors making those judgments.
There is a whole issue about the training of inspectors.
195. Finally and very briefly because I know
we need to move on, Professor MacBeath, you said earlier that
the Chief Inspector had rubbished education research overall,
based presumably on the Tooley Report. Is it your view that the
Tooley Report led to a justification for making those remarks
anyway?
(Professor MacBeath) It was long before the Tooley
Report that the Chief Inspector was making that kind of remark, sometimes
totally outrageous remarks. When questioned at the public forum
about whether he believed that we should be looking more rigorously
at the quality of learning, he would say, "I am not particularly
interested in learning." That is a totally incredible statement
to come from a chief inspector in charge of schools. He then referred
to his own college of further education experience, which he said
had done him no good, stating that basically all we need is good
common sense. Common sense as a replacement for research would
lead us down a very dangerous path. The Tooley Report played very
nicely into that kind of prejudice and undermining of research.
There is something to fear from research that is rigorous and
looks for evidence and looks for where the bodies are buried.
That is perhaps a threatening thing as far as OFSTED is concerned.
(Professor Goldstein) May I just disagree slightly
with my colleague on that one. I do not think the Tooley Report
was all that bad. If you read it carefully it is very careful
to state its reservations. In fact, it says we should not make
generalisations about education research. James Tooley himself
has reservations about what the Chief Inspector has said. If you
read what the Chief Inspector says in the preface, it simply does
not match up with the caveat that was made in the report. There
are some problems with the report. I have criticisms. But actually
the issue is that the Chief Inspector has not properly understood
and taken to heart what the report actually says.
Chairman
196. May I ask you about the research where two inspectors
were asked to make judgments and ended up with this correlation
of 0.81? That struck me as rather a good result, yet you are sceptical.
If any two people are asked to judge a similar situation, there
is always going to be a percentage of difference. When we examine
university scripts the two internals disagree. Sometimes the external
comes in and disagrees. There is a problem of bunching when it
comes to the marking of degrees. Are you really saying to us that
0.81 is not reassuringly high?
(Professor Mortimore) The overall correlation with
all the gradings was, as you say, rather high but that includes
the easy ones. That includes the agreements that are obviously
at the top of the scale and the ones which are obviously at the
bottom of the scale. The problem area, as Professor Goldstein
said, is where there is that difference between the 4/5. That
is a crucial difference, which can make a difference to a teacher's
career. This is because if the head is told that they are worth
only a low grade, it will have an implication. That is where there
was the biggest disagreement.
(Professor Goldstein) First of all, if everybody always
gives a grade 3, there will be a perfect correlation. As Mr Foster
pointed out, you have this bunching. In fact, the bunching actually
pushes up the correlation artificially. That is why I concentrated
on the boundary. It is a much better summary of what is going
on. May I make a further quick point. This is only a very narrow
definition of reliability. It is a reliability between two inspectors'
judgments of the classroom. There is the whole issue of the variation
from day to day that the teachers have in terms of different children
that confront them and so on. There is a whole dimension to reliability
and variability of judgment. The fact is that the inspector goes
in to observe one teacher and one group of children on one day, which
is very special. There is a broader definition of reliability.
You would see much more unreliability if you did it properly and
took it over a much longer period.
Caroline Flint
197. We have had various comments and evidence
to suggest that, even though the framework in Wales is exactly
the same as in England, there has not been what seems to
be a crisis of confidence in the system there. If you are
saying the validity of the work carried out in England by OFSTED
is suspect and not reliable, are you saying the same about Wales
where they do not seem to have the amount of problems or issues
raised against them?
(Professor Goldstein) I have not looked at the Welsh
information, so I do not think I can comment on that.
198. Do you not think it is interesting though
that we have exactly the same system in Wales and in England,
but these issues about validity or controversy just do not seem
to be coming to a head? It just does not seem to have the same
issues you are raising about people questioning it. Why is that?
(Professor Goldstein) I think my earlier point was
not that it is invalid, but that there is not the evidence which
we could get to decide how valid it was and how reliable it was.
I suspect the difference in Wales and England is as much a political
one as an objective educational one.
(Professor Mortimore) I think, in a sense, you answered
the question. It is the culture in which these things are carried
out. As I said before, these are snapshots of a particular time
at a particular moment. That is a very difficult thing to do.
If it is in a culture where there is trust...
199. So it is the culture and not the framework?
(Professor Mortimore) I think so, yes, and the trust
is absolutely essential to that situation.