Select Committee on Education and Employment Minutes of Evidence


Examination of Witnesses (Questions 180 - 199)

WEDNESDAY 2 DECEMBER 1998

PROFESSOR JOHN MACBEATH, PROFESSOR PETER MORTIMORE AND PROFESSOR HARVEY GOLDSTEIN

  180. The need for more funding, etcetera.
  (Professor Mortimore) Absolutely. I am taking that as read, Chairman! There are a number of studies but not very many. There is an emerging study from Huddersfield University, which you probably have had drawn to your attention, by Cullingford, Daniels and Brown. This does relate, in a very hard statistical way, to the impact of OFSTED inspections on proportions getting A to C grades in GCSE. There are my own colleagues at the University of London, who have been involved in looking at OFSTED for some years. There is OFSTED's own reflexive study carried out by Peter Matthews, looking at its own methods of evaluation and its own reliability. There is the study by the Society of Education Officers, led by Philip Hunter, which has followed up schools after inspections. There is the piece that Professor Goldstein has already referred to, where he and I did look at the study of reading in the London boroughs. There are a number of studies and we can draw conclusions from them but at this stage they mostly pose questions. As Professor Goldstein has suggested: questions about methodology, questions about interpretations, questions about how that information is used in the public service.

  181. Professor MacBeath, do you want to add anything at this stage?
  (Professor MacBeath) There is not a lot to add to that except to say that there has been something like 25 years of school effectiveness research and there are some very clear and consistent messages which come out from that research. If you rubbish or dismiss all of that research, as the Chief Inspector has done on public platforms and to say that basically it is all a waste of time, then you undermine everything the system has been trying to achieve over the last ten or 15 years. There is a significant piece of research, conducted in Scotland at the moment with Professor Mortimore and myself, which involves 80 schools. One of the focuses of that research has been to look at the nature of the external consultant, the critical friend, if you like. How do you support schools to become more effective? That has been a very telling piece of research on the role of the internal and the external evaluation.

Charlotte Atkins

  182. It would be useful to know to what extent schools use self-evaluation methods to improve their performance. I know it has been very much dismissed by the Chief Inspector as not being of great use, but I would be interested for you to tell us what features within this self-evaluation framework are used in these schools.
  (Professor MacBeath) It is very difficult to give a picture for England because I do not know whether anyone has that picture, because it has been very much an ad hoc approach here; but we know from various projects in which schools have been involved. For example, there is a European project currently drawing to an end, which is called Evaluating Quality in School Education. This involves 101 schools in 18 European countries, including seven schools in England and one in Wales. All of the schools in that project have followed a similar methodology. This is to look at the school from the perspectives of the pupils, the parents, the teachers, the governing body and management team; to gather evidence in a systematic way over a period of time, which gives a view of the school from a range of different perspectives internally; plus the external perspective of the person working with the schools, the critical friend. So we do know from small-scale studies that, where schools have taken on self-evaluation, they have been very keen on it. They have found it enhances the professionalism of the teachers. It improves the quality of management. It improves the quality of teaching and learning. The school also welcomes the external inspection because when the external inspection comes the school is able to say, "We know what we are doing on a day to day basis. This is built in. It is not bolt-on, something extra. It is built into the whole quality of the school, of school life." We are looking to the future and this is currently being discussed in Scotland, that the role of the inspector might be to phone up from the car park and say, "We are here and we are coming upstairs, because we believe we can drop in at any time and you will be able to show us the quality of the organisation you are running, the quality of the teaching in the classroom." To me that seems to be the model way forward.
  (Professor Mortimore) To add to what John is saying. The school self-evaluation seems to me the real key to improvement. When teachers and all those working in schools really accept that there is a problem and have the resolve to do something about it and to put all their energy into it: this is when things really change. However, this requires there to be confidence: confidence in their own ability and also confidence that they are going to get good advice from outside, but advice which is positively slanted towards helping them change things. If there is fear, a lack of trust, it all goes wrong. Certainly at the 1997 Toronto UNESCO meeting there was a feeling from all the nations represented there that the way forward was through school self-evaluation which was for real improvement. It was the countries which could boast that things really had changed that had used that as their model.

  183. So are you saying that the support for schools to self-evaluate is absolutely vital? That this is the important issue. Clearly no-one would expect schools to self-evaluate without that sort of support framework.
  (Professor Mortimore) I agree entirely. It has to have the support framework. Many people in OFSTED, many inspectors, are doing this, are positive, and can provide that impetus. The problem is in those cases which are not. Once it becomes common currency that there is just fear and mistrust about how the information is being used, then that does the opposite. It actually deskills the teachers rather than giving them the confidence to ask themselves the really critical questions, which they know the answers to better than any inspector on one visit can ever do.

  184. So you think that self-evaluation does, in fact, demonstrate the weaknesses of the school and enables teachers to take on those lessons?
  (Professor Mortimore) In the right framework it can, more than anything. However, it has to be within the right framework.

Caroline Flint

  185. May I ask a supplementary to that. From some of the evidence we have received, many schools already seem to be using the OFSTED framework for self-evaluation of schools. In fact, we have talked quite a lot on the Committee about the OFSTED orthodoxy in schools. Is that not happening as part of the process? What do you think about that?
  (Professor Mortimore) I think that is a very good thing. The publication of the OFSTED Handbook was excellent, making explicit many of the judgments which would be used. That was a very good move and I fully support that. I think it is absolutely right that schools would use that as a template from which to measure themselves. They have information about their own school which no inspector will ever have. If they can be totally honest and know that there will not be calamitous results, then you are going to bring out the information. You begin to unpick the problems and if you have the institutional resolve to do something about it you are on your way to school improvement.

Chairman

  186. Before I bring in Don Foster on our research questions, on self-evaluation, can I challenge you a bit on this. The Annual Report from the Further Education Funding Council's Chief Inspector found that colleges' self-assessment reports provided a useful basis for planning, (so he or she said), but then went on to say colleges "generally understated weaknesses in provision" and that in "28 per cent of cases inspectors considered that colleges had been over-generous in the grades they had awarded themselves." Is there not a real problem here, to put it bluntly, that self-evaluation is a soft option and many of us in a weak moment might choose self-evaluation rather than outside inspection?
  (Professor MacBeath) For the general public, if you talk about self-evaluation people would say, "Oh well, that is obviously pretty soft. The police evaluating themselves. Teachers evaluating themselves. Of course they are going to be pretty generous." There are a number of models of self-evaluation. If a management team evaluates the school or further education college, they will probably give it a fairly rosy report. That is our own experience in the projects we are doing. The management team will tend to over-estimate because they have a fairly limited view of the school. In a school, if you take into account the teachers' views and evidence from pupils as well, then you begin to get a triangulation of different perspectives. We found that if the views of further education students or school pupils, even in primary schools, are included in the evaluation you get a far more challenging and far more rigorous view, so that you do not get an opportunity to over-estimate or gloss what is happening. But to go back to Peter's point about the framework within which this occurs, in our experience (and that is a long experience of many projects in different countries) if you create a framework in which people feel secure and trusting in what is happening, they will be very rigorous and self-critical and will, in fact, sometimes be very painfully honest about their own strengths and weaknesses. If you do that in a climate of threat and lack of trust people will obviously put a front on and try to give you a gloss. Perhaps the best example of this is from the European Conference last week. A school from London appeared at the European Union of Teachers, some of whom were quite sceptical about evaluation. They described their recent OFSTED experience and said, "What you do during an OFSTED inspection is to bury the bodies. When you do self-evaluation it helps to find where the bodies are buried." They said it was not a comfortable experience. It was the most challenging experience they had ever been through as a management team and they had had to look again at quality and standards. They felt they were running a very good school but after their own self-evaluation they had to go back to take a closer look. "It was painful, it was hard, but now we think we are running a much better school because we have evidence from the viewpoint of the young people and the teachers and the parents, and this gives a very comprehensive challenging view of the school."

Mr St Aubyn

  187. This sounds all very well for a good school which is well structured and has proper leadership, but one of the reasons for OFSTED was to try and address the problem schools which do not have those qualities. Is there not a weakness in your argument that the very schools OFSTED was set up to cope with and deal with and highlight the problems, are the ones that will not work very well with evaluation?
  (Professor MacBeath) Yes. This brings us back to the notion of light-touch inspection. I would want to say, the core question is "how rigorous and systematic is the school at evaluating itself?" If you have strong self-evaluation then you do need light-touch. That is the key criterion to light-touch inspection. Where you have schools in difficulty, then I think you need much stronger support and intervention. So I would say that this spectrum is very important. That the school which is struggling needs a lot of help. What they do not need is a lot of punitive intervention; they need a lot of support, development, and that is where the role of the critical friend is so important.

  188. Therefore, what you are suggesting is a natural development for OFSTED. Clearly in the early years it had to establish which were the failing schools and which were the ones capable of more self-evaluation. Is this not almost a progression from where OFSTED started rather than a criticism from where it did not start?
  (Professor MacBeath) Yes, but I will ask Peter Mortimore to answer this.
  (Professor Mortimore) It is a mixture of the two that you need. In any system there needs to be a balance between the external and the internal. The more confident that people can be inside that they are not going to get clobbered, the more confident they can be in asking themselves the hard questions; and bringing in people, as Professor MacBeath has said: bringing in fellow head teachers, bringing in parents, bringing in fellow teachers from other schools, (rivals, in fact), to look and uncover the problems, then one can begin to create a culture of self-criticism that is right. However, I would also say you do need OFSTED. I am all for inspection. You need a capacity for external inspection and in some emergency cases you need to pull it in very fast. As you say, there are problems when schools are no longer able to help themselves, when things have got too bad. But again, I would say with Professor MacBeath that what they need is a lot of help and support, not necessarily the punitive touch at that stage. It is a fine balance. The skill of orchestrating it all is supremely difficult.

Mr Marsden

  189. Really, on that point, I get the sense from what you are saying that you are almost suggesting a sliding scale approach in terms of what OFSTED does: that maybe one school has 60 per cent self-evaluation and 40 external inspection. I suppose the analogy I would draw was when I was employed as an Open University tutor. A number of my assessments were monitored internally and then, of course, you were moved up and down depending on whether they were happy with the sort of comments you were making on student essays or not. Is that the sort of model you are suggesting for OFSTED?
  (Professor Mortimore) Certainly I would support a mixed model which was an "intelligent model", and where information on all the indicators seemed to be fine I would not waste a lot of public money. The State of Victoria in Australia has a similar scheme, where they begin with a light touch and if they are at all worried they bring in the heavy cavalry to spend time and uncover whatever problems there are. When pressed by English teachers as to why they did not observe in classrooms their representatives said, "We could do that but in order to get reliable observations it will cost a lot of money. Do you want us to do this routinely or do you want us to save the money and use it for the improvement at the other end?" That is what you get with a flexible system.

Mr Foster

  190. If I may suggest, in terms of some of your opening remarks—and in different ways from each of you, both in the written evidence and in your comments today—you have been somewhat critical about the validity and reliability of the data gathered by OFSTED. Yet I am very conscious in the early summer of this year that OFSTED produced a document judging attainment, in which they assessed a degree of conflict between inspectors' judgments about attainment and the exam results and SATS results. They concluded that there was a fairly good correlation between the two, so does that not suggest that there is clear reliability and validity in their data?
  (Professor Goldstein) First of all, the study of the reliability of the inspectors' judgments, where they sent in different inspectors and compared the results, actually indicated not a very high reliability. At the crucial grade 4/5 boundary, the failure boundary, for a third of the teachers to whom one inspector allocated a grade 4, the other inspector allocated a grade 5, and vice versa. So on this crucial point OFSTED's judgments, on their own data, are actually not very reliable. However, that itself was a very limited case.

  191. In that particular piece of research, which is a different one from the one to which I am referring, was it not the case that in the OFSTED Report they describe the correlation as reassuringly high. You seem to be saying they were wrong.
  (Professor Goldstein) There was a reasonably high correlation. There were two things which occurred in that study. One was that there was not very much spread of judgments. The extreme grades were simply not used. That is a problem. The second one is that the crucial issue about making judgments is where you cross a crucial boundary. The 4/5 boundary is a crucial one. So I was simply taking that particular statistic as illustrative of a lack of reliability, which actually matters when you come to judging teachers and when you come to judging schools. The other issue you raised about attainment is a very important one. OFSTED is meant to judge attainment, achievement and progress. The progress judgment is simply on the basis of observing one lesson for 20 minutes and, of course, this is quite inadequate. Yet OFSTED continues to talk about inspectors' judgments of the progress that children are making. It is simply not the case that they can judge progress on the basis of a single inspection. The only possible time they can make a judgment of progress would be to look at children's work several years prior to inspection and make judgments about how it has changed over time. That is only one narrow aspect of the way they make judgments. One of the things following up my earlier response, where OFSTED's inspectors' judgments could be set against other evidence, is that there are a number of local authorities where there is good value added data now on the progress children make in different schools from particular starting points, together with OFSTED judgments about the same schools and the same cohorts of children. It would be quite feasible to do a study to try to validate OFSTED judgments, both in terms of reliability and validity, against that objective information. This is one of the research projects which would be extremely useful to do.

  192. I am grateful for that. I wonder if either of your two colleagues wants to comment on that.
  (Professor Mortimore) First of all, may I begin by saying that any evaluation cannot be a precise science. It is simply using someone with judgment observing a particular happening in a school at a particular time. That is fraught with problems. We have got ourselves into a situation where everyone expects these things to be totally reliable and they cannot be. It would be good if we all humbled ourselves by saying that it is an estimate of what is happening at that one time. Having said that, I do not think we can answer Mr Foster because I do not think there has been a sufficiently large independent study of OFSTED judgments in school classrooms. However, I have to say I am very worried about the validity issues, having looked especially at the local authority inspections and the inspections of teacher education, where the validity is highly questionable.

  193. Given that then, are you arguing that some of the judgments that have been made, based on that data, are probably going a bit further than the data warrants?
  (Professor Mortimore) Absolutely.
  (Professor MacBeath) I think that was quite characteristic of at least the Chief Inspector's pronouncements, to use research very carelessly to make judgments, and to make inferences from them and deny people the access to their own data. The validity question, as Professor Mortimore has said, is really one of the key questions. Can you, on the basis of a 15 or 20-minute visit to a classroom in very artificial and fraught circumstances, make a valid judgment about the quality of learning and teaching in that classroom? All our research evidence says: "No, you can't do that." If you want a reliable judgment on the quality of learning and teaching in the classroom, you will get a far better and more reliable judgment from the pupils who are in that classroom day to day and week to week. Teachers' estimates of pupil attainment will be far more reliable than an OFSTED inspector coming in on a very brief visit.

  194. Can we draw anything from those remarks to the fact that on the latest round, 88 per cent of inspectors' judgments were grading attainments at 3 and 4 out of the 7-point scale. That degree of bunching struck me as very large.
  (Professor Goldstein) They are clearly not using the scale as it was meant to be used, so there is presumably some problem with the training of inspectors making those judgments. There is a whole issue about the training of inspectors.

  195. Finally and very briefly because I know we need to move on, Professor MacBeath, you said earlier that the Chief Inspector had rubbished education research overall, based presumably on the Tooley Report. Is it your view that the Tooley Report led to a justification for making those remarks anyway?
  (Professor MacBeath) It was long before the Tooley Report that the Chief Inspector was making that kind of remark—sometimes totally outrageous remarks. When questioned at the public forum about whether he believes that we should be looking more rigorously at the quality of learning, he would say, "I am not particularly interested in learning." That is a totally incredible statement to come from a chief inspector in charge of schools. Then referring to his own college of further education experience which he said had done him no good, stating that basically all we need is good common sense. Common sense as a replacement for research would lead us down a very dangerous path. The Tooley Report played very nicely into that kind of prejudice and undermining of research. There is something to fear from research that is rigorous and looks for evidence and looks for where the bodies are buried. That is perhaps a threatening thing as far as OFSTED is concerned.
  (Professor Goldstein) May I just disagree slightly with my colleague on that one. I do not think the Tooley Report was all that bad. If you read it carefully it is very careful to state its reservations. In fact, it says we should not make generalisations about education research. James Tooley himself has reservations about what the Chief Inspector has said. If you read what the Chief Inspector says in the preface, it simply does not match up with the caveat that was made in the report. There are some problems with the report. I have criticisms. But actually the issue is that the Chief Inspector has not properly understood and taken to heart what the report actually says.

Chairman

  196. May I ask you, the research where two inspectors were asked to make judgments and ended up with this correlation of 0.81. That struck me as rather a good result yet you are sceptical. If any two people are asked to judge a similar situation, there is always going to be a percentage of difference. When we examine university scripts the two internals disagree. Sometimes the external comes in and disagrees. There is a problem of bunching when it comes to the marking of degrees. Are you really saying to us that 0.81 is not reassuringly high?
  (Professor Mortimore) The overall correlation with all the gradings was, as you say, rather high but that includes the easy ones. That includes the agreements that are obviously at the top of the scale and the ones which are obviously at the bottom of the scale. The problem area, as Professor Goldstein said, is where there is that difference between the 4/5. That is a crucial difference, which can make a difference to a teacher's career. This is because if the head is told that they are worth only a low grade, it will have an implication. That is where there was the biggest disagreement.
  (Professor Goldstein) First of all, if everybody always gives a grade 3, there will be a perfect correlation. As Mr Foster pointed out, you have this bunching. In fact, the bunching actually pushes up the correlation artificially. That is why I concentrated on the boundary. It is a much better summary of what is going on. May I make a further quick point. This is only a very narrow definition of reliability. It is a reliability between two inspectors' judgments of the classroom. There is the whole issue of the variation from day to day that the teachers have in terms of different children that confront them and so on. There is a whole dimension to reliability and variability of judgment. The fact is that the inspector goes in to observe one teacher and one group of children and one day is very special. There is a broader definition of reliability. You would see much more unreliability if you did it properly and took it over a much longer period.
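  Editorial illustration, not part of the evidence: the point made in the two answers above, that a high overall correlation between two inspectors can sit alongside frequent disagreement at the 4/5 boundary, can be sketched with invented numbers. Every grade count below is hypothetical and is not taken from the OFSTED study; the function names are likewise assumptions made for this sketch.

```python
from math import sqrt

# Hypothetical paired grades on a 7-point scale, invented for illustration:
# (grade from inspector A, grade from inspector B, number of lessons).
COUNTS = [
    (1, 1, 3), (2, 2, 6), (2, 3, 2), (3, 3, 10), (3, 2, 2),
    (4, 4, 6), (4, 5, 3), (5, 4, 3), (5, 5, 6),
    (6, 6, 6), (6, 7, 1), (7, 7, 3),
]


def expand(counts):
    """Turn (a, b, n) counts into a flat list of (a, b) grade pairs."""
    return [(a, b) for a, b, n in counts for _ in range(n)]


def pearson(pairs):
    """Pearson correlation coefficient between the two inspectors' grades."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / sqrt(var_a * var_b)


def boundary_crossing_rate(pairs, low=4, high=5):
    """Among lessons graded `low` or `high` by inspector A, the share that
    inspector B placed on the other side of the low/high boundary."""
    near = [(a, b) for a, b in pairs if a in (low, high)]
    crossed = [(a, b) for a, b in near if (a <= low) != (b <= low)]
    return len(crossed) / len(near)


if __name__ == "__main__":
    pairs = expand(COUNTS)
    print("overall correlation:", round(pearson(pairs), 2))
    print("4/5 boundary crossing rate:", round(boundary_crossing_rate(pairs), 2))
```

  With these invented counts the overall correlation comes out above 0.9, yet a third of the lessons that one inspector placed at grade 4 or 5 fall on the other side of the boundary for the second inspector. The sketch only shows that the two summaries can diverge; it says nothing about the size of either figure in the actual study.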

Caroline Flint

  197. We have had various comments and evidence to suggest that even though the framework in Wales is exactly the same as in England, that there has not been what seems to be some sort of crisis of confidence in the systems. If you are saying the validity of the work carried out in England by OFSTED is suspect and not reliable, are you saying the same about Wales where they do not seem to have the amount of problems or issues raised against them?
  (Professor Goldstein) I have not looked at the Welsh information side so I do not think I can comment on that.

  198. Do you not think it is interesting though that we have exactly the same system in Wales and in England, but these issues about validity or controversy just do not seem to be coming to a head? It just does not seem to have the same issues you are raising about people questioning it. Why is that?
  (Professor Goldstein) I think my earlier point was not that it is invalid, but that there is not the evidence which we could get to decide how valid it was and how reliable it was. I suspect the difference in Wales and England is as much a political one as an objective educational one.
  (Professor Mortimore) I think, in a sense, you answered the question. It is the culture in which these things are carried out. As I said before, these are snapshots of a particular time at a particular moment. That is a very difficult thing to do. If it is in a culture where there is trust —

  199. So it is the culture and not the framework?
  (Professor Mortimore) I think so, yes, and the trust is absolutely essential to that situation.


 

© Parliamentary copyright 1999
Prepared 24 June 1999