Judge allocation and evaluation

Allocating and evaluating the performances of judges is perhaps the most controversial area of the Chief Adjudicator’s responsibilities, so it’s worth being clear about the issues.

Firstly, most judges want to judge most of the time; apart from an occasional opportunity to watch their own team, it is the reason they have attended the tournament.

Secondly, most judges think they are good judges – the inevitable subjectivity of judging means that judges should believe in the marks and votes they award, as they are reflections of the impression a team has made on them. So, generally speaking, judges don’t like being dropped.

As a CA, though, your over-riding responsibility is to pick the best panels you can for the World Championships. Luckily we have had so many more experienced judges in recent years that ‘outrageous panel decisions’ are increasingly uncommon – although votes and marks may stray a little.

Overview
To pick the best panels, you have to go on the impressions you yourself form of judges – which may be fed by analysing their ballots, discussing debates with them, or anecdotal evidence from coaches or fellow judges. Ultimately you will have more information available to you than anyone else, and are in the best position to evaluate judges.

This means that others may be surprised by your opinions. Some judges may be held in high regard due to professional status, confidence, age or number of tournaments attended – and are excellent judges. Others may be held in the same general high regard, but you may notice frequent strange marking patterns compared to their fellow judges.

It is your job as CA, assisted by your CAP, to set aside the subjective opinions of others and evaluate judges as fairly and objectively as you can, using the information available to you. The World Council has elected you as CA and trusts you to do the job.

Rules 13 and 14 in particular give you as CA powers to accept judges and to assess them, including during the Championships for which you have been appointed CA. Rule 15 summarises the CA’s role during the Championships and Rule 16A outlines the steps you are able to take if someone complains to you about a judge.

At the same time, we respect each other’s sensitivities and nobody wants to offend people outright. A certain amount of tactful person-management is part of the job too, and it may require you to have difficult conversations with some people. Hopefully, as the theory and techniques of judge allocation are written down and published, the community of judges will grow to trust CAs and CAPs in their decisions.

The theory
The theory of judge allocation is simple. In any given round you have a certain number of debates and a certain number of judges. In an ideal world, you would trust all those judges equally and be happy to assign them randomly to panels. In practice, you trust some of them more than others, and must spread them as best you can. As wins are more important than votes to teams (though both are important), you want at least a majority of judges on any given panel whom you would trust to cast their vote according to a proper understanding of the Rules.

In the past, the prevailing view was that some debates on paper are clearly more likely to be difficult to adjudicate. With all respect to the teams, it will be easier to cite some examples. A debate between Australia and Germany is more likely (though not certain) to be easier to call than a debate between Australia and Scotland, or – just as importantly – one between Germany and the Netherlands. The debates between teams which are ‘close’ in terms of experience are the most difficult to adjudicate, and this is where your best panels should go – irrespective of how 'glamorous' the debate might seem.

At present, however, the practice is to view every preliminary round debate as an even match, and to assign judges regardless of how close the debates are likely to be. Instead, judges are divided into categories (A, B, and C), and every debate is assigned 1 judge from each category (with the A judges assuming the responsibilities of the chair for the debate).
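The category-based practice described above can be sketched in a few lines. This is only an illustration, assuming judges have already been sorted into A, B and C lists; the function name `allocate_by_category` and the data layout are hypothetical, not part of any actual WSDC tab.

```python
import random


def allocate_by_category(judges_by_category, debates, seed=None):
    """Give every debate one A, one B and one C judge, chosen at random.

    judges_by_category: dict mapping "A", "B", "C" to lists of judge names.
    debates: list of debate identifiers.
    Returns a dict: debate -> {"chair": the A judge, "panel": [A, B, C]}.
    """
    rng = random.Random(seed)
    categories = ("A", "B", "C")
    for category in categories:
        if len(judges_by_category[category]) < len(debates):
            raise ValueError(
                f"not enough {category} judges for {len(debates)} debates"
            )
    # Shuffle a copy of each list so the caller's lists are left untouched.
    shuffled = {
        c: rng.sample(judges_by_category[c], len(judges_by_category[c]))
        for c in categories
    }
    allocation = {}
    for i, debate in enumerate(debates):
        panel = [shuffled[c][i] for c in categories]
        # The A judge takes the chair, as described above.
        allocation[debate] = {"chair": panel[0], "panel": panel}
    return allocation
```

Judges left over in a category after every debate has been filled simply sit the round out. The sketch ignores country clashes and repeat encounters, which would need to be checked separately.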

Again, in an ideal world you would put three excellent judges in every room, but you don’t have that luxury – compromise (and a certain degree of moaning from coaches) is inevitable.

Technical bit: Integration with the tab
Judge allocation has sometimes been done manually, and sometimes in a computer system separate from the one that tabulates the results. However, there are advantages to using the same system to allocate and tabulate.

If a spreadsheet is used to assign judges to debates, it can then generate a form for each debate which has the judges’ names, ready to accept the points they assign on their ballots.

It can also then generate alerts if judges award marks at a great disparity to their fellow panellists, and analyse judges in terms of average mark awarded etc. This isn’t an exact science but may throw up some useful conclusions.
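As a rough illustration of such an alert, the sketch below compares each judge's total for one team with the panel median and flags anyone further away than a chosen tolerance. The function name `disparity_alerts` and the default threshold of 5 marks are assumptions made for the example, not values taken from the Rules.

```python
from statistics import median


def disparity_alerts(totals_by_judge, threshold=5.0):
    """Return judges whose total for a team is far from the panel median.

    totals_by_judge: dict judge -> total marks given to one team in one debate.
    threshold: hypothetical tolerance, in marks, before an alert is raised.
    """
    panel_median = median(totals_by_judge.values())
    return sorted(
        judge
        for judge, total in totals_by_judge.items()
        if abs(total - panel_median) > threshold
    )
```

For example, with totals of 210, 212 and 225 the median is 212, so only the judge who awarded 225 is flagged. An alert is a prompt to look at the ballot and talk to the panel, not a verdict on the judge.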

The same tab can also generate printable sheets to show judges and teams assigned to a school on any given day, and produce the results in printable or online form afterwards.

Ultimately this tab becomes the complete record of the tournament – who met who in which debate, who judged them, and what was the outcome.

Balanced panels
There are a number of methods and factors to consider when selecting your judging panel. ''Note that much of this is only applicable to break rounds, with the advent of randomized judge assignments during the earlier stages of the tournament (except, of course, the part on grading judges).''

Grading judges
One system works like this: each judge is given a letter grade, say A – D, at the beginning of the tournament.

“A judges” are those whom the CA knows to be able, experienced and reliable. They have the reputation of generally giving sound, credible adjudications and being aware of the rules. They have been to Worlds at least twice before or in some cases have performed sufficiently well at one previous Worlds to be considered A grade. They are experienced at delivering adjudication speeches. They have the respect of the WSDC community, are able to act as chairpeople of adjudication panels and can be WSDC flag carriers in the debate rooms. Usually, the CA will know them and be able to pick them out of the list or, if uncertain, can refer them to the CAP who will be able to confirm whether or not they are “A” grade. Ideally, there should be at least one “A” judge per debate, often the chair of the panel, and of course more than one in each debate in the elimination rounds where possible.

“B judges” are those who are not so able, experienced or reliable as the A-graders but are still sound or solid judges. They will probably have been to at least one Worlds, although someone from a strong debating nation who has good credentials and references, as well as prior adjudication or debating experience in other competitions, might qualify even if they have not attended WSDC before. That person is more likely to qualify as a “B” than an “A” initially because they are relatively unknown. You'd generally feel safe putting one or two of the B judges into a debate without worrying too much about a possible maverick result. Sometimes they can and will chair a panel.

“C judges” are those who have attended one or more previous WSDCs but have not “stood out” as competent or solid judges; or else they meet the eligibility criteria but you might have a few concerns, not sufficient to reject the application. Ideally, you want no more than one “C” per debate.

“D judges” are usually first-timers about whom you know nothing or people who, as the tournament progresses, give rise to some concerns sufficient for you to consider that they would benefit from more training or should be sparingly used. Sometimes, depending on the number of judges, “C”s and “D”s might be combined so that there are A, B and C categories only.

You may not know every judge at the tournament, even if they have been to multiple World Championships – we’re a large community. It would be unfair, however, to penalise judges because you happen not to have formed an opinion of them in previous Worlds. Between the members of the CAP, you should have a good idea of all of the returnees. So make sure you consult your panel members before grading judges.

Also, the grades (if used) are purely for your purposes for judge allocation, and not for public display. An error in 1999 saw judges’ grades pinned to the hotel noticeboard one morning – not to be repeated!

These grades are constantly re-evaluated and changed where appropriate, after the adjudicator training day and after every round. So a judge might well be promoted from D to A in the course of the tournament, if it becomes clear that they know what they’re doing.

Hopefully you will have more than one A-grade judge for every room. So the first task is to allocate them to panels, then allocate the Bs, then the Cs and so on.
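One simple way to carry out this "As first, then Bs, then Cs" pass is to sort the judges by grade and deal them out across the rooms like cards. The sketch below assumes each judge is a `(name, grade)` pair with a single-letter grade; `deal_by_grade` is a hypothetical helper, and the result is only a starting point to be adjusted by hand.

```python
def deal_by_grade(judges, rooms, panel_size=3):
    """Deal judges to rooms in grade order so the strongest are spread out.

    judges: list of (name, grade) pairs, with grade one of "A", "B", "C", "D".
    rooms: number of debates in the round.
    Returns a list of panels, one list of (name, grade) pairs per room.
    """
    needed = rooms * panel_size
    if len(judges) < needed:
        raise ValueError(f"need {needed} judges, only {len(judges)} available")
    # "A" sorts before "B" and so on; sorted() is stable, so input order is
    # preserved within each grade.
    ranked = sorted(judges, key=lambda judge: judge[1])
    panels = [[] for _ in range(rooms)]
    for i, judge in enumerate(ranked[:needed]):
        panels[i % rooms].append(judge)
    return panels
```

With two rooms and grades A, A, B, B, C, D, this gives one panel graded ABC and one graded ABD. Judges beyond the number needed are left out of the round.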

Given that you would expect A and B judges to get the ‘right’ result, or if they disagree, to do so with good reasons, you would hope to have at least two A or B judges on every panel of three. So common panels might be: ABC, ABB, ABD, or perhaps AAA or AAB for a “close debate”. Panels with grades of ACD or ACC would give you cause for concern – although of course you are acting on the information available to you at the moment, and those C and D judges may be re-graded upwards later in the tournament.
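The "at least two A or B judges on every panel of three" rule of thumb is easy to check mechanically. A minimal sketch, assuming a panel is written as a string of grades such as "ABC" (the function name `panel_is_sound` is made up for the example):

```python
def panel_is_sound(grades):
    """True if at least two judges on the panel are graded A or B.

    grades: a string of single-letter grades, e.g. "ABC" or "ACD".
    """
    return sum(grade in ("A", "B") for grade in grades) >= 2


# The common panels listed above pass; the worrying ones do not.
common = ["ABC", "ABB", "ABD", "AAA", "AAB"]
worrying = ["ACD", "ACC"]
```

A failed check is only a prompt to look again, since the grades themselves are provisional.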

Geography
Ideally you don’t want two judges from any one country on the same panel, though this is occasionally necessary in emergencies. Likewise, three judges from the same region or debating perspective might be best avoided – if the USA was matched against Indonesia, then having judges from Singapore, Philippines and Malaysia might cause a perception problem (even if the judges were entirely free from bias).

In addition, audiences are more excited about an international tournament, and less so when faced with panels of judges from their home country. You should restrict the use of local judges in general, and particularly try and avoid using more than one on a panel.

If you are spoilt for choice, then check that the overall pool of judges is balanced geographically – don’t put an English judge in every room.
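Both geographic checks, no repeated country on a panel and no single country in every room, can be sketched as follows. The data layout (a panel as a list of `(judge, country)` pairs) and the function names are assumptions for the example.

```python
from collections import Counter


def repeated_countries(panel):
    """Return the countries that supply more than one judge on a panel.

    panel: list of (judge, country) pairs.
    """
    counts = Counter(country for _, country in panel)
    return sorted(country for country, n in counts.items() if n > 1)


def rooms_with_country(panels, country):
    """Count how many panels include at least one judge from a country."""
    return sum(
        any(judge_country == country for _, judge_country in panel)
        for panel in panels
    )
```

If `rooms_with_country(panels, "England")` equals the number of rooms, that is the "English judge in every room" situation described above. The same function can be used with the host country to keep an eye on the use of local judges.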

Age
This can be a factor, although age is more about perception than actual quality of judges. You may have a wealth of very good university student judges, but may want to mix them with older judges rather than select entire panels of 20 year-olds.

Other factors
Finally, you may use any other criteria you feel to be important. We don’t compile and publish American-style “judging philosophies” – where judges give written answers to a series of questions about what factors in debate speeches they prioritise over others – but we do have a sense that some judges are more concerned about examples and statistics than others, some enjoy humour more than others, and so on. In order to give all types of debater a fair chance, you may try to mix these judges up.

Clashes between judges and teams
There are two types of clashes to consider. First of all, it is obvious that a judge should not adjudicate a team from their own country – even if they have no personal experience of that team prior to the tournament, they are likely to share a similar approach to debating, and an audience will perceive them as biased even if they are not. Some judges may also have ‘loyalties’ to more than one country – for example, Sacha Judd has attended WSDC as a judge while living in New Zealand, England, Hong Kong and Singapore at different points. Again, perception is the issue rather than bias, but a perfect judge allocation system should allow for judges to be ‘blocked’ against multiple teams if appropriate.
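Blocking a judge against several teams needs nothing more than a set of blocked countries per judge. The sketch below uses invented judge names and an assumed `judge_blocks` mapping; it is not a description of how any existing tab stores this.

```python
def has_clash(judge_blocks, judge, teams):
    """True if the judge is blocked against either team in a debate.

    judge_blocks: dict judge -> set of team countries they must not judge.
    teams: the two (or more) team countries in the debate.
    """
    return bool(judge_blocks.get(judge, set()) & set(teams))


# Hypothetical example: one judge with ties to two countries.
judge_blocks = {
    "Judge X": {"New Zealand", "Singapore"},
    "Judge Y": {"Canada"},
}
```

A judge who does not appear in the mapping is treated as having no blocks.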

The second type of clash is where a judge adjudicates the same team more than once. Rule 10 (e) states that “a judge may judge the same team more than once, provided that the judge does not judge that team a disproportionate number of times.” In practice, teams and judges will often meet 2 or 3 times in preliminary rounds, especially where four teams are sent to a school for the day with six judges and some of the judges must inevitably judge one of their morning teams in the afternoon. Indeed, with the advent of randomized assignments, the constraint on judges meeting the same team is often lifted on the last day of the tournament.

As a general rule, 3 should be the maximum number of encounters in the preliminaries, and fewer if possible. Once the break rounds are reached, the tally can be considered to be wiped clean, but by that stage you as CA will have a sense of which judges have judged certain teams on multiple occasions and should be able to avoid further repetition – especially in consecutive break rounds.
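Keeping to this limit means counting past encounters before confirming a new allocation. A minimal sketch, assuming the tab can supply a list of `(judge, team)` pairs for the preliminary rounds already judged; the helper name `over_limit` is hypothetical.

```python
from collections import Counter


def over_limit(history, proposed, limit=3):
    """Return proposed (judge, team) pairs that would exceed the limit.

    history: list of (judge, team) pairs from preliminary rounds so far.
    proposed: list of (judge, team) pairs for the round being allocated.
    limit: maximum number of encounters allowed, 3 by default.
    """
    seen = Counter(history)
    return sorted(pair for pair in set(proposed) if seen[pair] + 1 > limit)
```

Passing an empty `history` when the break rounds begin corresponds to wiping the tally clean.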

Chief judges
On every panel, one judge is usually assigned to be the ‘chief judge’ or chairperson of the panel for that room. (This is not necessary, and a panel may be left to determine their own chief, but it saves arguments and awkwardness if you make the decision for them - you can indicate this with an asterisk on the judging allocation sheet for the day.)

A chief judge has no extra voting power, but they are usually the senior member of the tournament personnel in that room and thus an “A” or “B” judge. As such they should introduce themselves to the chairperson and timekeeper and answer any questions they might have; intervene in the debate if the Rules need to be enforced; cope with crises; and generally reassure the chairperson that someone in the know is there to help.

The chief judge will often deliver the adjudication speech. If they are in the minority of the panel, and one of the other judges is happy to give it, then it is common for this role to be handed on. If, however, none of the judges in the majority feels comfortable delivering the adjudication – hopefully not because they are regretting their vote, but perhaps because they are newer to WSDC – then a chief judge would be expected to announce the verdict anyway – and without lots of bitter asides!

A chief judge should also collect the ballots from the chairperson at the end of the debate and return them to a member of the CAP. They should collect ballots from any shadow judges (q.v.) and follow any evaluation procedure of their fellow judges and/or of the shadow judges that you have asked of them.

When selecting chief judges, you would ordinarily pick your most experienced judges for the first few rounds. As the tournament goes on, however, it’s nice to involve other people and allow them the chance to give adjudications. ‘A’-grade judges are usually more than happy to take a back seat, until the break rounds at least.

Shadow judges
In 2001 and subsequent years, first-time adjudicators – who may have extensive experience of judging debates, but none at WSDC – have been asked to ‘shadow judge’ a certain number of rounds at the beginning of the tournament. This allows you to evaluate them, and also enables them to get used to the niceties of WSDC judging.

Shadow judges are usually assigned singly or in pairs to rounds with very experienced chief judges. They do not affect the actual result of the debate, which is judged by a panel of ‘full judges’ as normal. Shadows are given a ballot and asked to fill it out as if they were adjudicating normally – however, their ballot is clearly marked “Shadow”.

At the end of the debate they may leave the room with the other judges and listen to their discussion. As CA you might choose whether or not to let them participate in the discussion, or you may leave it to the discretion of the chief judge in the room.

After the debate, they should find time and a quiet place to discuss the debate with the chief judge, and hand in their ballot. The chief judge may then add written comments to the ballot or accompanying verbal comments when handing it to the CAP. The CAP will use such feedback to evaluate shadow judges and choose whether to promote them to 'full' status.

Shadow judges are usually asked to judge a minimum of two debates, which allows the CAP to consider their performance at the end of the first day’s debating. You may feel this is adequate, or you may feel that further debates are necessary. However, as judge allocation is done daily, two rounds at a time, you would need to re-evaluate them at the end of Round 2, or Round 4, etc. Whatever you decide, it should be publicised well in advance – so that first-time judges aren’t disappointed by being kept in limbo, not knowing when or if their status will be re-evaluated.

After four rounds of shadowing, you should have a good idea of their abilities with written feedback from four chief judges. Judges sitting out later rounds probably shouldn’t be expected to keep shadowing, although they are very welcome to do so if they wish.

Downgrading someone from full to shadow is, of course, a possibility. Good luck!

See the Discussion page for further comments on shadow judges.

‘Shadow shadow’ judges
In 2004 a Rule was passed stating that ex-competitors could not return and adjudicate the year after they were last competing. (See Rule 13(a) (iii) – sometimes called the “x + 1” rule.) This was an addition to the existing eligibility criteria for adjudicators which helped avoid conflicts of interest, where adjudicators might be asked to judge debaters against whom they had debated in the last Championships.

Since several alumni planned to return the following year anyway, rather than exclude them from the tournament they were invited to be ‘shadow shadow judges’ (© Asher Weill), and shadow all the way through the tournament. Several have taken up this opportunity with a view to getting very helpful experience judging at Worlds, and a number have returned as full judges after that.

Judges per round
Every round must be judged by “an odd-numbered panel of at least three judges”. Preliminary rounds are traditionally judged by panels of three. In 2004, because of the large number of judges, the CA’s team considered assigning panels of five, but decided against this due to the high proportion of inexperienced or unknown judges.

As break rounds are often close debates which result in elimination of one team from the tournament, if you have large numbers of proven judges by this stage, you should use larger panels where possible. Of course by this stage you will have many judges who haven’t demonstrated the reliability to judge in the break. Octo-finals might be judged by panels of 3, quarters and semis by 5 and the final by 7 – or perhaps 7 for the semis and 9 for the final.
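The "odd-numbered panel of at least three" requirement, and the first set of break-round panel sizes suggested above, can be written down as a small check. The names below are illustrative only.

```python
def valid_panel_size(n):
    """True if n judges form an odd-numbered panel of at least three."""
    return n >= 3 and n % 2 == 1


# One of the schemes suggested above: 3 for octo-finals, 5 for quarters
# and semis, 7 for the final.
break_panel_sizes = {
    "octo-finals": 3,
    "quarter-finals": 5,
    "semi-finals": 5,
    "final": 7,
}
```

An odd number guarantees that a two-team debate cannot end in a tied vote.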

Rounds per judge
Hopefully you and the Convenor will have worked together to attract a healthy (although not absurd) number of judges – perhaps 60 or 70 for a tournament that needs 48 (i.e. with 32 teams and 16 debates per round).
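The figure of 48 follows directly from the numbers given: 32 teams make 16 debates, and each needs a panel of three. A tiny helper (hypothetical name `judges_needed`) makes the arithmetic reusable for other tournament sizes.

```python
def judges_needed(teams, panel_size=3):
    """Minimum number of judges for one round with every team debating."""
    if teams % 2:
        raise ValueError("an odd number of teams cannot all be paired")
    debates = teams // 2
    return debates * panel_size


needed = judges_needed(32)   # 16 debates x 3 judges = 48
spare_at_60 = 60 - needed    # judges resting each round with a pool of 60
```

With a pool of 60, twelve judges sit out each round; with 70, twenty-two do.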

If you simply have too few judges to run the tournament, then emergency measures come into play. In the past, team coaches were occasionally drafted to judge debates not involving their own team, but this is now prohibited by Rule 10 (c). In the unlikely event that your overall numbers are short (against the trend of recent years with the exceptions of 2005 and 2007), and you are unable to make up numbers with qualified local adjudicators, then the Council may be asked for an extraordinary suspension of 10 (c).

Assuming you have too many, then every judge should be told to expect between 1 and 7 rounds off in the preliminary rounds. Naturally you will want to use some more than others. Your most experienced judges might be given a sole round off to avoid exhaustion (although most are happy to judge as many rounds as possible). Your least experienced judges might only judge twice, say.

Keeping them happy is dealt with below, but your main priority is to pick the best panels you can – so expect to pick some judges more than others.

When given rounds off, judges often ask one of two things: either to have two rounds in succession off, so that they can go sightseeing or shopping – no harm in this – or, if they are given one round off, to be assigned to the same school as their own team, so that they can judge in the morning but watch their team in the afternoon. This is usually easy to accommodate, but make it clear that it’s a luxury, not a right.

Evaluating judges’ performances
As CA, your evaluation of judges is not an exact science, but you should use every method at your disposal. These include:


 * Adjudicators’ training day. In questions and answers, who seems to be demonstrating a sharp understanding of the key issues?
 * Shadow ballots. What feedback do you get from chief judges about their shadows?
 * Speaking to chief judges. Do this after every round, where possible, and ask their impressions of their fellow panellists.
 * Close examination of ballots, particularly in split-vote debates or contentious decisions, but also supposedly clear-cut ones. Are the marks of judges generally in line with each other (with some latitude)? Are judges using the proper mark scheme for Style, Content and Strategy?
 * Conversation. When asking judges how their debates went, do they defend their decisions for strong reasons, or comment on trivial matters? Do they demonstrate a good understanding of debating in general?
 * Feedback from coaches. With respect, this is often to be taken with a pinch of salt (Trevor Sather: As a coach, willing my team to win, I probably over-estimated my team’s performance in every debate by 3-5 marks compared to the judges) – but can be a useful source of information from experienced debate analysts.
 * CVs. What is their experience of high-level adjudication in general? What is their background in debate?

Using all of these methods holistically will allow you to get a reasonable assessment of every judge’s capabilities. You can then use this – yes, subjectively – when choosing panels for later rounds.

It’s worth noting that dissenting (i.e. being in the minority in a split decision) does not make someone a bad judge. There is, and should be, a certain degree of subjectivity in debating and individual speakers may make different impressions on different judges, to the extent that good judges may vote for different teams. The question is, can they defend their decision satisfactorily? If all three judges can talk logically through the reasons for their votes, demonstrating a proper understanding of the Rules, then it cannot be a ‘bad’ decision. Obviously you might want to keep an eye on serial dissenters, especially when they are less experienced and dissenting against more experienced judges. But there is room for interpretation.
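To keep an eye on serial dissenters, the votes recorded in the tab can be tallied per judge. The sketch below assumes each debate is stored as a mapping from judge to the team they voted for; `dissent_counts` is a made-up name, and a high count is a reason to talk to the judge, not proof of poor judging.

```python
from collections import Counter


def dissent_counts(debates):
    """Count how often each judge voted with the minority.

    debates: list of dicts, each mapping judge -> team voted for.
    Panels are assumed to be odd-numbered, so there is always a majority.
    """
    counts = Counter()
    for votes in debates:
        tally = Counter(votes.values())
        winner, _ = tally.most_common(1)[0]
        for judge, team in votes.items():
            if team != winner:
                counts[judge] += 1
    return counts
```

Judges who have never dissented do not appear in the result; looking them up returns 0.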

In 2007, a feedback form was used for the first time, approved by the Adjudication Working Group and based on feedback forms used in New Zealand competitions. Each coach was asked to complete and return a feedback form after each debate. The chief judge in each debating room brought two feedback forms to the room and gave one to each coach before the debate began. Overall, the feedback forms were used responsibly, and the positive comments and constructive criticisms about judges and resolutions proved very helpful to the CAP.

Break round panels
As always, your priority is to choose the best panels you can. However, geography is still important – it may be, for example, that Scotland, England and Singapore provide a disproportionate number of experienced World Schools judges – but you don’t want every break round panel to be swamped with those judges.

Octo-finals
First, you will make the ‘cut’ – not publicly but mentally – where you abandon all those judges who haven’t proved themselves to be a safe pair of ears and eyes.

Next, which judges might have earned themselves an octo-final, but are unlikely to be required subsequently? They should be the first names on the list for your pool of octo-final judges.

Then, looking at the debate pairings on paper, are there any that look ‘easier’ than others? No debates are foregone conclusions, of course, but a match between the top-seeded team and the 16th might be much more likely to be easy to adjudicate than one between 7th and 8th, or indeed 4th and 13th, etc. (However, in 2002, 2006 and 2007, we’ve seen some significant upsets in the break round results, and as a result the performance of lesser ranked teams should never be underestimated.)

Allocate your other judges (starting with the best), making sure that they are evenly spread where possible but also that the toughest debates have three reliable judges. Then adjust the panels to avoid country clashes and to meet the other criteria.

Quarters and semis
After the octos, as some teams are eliminated you can begin to get a sense of the judges you will use for the rest of the tournament. Avoiding swamping with judges from the same countries becomes even more important – for example, you might have an Australian on every panel, but you wouldn’t want two.

You might consider resting judges, where a country has two or more senior judges. For example, you may decide to allot one to the semi-final and the other to the final, having rested him or her in the previous round. (Of course there is the risk that their country will be in the final, and they won’t judge at all!)

For other countries, you may have a pecking order of judges in mind, and you may whittle it down until the most senior judge is left.

If a country has a handful of equally experienced judges, and you have no particular preference between them, you may want to have a quiet word with the delegation leader and ask for their thoughts. This is only as a tie-breaker, but for example England has sent the same judges for several years, and is happy for them to judge break rounds in strict rotation – if judge 1 gets a semi one year, then it is judge 2’s turn the next year and so on.

As with the preliminary rounds, you may want to rotate the choice of chief judge around so that it’s not always the same voices announcing the verdicts. By this stage you will have a large number of judges who have proved themselves adept at adjudications.

Grand Final
As CA it’s certainly your right to sit on the Grand Final panel if you wish, provided your country isn’t in it. This has been common practice in recent years. You may also decide to chair it and announce the verdict. (Equally you may not, for example if you are from the host country – it really shouldn’t be a local accent announcing the winner of the competition to the Grand Final audience.)

You may also consider that, for a public event, the adjudication should be delivered by the person best equipped to communicate to an audience primarily of schoolchildren. The audience may include many people with little experience of debate, and the ability to inspire and explain is important.

If you are not from the host country, then you may like to add a judge from the host, if they have qualified candidates. This has also been common practice in recent years.

After that, and once you have eliminated the judges from the two countries in the Final, you should pick a panel of your best judges which demonstrates a wide geographical spread. The Final should not be the preserve of the founder countries; if a first-time judge from a new country has impressed, they may well have earned a place on the Final panel. Likewise you may decide that some judges have had their fill of Finals in recent years, and give new people an opportunity.

But nobody has the right to judge the Final – although there have been instances of judges specifically requesting CAs to select them, which just puts the CA in an awkward position. It’s your right to choose the panel, and you have all sorts of considerations to take into account – but a direct request from a judge isn’t one of them. Even worse, there have been instances of people asking CAs not to select certain other judges – it’s really none of their business.

Striking of judges by teams
A common practice in American debate tournaments is to allow both teams the chance to veto certain judges from a list of suggested candidates. The CA will then choose the panel from the names that remain. Although this has been tried out once at a WSDC, it didn’t find favour and the World Council agreed to ban it in 2000. (Minutes not available.)

However in 2003, 2004 and 2007, there were incidents where coaches made strong complaints about certain adjudicators after ‘contentious’ decisions. In such cases, the CAP decided that, while striking was not allowed, it was still within their remit to separate that judge and that team in the following round, to avoid upsetting anyone.

It should be made clear at the start of the tournament that, while coaches have the right to complain to a CAP about certain judges, they do not have the automatic right to strike judges. Their complaint becomes one of many things the CAP will consider in judge allocation.

It should also be made clear that the CAP has the right to choose panels however they wish. They may take that complaint with enough seriousness that the judge is not allocated to certain teams again in that particular Championships. This is not the same as striking, however, as it is not automatic and is entirely at the discretion of the CAP.

Dealing with judges’ expectations
Very occasionally a judge will come up to you and thank you for selecting her or him for a particularly high-profile debate. Usually, however, you’re more likely to receive complaints or hurt looks because you haven’t selected them for as many rounds as they’d hoped.

It’s very difficult to keep everyone happy and it’s almost unfortunate that, with all the other tasks on your agenda, a certain amount of pastoral care for judges is required. But naturally you don’t want to offend people outright, and may want to find time to explain, tactfully but honestly, why they are not being used again.

This is, however, a luxury in a busy tournament with so many judges. Hopefully three things will help:
 * The formalisation of a large CAP, with several faces on it who can shoulder the burden of talking to judges.
 * The fact that you are elected by the World Council, meaning that a mandate has been given to you to perform your job as best you can.
 * Publication of materials on the judge allocation procedure, so that judges can understand the sorts of things you take into account.

Where possible though, an increase in the amount of feedback given to judges about their performance can only be a good thing. You have many experienced adjudicators, on and off the CAP, who can assist you in this task.

Feedback to judges
Once you have evaluated judges’ performances over a few rounds, you may be able to give assessment and instruction to them that will keep them informed and improve their subsequent performance.

In the case of judges casting rogue votes for strange decisions, the chief judge will already have spoken to them to ascertain their reasons – but it may be something the chief judge recommends you take up further with them.

You may also notice what chief judges don’t – a particular disparity between the marking patterns of a judge and their fellow panellists. In this case, it’s worth speaking to the chief judge about the relative performances of the speakers before talking to the other judge.

A judge may be marking generally in line with their fellow panellists, but consistently higher or lower. In this case, you should have a word with them as soon as possible and suggest that they are at variance with the norm – especially if it is a repeated offence. Likewise, if judges do not add up their scores or fill out their ballots correctly, talk to them sooner rather than later.
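A judge who is consistently high or low shows up as a steady offset from the panel average rather than as a one-off disparity. The sketch below averages that offset over several debates; the data layout and the name `marking_offsets` are assumptions for the example, and what counts as "too far from the norm" is left to your judgement.

```python
from statistics import mean


def marking_offsets(ballots):
    """Average difference between each judge's mark and the panel mean.

    ballots: list of dicts, one per team per debate, each mapping
             judge -> total mark that judge gave the team.
    Returns a dict judge -> mean offset (positive means marking high).
    """
    diffs = {}
    for panel in ballots:
        panel_mean = mean(panel.values())
        for judge, mark in panel.items():
            diffs.setdefault(judge, []).append(mark - panel_mean)
    return {judge: mean(values) for judge, values in diffs.items()}
```

For instance, a judge who gives 219 and 209 where the panel means are 214 and 204 has an offset of +5 in both debates, which suggests a consistent pattern rather than a disagreement about one team.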

The constraints of time mean that feedback has to be prioritised, and the most important feedback is for those judges who “could be good but just need to work on a couple of things”. You may decide after 5 or 6 rounds that certain judges shouldn’t be used any more. You could have a quiet word with them and explain this, but if they have already had several chances to adjudicate you may decide that this is over-dramatisation.

Another form is group feedback, and the possibility of a mid-tournament adjudicators' briefing has been suggested. This would be an opportunity to discuss issues that had relevance to everyone, or issue warnings against common mistakes – e.g. not filling out the winning team on the ballot. In 1999 this function was fulfilled by a daily newsletter, but a mid-tournament plenary would allow judges to ask questions. In 2004, however, the idea was mooted but not put into action as it was felt there were no issues serious enough to justify it.

Finally, as noted above, there are feedback forms completed by coaches at the conclusion of each debate and returned to the chief judge of that debate or to a member of the CAP. Although of course there will be some coaches who feel their team should have won regardless, or feel it was an unfair topic, or simply didn’t like any of the judges, the experience of 2007 shows that most coaches acted responsibly when completing the forms and the CAP received very valuable feedback as a result.