A shared language of assessment principles


Validity inquiry 2: Do you and your colleagues have a shared understanding of assessment language?

To be accountable to one another, we need a metalanguage for our practice. The following assessment principles inform the day-to-day work of teachers.

Validity, reliability and equity are words that bring precision to the work we do as assessors. Each concept highlights a different facet of assessment quality, and together they are part of our toolkit as informed professionals. QCAA has provided a helpful glossary of assessment terms, which we have expanded a little further.


The closest thing that I can think of is authenticity…[the task] has integrity, that it is true, that it’s meaningful…It has to have relevance to the students’ lives and to our world, which is really important for engagement…It has to be useful assessment that allows them to feel connected to their learning (teacher interviews).

In his review of the changing understanding of validity in the assessment literature, Stobart (2009) declares that validity has become the overarching principle for assessment. All three aspects of validity need to be evident for classroom assessment to be considered high quality: content validity, construct validity and consequential validity.

Content validity – Is the focus of the assessment task a good representation of the required skills and knowledge? This can include representing the big ideas of the discipline, and also the alignment with what has been taught to students.

Construct validity – Is the design of the task assessing what it was intended to assess? This can include evidence of accessibility in the design, alignment of all of the elements, achievability of the scope of the task, and the mode being fit for purpose. It is closely linked to the principles of reliability and equity.

Consequential validity – Does the assessment achieve the purposes for which it was intended? Who has succeeded, and how can the assessment be improved to enable greater success for students who have not succeeded? Are the values and ideologies of the task, and its potential social consequences, equitable? (Messick, 1989). Instead of being seen as a static property of an assessment task, that is, something a task either has or does not have, validity is “contingent on what a test [or task] is for, how it is used, and how the results are interpreted” (Stobart, 2009, p. 162). It is closely linked to principles of equity.

Creating an argument for validity

Validity is a type of practical inquiry (Kane, 2016). The connections between each design decision need to be logical and strong. In asking one another these questions, and in sharing the intention behind each design decision, teachers create an argument for the validity of their classroom assessment designs. This ongoing commitment to quality, which requires us to be open to critique and peer review, is intelligent accountability at a local level. You can see ways that the teachers in the two schools created arguments for the validity of their work at the bottom of this page and in the case studies in the Showcase tab.


Reliability refers to “the extent to which the results can be said to be of acceptable consistency or accuracy for a particular use” (Harlen, 2013, p. 9).

A key question to ask is “Would two or more assessors reach the same judgment based on the evidence?” Stobart (2009, p. 174) explains that a highly reliable assessment task “may sample so little of the construct that it does not provide a dependable assessment of that domain or skill. Similarly, we may require authentic performance assessments, which are difficult to standardise so the results may be less reliable.” Harlen (2005) identified that the relationship between validity and reliability is more like a see-saw, where improving one can reduce the other, so an optimal balance must be sought. While perfect reliability is not possible, intelligent accountability to a community of peers, to the system and to students for designing reliable, quality and purposeful assessment can enable regular checks for reliability.


Valid, equitable assessment is a child’s right (Elwood & Lundy, 2010). Elwood and Lundy identify three principles for quality assessment practice:

Equity in assessment is evident when it is in the best interests of the student, is non-discriminatory, and enables full participation.

Equity in assessment encompasses a range of considerations, such as culturally inclusive assessment that respects cultural and language differences, pedagogic support for students to access the literacy demands of assessment tasks or items, and awareness of the fairness of assessment practices (Willis, Adie & Klenowski, 2013). Equity also encompasses designing tasks that enable students with disabilities to access the essential knowledge of the task and demonstrate their understanding through multiple means of representation, as well as enabling all students to extend their abilities through tasks that promote “cognitive stretch” (Wyatt-Smith & Colbert, 2014, p. 38). Consultation with parents and communities is one way to evaluate whether assessment and pedagogic practices demonstrate respect for cultural and learning needs. Authenticity and adaptation to meet local school contexts are further hallmarks of an equitable assessment task.


Discuss with a colleague: 

Which of these assessment principles are prioritised in your context?

How do you currently give an account of your assessment designs to peers?

What informs your understanding of quality?

Give an example of a time when you have adapted assessment in productive and ethical ways.


Intelligent accountability

Intelligent accountability in schooling systems is a process of building trust and strengthening collective responsibility for students to achieve valued outcomes (Sahlberg, 2010). It is different from types of accountability that emphasise conformity and auditing, and that attempt to control the work of teachers at a distance. Onora O’Neill, a philosopher who focuses on trust and accountability in public life, explains that trust increases when we give each other adequate, useful and simple evidence, drawn from our day-to-day practice, that we are trustworthy. This is a form of intelligent accountability. It occurs through self-governance within a framework of reporting, when professionals give an account of what they have done to those who “have sufficient time and experience to assess the evidence and report on it” (O’Neill, 2002, p. 58).

Intelligent accountability relies on informed professionalism: professional judgment developed through shared critical inquiries between professional peers. Qualitative judgements are informed through experience in evaluating multiple examples of assessment evidence. Determining the quality of complex works is a skill “that is not reducible to a set of measures or formal procedures that a non-expert could apply to arrive at the ‘correct’ appraisal” (Sadler, 2009, p. 160). Professionals develop complex understandings of qualities that are both latent and explicit (Wyatt-Smith & Klenowski, 2013). As part of their role as intelligent professionals, teachers also have a responsibility to continually scrutinise the processes of assessment, both academic and social, so that they can inform and shape assessment in productive and ethical ways (Torrance, 2017).

What does it look like in practice?

If this is the purpose…then these are the opportunities students need…so these types of outcomes are evident for students.

School A made an argument for the validity of their work by creating an overview of their design (School A Argument for validity) alongside their teaching plan (School A unit overview: 11 Crucible unit overview) to get feedback from reviewers before they started teaching.

School B used track-change comments to make an argument for the validity of their design (School B The Crucible Unit plan with teacher comments). Reviewers from the university team reported back to the teachers on the evidence (School B Peer review).


Elwood, J., & Lundy, L. (2010). Revisioning assessment through a children’s rights approach: Implications for policy, process and practice. Research Papers in Education, 25(3), 335-353.

Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198-211.

Harlen, W. (2005). Trusting teachers’ judgement: Research evidence of the reliability and validity of teachers’ assessment used for summative purposes. Research Papers in Education, 20(3), 245-270.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), The American Council on Education/Macmillan series on higher education. Educational measurement (pp. 13-103). New York: Macmillan.

Moss, P. (2016). Shifting the focus of validity for test use. Assessment in Education: Principles, Policy & Practice, 23(2), 236-251. doi:10.1080/0969594X.2015.1072085

O’Neill, O. (2002). A question of trust: The BBC Reith Lectures 2002. Cambridge University Press.

O’Neill, O. (2013). Intelligent accountability in education. Oxford Review of Education, 39(1), 4-16.

Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education, 34(2), 159-179.

Sahlberg, P. (2010). Rethinking accountability in a knowledge society. Journal of Educational Change, 11(1), 45-61.

Stobart, G. (2009). Determining validity in national curriculum assessments. Educational Research, 51(2), 161-179. doi:10.1080/00131880902891305

Torrance, H. (2017). Blaming the victim: Assessment, examinations, and the responsibilisation of students and teachers in neo-liberal governance. Discourse: Studies in the Cultural Politics of Education, 38(1), 83-96.

Willis, J., Adie, L., & Klenowski, V. (2013). Conceptualising teachers’ assessment literacies in an era of curriculum and assessment reform. The Australian Educational Researcher, 40(2), 241-256.

Wyatt-Smith, C., & Colbert, P. (2014). An account of the inner workings of standards, judgement and moderation: A previously untold evidence-based narrative. ACER report accessed from http://research.acer.edu.au/cgi/viewcontent.cgi?article=1006&context=qld_review

Wyatt-Smith, C., & Klenowski, V. (2013). Explicit, latent and meta-criteria: Types of criteria at play in professional judgement practice. Assessment in Education: Principles, Policy & Practice, 20(1), 35-52.