Exam Questions: Types, Characteristics, and Suggestions

Examinations are a very common assessment and evaluation tool in universities, and there are many types of examination questions. This tips sheet contains a brief description of seven types of examination questions, as well as tips for using each of them: 1) multiple choice, 2) true/false, 3) matching, 4) short answer, 5) essay, 6) oral, and 7) computational. Remember that some exams can be conducted effectively in a secure online environment in a proctored computer lab or assigned as paper-based or online “take-home” exams.

Multiple choice

Multiple choice questions are composed of one question (stem) with multiple possible answers (choices), including the correct answer and several incorrect answers (distractors). Typically, students select the correct answer by circling the associated number or letter, or filling in the associated circle on the machine-readable response sheet.

Example: Distractors are:

A) Elements of the exam layout that distract attention from the questions
B) Incorrect but plausible choices used in multiple choice questions
C) Unnecessary clauses included in the stem of multiple choice questions

Answer: B

Students can generally respond to this type of question quite quickly. As a result, they are often used to test students’ knowledge of a broad range of content. Creating these questions can be time consuming because it is often difficult to generate several plausible distractors. However, they can be marked very quickly.

Tips for writing good multiple choice items:

Suggestion: After each lecture during the term, jot down two or three multiple choice questions based on the material for that lecture. Regularly taking a few minutes to compose questions, while the material is fresh in your mind, will allow you to develop a question bank that you can use to construct tests and exams quickly and easily.

True/false

True/false questions are composed of only a statement. Students respond by indicating whether the statement is true or false. For example: True/false questions have only two possible answers (Answer: True).

Like multiple choice questions, true/false questions:

  • Are most often used to assess familiarity with course content and to check for popular misconceptions
  • Allow students to respond quickly so exams can use a large number of them to test knowledge of a broad range of content
  • Are easy and quick to grade but time consuming to create

True/false questions provide students with a 50% chance of guessing the right answer. For this reason, multiple choice questions are often used instead of true/false questions.

Tips for writing good true/false items:

Suggestion: You can increase the usefulness of true/false questions by asking students to correct false statements.

Matching

Students respond to matching questions by pairing each of a set of stems (e.g., definitions) with one of the choices provided on the exam. These questions are often used to assess recognition and recall and so are most often used in courses where acquisition of detailed knowledge is an important goal. They are generally quick and easy to create and mark, but students require more time to respond to these questions than a similar number of multiple choice or true/false items.

Example: Match each question type with one attribute:

  • Multiple Choice a) Only two possible answers
  • True/False b) Equal number of stems and choices
  • Matching c) Only one correct answer but at least three choices

Tips for writing good matching items:

Suggestion: You can use some choices more than once in the same matching exercise; this reduces the effects of guessing.

Short answer

Short answer questions are typically composed of a brief prompt that demands a written answer that varies in length from one or two words to a few sentences. They are most often used to test basic knowledge of key facts and terms. An example of this kind of short answer question follows:

“What do you call an exam format in which students must uniquely associate a set of prompts with a set of options?” Answer: Matching questions

Alternatively, this could be written as a fill-in-the-blank short answer question:

“An exam question in which students must uniquely associate prompts and options is called a ___________ question.” Answer: Matching.

Short answer questions can also be used to test higher thinking skills, including analysis or evaluation. For example:

“Will you include short answer questions on your next exam? Please justify your decision with two to three sentences explaining the factors that have influenced your decision.”

Short answer questions have many advantages. Many instructors report that they are relatively easy to construct and can be constructed faster than multiple choice questions. Unlike matching, true/false, and multiple choice questions, short answer questions make it difficult for students to guess the answer. Short answer questions provide students with more flexibility to explain their understanding and demonstrate creativity than they would have with multiple choice questions; this also means that scoring is relatively laborious and can be quite subjective. Short answer questions provide more structure than essay questions and thus are often easier and faster to mark, and they often test a broader range of the course content than full essay questions.

Tips for writing good short answer items:

Suggestion: When using short answer questions to test student knowledge of definitions, consider having a mix of questions: some that supply the term and require the students to provide the definition, and others that supply the definition and require the students to provide the term. The latter sort of question can be structured as a fill-in-the-blank question. This mix of formats will better test student knowledge because it doesn’t rely solely on recognition or recall of the term.

Essay

Essay questions provide a complex prompt that requires written responses, which can vary in length from a couple of paragraphs to many pages. Like short answer questions, they provide students with an opportunity to explain their understanding and demonstrate creativity, but make it hard for students to arrive at an acceptable answer by bluffing. They can be constructed reasonably quickly and easily, but marking these questions can be time-consuming and agreement among graders can be difficult to achieve.

Essay questions differ from short answer questions in that the essay questions are less structured. This openness allows students to demonstrate that they can integrate the course material in creative ways. As a result, essays are a favoured approach to test higher levels of cognition including analysis, synthesis and evaluation. However, the requirement that the students provide most of the structure increases the amount of work required to respond effectively. Students often take longer to compose a five paragraph essay than they would take to compose five one paragraph answers to short answer questions. This increased workload limits the number of essay questions that can be posed on a single exam and thus can restrict the overall scope of an exam to a few topics or areas. To ensure that this doesn’t cause students to panic or blank out, consider giving the option of answering one of two or more questions.

Tips for writing good essay items:

Suggestion: Distribute possible essay questions before the exam and make your marking criteria slightly stricter. This gives all students an equal chance to prepare and should improve the quality of the answers – and the quality of learning – without making the exam any easier.

Oral

Oral examinations allow students to respond directly to the instructor’s questions and/or to present prepared statements. These exams are especially popular in language courses that demand ‘speaking’, but they can be used to assess understanding in almost any course by following the guidelines for the composition of short answer questions. Among the principal advantages of oral exams are that they provide nearly immediate feedback and allow students to learn as they are tested. There are two main drawbacks to oral exams: the amount of time required and the problem of record-keeping. Oral exams typically take at least ten to fifteen minutes per student, even for a midterm exam. As a result, they are rarely used for large classes. Furthermore, unlike written exams, oral exams don’t automatically generate a written record. To ensure that students have access to written feedback, it is recommended that instructors take notes during oral exams using a rubric and/or checklist and provide a photocopy of the notes to the students.

In many departments, oral exams are rare. Students may have difficulty adapting to this new style of assessment. In this situation, consider making the oral exam optional. While it can take more time to prepare two tests, having both options allows students to choose the one which suits them and their learning style best.

Computational

Computational questions require that students perform calculations in order to solve for an answer. Computational questions can be used to assess students’ memory of solution techniques and their ability to apply those techniques to solve both questions they have attempted before and questions that stretch their abilities by requiring them to combine and use solution techniques in novel ways.

Effective computational questions should:

  • Be solvable using knowledge of the key concepts and techniques from the course. Before the exam, solve the questions yourself or have a teaching assistant attempt them.
  • Indicate the mark breakdown to reinforce the expectations, developed through in-class examples, about the amount of detail required in the solution.

To prepare students to do computational questions on exams, make sure to describe and model in class the correct format for the calculations and answer including:

  • How students should report their assumptions and justify their choices
  • The units and degree of precision expected in the answer

Suggestion: Have students divide their answer sheets into two columns: calculations in one, and a list of assumptions, description of process and justification of choices in the other. This ensures that the marker can distinguish between a simple mathematical mistake and a profound conceptual error and give feedback accordingly.

If you would like support applying these tips to your own teaching, CTE staff members are here to help. View the CTE Support page to find the most relevant staff member to contact.



This Creative Commons license lets others remix, tweak, and build upon our work non-commercially, as long as they credit us and indicate if changes were made. Use this citation format: Exam Questions: Types, Characteristics, and Suggestions. Centre for Teaching Excellence, University of Waterloo.


The Past, Present and Future of Educational Assessment: A Transdisciplinary Perspective

  • 1 Department of Applied Educational Sciences, Umeå Universitet, Umeå, Sweden
  • 2 Faculty of Education and Social Work, The University of Auckland, Auckland, New Zealand

To see the horizon of educational assessment, a history of how assessment has been used and analysed from the earliest records, through the 20th century, and into contemporary times is deployed. Since the earliest paper-and-pencil assessments, the validity and integrity of candidate achievement have mattered. Assessments have relied on expert judgment. With the massification of education, formal group-administered testing was implemented for qualifications and selection. Statistical methods for scoring tests (classical test theory and item response theory) were developed. With personal computing, tests are delivered on-screen and through the web, with adaptive scoring based on student performance. Tests give an ever-increasing verisimilitude of real-world processes, and analysts are creating understanding of the processes test-takers use. Unfortunately, testing has neglected the complicating psychological, cultural, and contextual factors related to test-taker psychology. Computer testing neglects school curriculum and classroom contexts, where most education takes place and where insights are needed by both teachers and learners. Unfortunately, the complex and dynamic processes of classrooms are extremely difficult to model mathematically and so remain largely outside the algorithms of psychometrics. This means that technology, data, and psychometrics have become increasingly isolated from curriculum, classrooms, teaching, and the psychology of instruction and learning. While there may be some integration of these disciplines within computer-based testing, this is still a long step from where classroom assessment happens. For a long time, the educational, social, and cultural psychology related to learning and instruction has been neglected in testing. We are now on the cusp of significant and substantial development in educational assessment as greater emphasis on the psychology of assessment is brought into the world of testing. Herein lies the future for our field: the integration of psychological theory and research with statistics and technology to understand the processes that support learning, identify how well students have learned, and determine what further teaching and learning is needed. The future requires greater efforts by psychometricians, testers, data analysts, and technologists to develop solutions that work in the pressure of living classrooms and that support valid and reliable assessment.

Introduction

In looking to the horizon of educational assessment, I would like to take a broad chronological view of where we have come from, where we are now, and what the horizons are. Educational assessment plays a vital role in the quality of student learning experiences, teacher instructional activities, and evaluation of curriculum, school quality, and system performance. Assessments act as a lever for both formative improvement of teaching and learning and summative accountability evaluation of teachers, schools, and administration. Because it is so powerful, a nuanced understanding of its history, current status, and future possibilities seems a useful exercise. In this overview I begin with a brief historical journey from assessments past through the last 3000 years and into the future that is already taking place in various locations and contexts.

Early records of the Chinese Imperial examination system can be found dating some 2,500 to 3,000 years ago ( China Civilisation Centre, 2007 ). That system was used to identify and reward talent wherever it could be found in the sprawling empire of China. Rather than rely solely on recommendations, bribery, or nepotism, it was designed to meritocratically locate students with high levels of literacy and memory competencies to operate the Emperor’s bureaucracy of command and control of a massive population. To achieve those goals, the system implemented standardised tasks (e.g., completing an essay according to Confucian principles) under invigilated circumstances to ensure integrity and comparability of performances ( Feng, 1995 ). The system had a graduated series of increasingly more complex and demanding tests until at the final examination no one could be awarded the highest grade because it was reserved for the Emperor alone. Part of the rationale for this extensive technology related to the consequences attached to selection; not only did successful candidates receive jobs with substantial economic benefits, but they were also recognised publicly on examination lists and by the right to wear specific colours or badges that signified the level of examination the candidate had passed. Unsurprisingly, given the immense prestige and possibility of social advancement through scholarship, there was an industry of preparing cheat materials (e.g., miniature books that replicated Confucian classics) and catching cheats (e.g., ranks of invigilators in high chairs overlooking desks at which candidates worked; Elman, 2013 ).

In contrast, as described by Encyclopedia Brittanica (2010a), European educational assessment grew out of the literary and oratorical remains of the Roman empire, such as schools of grammarians and rhetoricians. At the same time, schools were formed in the various cathedrals, monasteries (especially the Benedictine monasteries), and episcopal schools throughout Europe. Under Charlemagne, church priests were required to master Latin so that they could understand scripture correctly, leading to more advanced religious and academic training. As European society developed in the early Renaissance, schools were opened under the authority of a bishop or cathedral officer, or even by secular guilds, to those deemed sufficiently competent to teach. Students and teachers at these schools were given certain protection and rights to ensure safe travel and free thinking. European universities from the 1100s adopted many of the clerical practices of reading important texts and scholars evaluating the quality of learning by student performance in oral disputes, debates, and arguments relative to the judgement of higher-ranked experts. The subsequent centuries added written tasks and performances to the oral disputes as a way of judging the quality of learning outcomes. Nonetheless, assessment was based, as in the Chinese Imperial system, on the expertise and judgment of more senior scholars or bureaucrats.

These mechanisms were put in place to meet the needs of society or religion for literate and numerate bureaucrats, thinkers, and scholars. The resource of further education, or even basic education, was generally rationed and limited. Standardised assessments, even if that were only the protocol rather than the task or the scoring, were carried out to select candidates on a relatively meritocratic basis. Families and students engaged in these processes because educational success gave hope of escape from lives of poverty and hard labour. Consequently, assessment was fundamentally a summative judgement of the student’s abilities, schooling was preparation for the final examination, and assessments during the schooling process were but mimicry of a final assessment.

With the expansion of schooling and higher education through the 1800s, more efficient methods were sought to reduce the workload surrounding hearing memorized recitations (Encyclopedia Brittanica, 2010b). This led to the imposition of leaving examinations as an entry requirement to learned professions (e.g., being a teacher), the civil service, and university studies. As more and more students attended universities in the 1800s, more efficient ways of collecting information were established, most especially the essay examination and the practice of answering in writing by oneself without aids. This tradition can still be seen in ordered rows of desks in examination halls as students complete written exam papers under scrutiny and time pressure.

The 20th century

By the early 1900s, however, it became apparent that the scoring of these important intellectual exercises was highly problematic. Markers did not agree with each other nor were they consistent within themselves across items or tasks and over time so that their scores varied for the same work. Consequently, early in the 20th century, multiple-choice question tests were developed so that there would be consistency in scoring and efficiency in administration ( Croft and Beard, 2022 ). It is also worth noting that considerable cost and time efficiencies were obtained through using multiple-choice test methods. This aspect led, throughout the century, to increasingly massive use of standardised machine scoreable tests for university entrance, graduate school selection, and even school evaluation. The mechanism of scoring items dichotomously (i.e., right or wrong), within classical test theory statistical modelling, resulted in easy and familiar numbers (e.g., mean, standard deviation, reliability, and standard error of measurement; Clauser, 2022 ).
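
To make the last of these “familiar numbers” concrete: in classical test theory the standard error of measurement is derived from the other two quantities listed above, the spread of observed scores and the test’s reliability. The standard textbook relation is:

\[ \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}} \]

where \(\sigma_X\) is the standard deviation of observed scores and \(\rho_{XX'}\) is the reliability coefficient. For example, a test with a standard deviation of 10 points and a reliability of 0.91 has an SEM of \(10\sqrt{0.09} = 3\) points.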

As the 20th century progressed, the concepts of validity have grown increasingly expansive, and the methods of validation have become increasingly complex and multi-faceted to ensure validity of scores and their interpretation ( Zumbo and Chan, 2014 ). These included scale reliability, factor analysis, item response theory, equating, norming, and standard setting, among others ( Kline, 2020 ). It is worth noting here that statistical methods for test score analysis grew out of the early stages of the discipline of psychology. As psychometric methods became increasingly complex, the world of educational testing began to look much more like the world of statistics. Indeed, Cronbach (1954) noted that the world of psychometrics (i.e., statistical measurement of psychological phenomena) was losing contact with the world of psychology which was the most likely user of psychometric method and research. Interestingly, the world of education makes extensive use of assessment, but few educators are adept at the statistical methods necessary to evaluate their own tests, let alone those from central authorities. Indeed, few teachers are taught statistical test analysis techniques, even fewer understand them, and almost none make use of them.

Of course, assessment is not just a scored task or set of questions. It is legitimately an attempt to operationalize a sample of a construct or content or curriculum domain. The challenge for assessment lies in the conundrum that the material that is easy to test and score tends to be the material that is the least demanding or valuable in any domain. Learning objectives for K-12 schooling, let alone higher education, expect students to go beyond remembering, recalling, regurgitating lists of terminology, facts, or pieces of data. While recall of data pieces is necessary for deep processing, recall of those details is not sufficient. Students need to exhibit complex thinking, problem-solving, creativity, and analysis and synthesis. Assessment of such skills is extremely complex and difficult to achieve.

However, with the need to demonstrate that teachers are effective and that schools are achieving society’s goals and purposes, it becomes easy to reduce the valued objectives of society to that which can be incorporated efficiently into a standardised test. Hence, in many societies the high-stakes test becomes the curriculum. If we could be sure that what was on the test is what society really wanted, this would not be such a bad thing; this is what Resnick and Resnick (1989) called measurement-driven reform. However, research over extensive periods since the middle of the 20th century has shown that much of what we test does not add value to the learning of students (Nichols and Harris, 2016).

An important development in the middle of the 20th century was Scriven’s (1967) work on developing the principles and philosophy of evaluation. A powerful aspect to evaluation that he identified was the distinction between formative evaluation taking place early enough in a process to make differences to the end points of the process and summative evaluation which determined the amount and quality or merit of what the process produced. The idea of formative evaluation was quickly adapted into education as a way of describing assessments that teachers used within classrooms to identify which children needed to be taught what material next ( Bloom et al., 1971 ). This contrasted nicely with high-stakes end-of-unit, end-of-course, or end-of-year formal examinations that summatively judged the quality of student achievement and learning. While assessment as psychometrically validated tests and examinations historically focused on the summative experience, Scriven’s formative evaluation led to using assessment processes early in the educational course of events to inform learners as to what they needed to learn and instructors as to what they needed to teach.

Nonetheless, since the late 1980s (largely thanks to Sadler, 1989), the distinction between summative and formative transmogrified from one of timing to one of type. Formative assessments began to be only those which were not formal tests but were rather informal interactions in classrooms. This perspective was extended by the UK Assessment Reform Group (2002), which promulgated basic principles of formative assessment around the world. Those classroom assessment practices focused much more on what could be seen as classroom teaching practices (Brown, 2013, 2019, 2020a). Instead of testing, teachers interacted with students on the fly and in the moment of the classroom, through questions and feedback that aimed to help students move towards the intended learning outcomes established at the beginning of lessons or courses. Thus, assessment for learning has become a child-friendly approach (Stobart, 2006) to involving learners in their learning and developing rich, meaningful outcomes without the onerous pressure of testing. Much of the power of this approach was that it came as an alternative to the national curriculum of England and Wales, which incorporated high-stakes standardised assessment tasks of children at ages 7, 9, 11, and 14 (i.e., Key Stages 1 to 4; Wherrett, 2004).

In line with increasing access to schooling worldwide throughout the 20th century, there has been concern that success on high-consequence, summative tests simply reinforces pre-existing social status and hierarchy (Bourdieu, 1974). This position argues that tests are not neutral but rather tools of elitism (Gipps, 1994). Unfortunately, when assessments have significant consequences, much higher proportions of disadvantaged students (e.g., minority students, new speakers of the language-medium of assessment, special needs students, those with reading difficulties, etc.) do not experience such benefits (Brown, 2008). This was a factor in the development of using high-quality formative assessment to accelerate the learning progression of disadvantaged students. Nonetheless, differences in group outcomes do not always mean tests are the problem; group score differences can point out that there is sociocultural bias in the provision of educational resources in the school system (Stobart, 2005). This would be a rationale for system-monitoring assessments, such as Hong Kong’s Territory Wide System Assessment, 1 the United States’ National Assessment of Educational Progress, 2 or Australia’s National Assessment Program Literacy and Numeracy. 3 The challenge is how to monitor a system without blaming those who have been let down by it.

Key Stage tests were put in place, not only to evaluate student learning, but also to assure the public that teachers and schools were achieving important goals of education. This use of assessment put focus on accountability, not for the student, but for the school and teacher ( Nichols and Harris, 2016 ). The decision to use tests of student learning to evaluate schools and teachers was mimicked, especially in the United States, in various state accountability tests, the No Child Left Behind legislation, and even such innovative programs of assessment as Race to the Top and PARCC. It should be noted that the use of standardised tests to evaluate teachers and schools is truly a global phenomenon, not restricted to the UK and the USA ( Lingard and Lewis, 2016 ). In this context, testing became a summative evaluation of teachers and school leaders to demonstrate school effectiveness and meet accountability requirements.

The current situation is that assessment is perceived quite differently by experts in different disciplines. Psychometricians tend to define assessment in terms of statistical modelling of test scores. Psychologists use assessments for diagnostic description of client strengths or needs. Within schooling, leaders tend to perceive assessment as jurisdiction or state-mandated school accountability testing, while teachers focus on assessment as interactive, on-the-fly experiences with their students, and parents ( Buckendahl, 2016 ; Harris and Brown, 2016 ) understand assessment as test scores and grades. The world of psychology has become separated from the worlds of classroom teaching, curriculum, psychometrics and statistics, and assessment technologies.

This brief history, bringing us into the early 21st century, shows that educational assessment is informed by multiple disciplines which often fail to talk with or even to each other. Statistical analysis of testing has become separated from psychology and education, psychology is separated from curriculum, teaching is separated from testing, and testing is separated from learning. Hence, we enter the present with many important facets that inform effective use of educational assessment siloed from one another.

Now and next

Currently the world of educational statistics has become engrossed in the large-scale data available through online testing and online learning behaviours. The world of computational psychometrics seeks to move educational testing statistics into the dynamic analysis of big data with machine learning and artificial intelligence algorithms potentially creating a black box of sophisticated statistical models (e.g., neural networks) which learners, teachers, administrators, and citizens cannot understand ( von Davier et al., 2019 ). The introduction of computing technologies means that automation of item generation ( Gierl and Lai, 2016 ) and scoring of performances ( Shin et al., 2021 ) is possible, along with customisation of test content according to test-taker performance ( Linden and Glas, 2000 ). The Covid-19 pandemic has rapidly inserted online and distance testing as a commonplace practice with concerns raised about how technology is used to assure the integrity of student performance ( Dawson, 2021 ).

The ecology of the classroom is not the same as that of a computerised test. This is especially notable when the consequence of a test (regardless of medium) has little relevance to a student ( Wise and Smith, 2016 ). Performance on international large-scale assessments (e.g., PISA, TIMSS) may matter to government officials ( Teltemann and Klieme, 2016 ) but these tests have little value for individual learners. Nonetheless, governmental responses to PISA or TIMSS results may create policies and initiatives that have trickle-down effect on schools and students ( Zumbo and Forer, 2011 ). Consequently, depending on the educational and cultural environment, test-taking motivation on tests that have consequences for the state can be similar to a test with personal consequence in East Asia ( Zhao et al., 2020 ), but much lower in a western democracy ( Zhao et al., 2022 ). Hence, without surety that in any educational test learners are giving full effort ( Thorndike, 1924 ), the information generated by psychometric analysis is likely to be invalid. Fortunately, under computer testing conditions, it is now possible to monitor reduced or wavering effort during an actual test event and provide support to such a student through a supervising proctor ( Wise, 2019 ), though this feature is not widely prevalent.
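
To illustrate the kind of effort monitoring referred to above, the sketch below flags item responses given faster than a plausible solution-time threshold, in the spirit of response-time-based effort measures. It is a minimal illustration under assumed data shapes and made-up thresholds, not Wise’s published procedure or any testing platform’s actual API.

```python
# Illustrative sketch only: flagging possible rapid-guessing from item response
# times, in the spirit of response-time-based effort monitoring. The data
# shapes and threshold values are hypothetical, not any platform's real API.

from dataclasses import dataclass

@dataclass
class ItemResponse:
    item_id: str
    response_time_sec: float  # seconds the student spent on this item

def flag_rapid_guesses(responses, thresholds):
    """Return the item_ids answered faster than that item's time threshold."""
    return [
        r.item_id
        for r in responses
        if r.response_time_sec < thresholds.get(r.item_id, 0.0)
    ]

def effort_score(responses, thresholds):
    """Proportion of items showing solution behaviour rather than rapid guessing."""
    if not responses:
        return 1.0
    flagged = flag_rapid_guesses(responses, thresholds)
    return 1.0 - len(flagged) / len(responses)

# Example use with made-up per-item thresholds (e.g., a fraction of typical time).
thresholds = {"item1": 5.0, "item2": 8.0, "item3": 4.0}
responses = [
    ItemResponse("item1", 2.1),   # answered implausibly fast
    ItemResponse("item2", 35.0),
    ItemResponse("item3", 12.4),
]
print(flag_rapid_guesses(responses, thresholds))  # ['item1']
print(effort_score(responses, thresholds))        # ~0.67
```

In a live proctored setting, a falling effort score of this kind is the sort of signal that could prompt a supervisor to check in with the student.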

Online or remote teaching, learning, and assessment have become a reality for many teachers and students, especially in light of our educational responses to the Covid-19 pandemic. Clearly, some families appreciate this because their children can progress rapidly, unencumbered by the teacher or classmates. For such families, continuing with digital schooling would be seen as a positive future. However, reliance on a computer interface as the sole means of assessment or teaching may dehumanise the very human experience of learning and teaching. As Asimov (1954) described in his short story of a future world in which children are taught individually by machines, Margie imagined what it must have been like to go to school with other children:

Margie …was thinking about the old schools they had when her grandfather's grandfather was a little boy. All the kids from the whole neighborhood came, laughing and shouting in the schoolyard, sitting together in the schoolroom, going home together at the end of the day. They learned the same things so they could help one another on the homework and talk about it.
And the teachers were people...
The mechanical teacher was flashing on the screen: "When we add the fractions ½ and ¼ -"
Margie was thinking about how the kids must have loved it in the old days. She was thinking about the fun they had.

As Brown (2020b) has argued the option of a de-schooled society through computer-based teaching, learning, and assessment is deeply unattractive on the grounds that it is likely to be socially unjust. The human experience of schooling matters to the development of humans. We learn through instruction ( Bloom, 1976 ), culturally located experiences ( Cole et al., 1971 ), inter-personal interaction with peers and adults ( Vygotsky, 1978 ; Rogoff, 1991 ), and biogenetic factors ( Inhelder and Piaget, 1958 ). Schooling gives us access to environments in which these multiple processes contribute to the kinds of citizens we want. Hence, we need confidence in the power of shared schooling to do more than increase the speed by which children acquire knowledge and learning; it helps us be more human.

This dilemma echoes the tension between in vitro and in vivo biological research. Within the controlled environment of a test tube (vitro) organisms do not necessarily behave the same way as they do when released into the complexity of human biology ( Autoimmunity Research Foundation, 2012 ). This analogy has been applied to educational assessment ( Zumbo, 2015 ) indicating that how students perform in a computer-mediated test may not have validity for how students perform in classroom interactions or in-person environments.

The complexity of human psychology is captured in Hattie’s (2004) ROPE model, which posits that the various aspects of human motivation, belief, strategy, and values interact as threads spun into a rope. This means it is hard to analytically separate the various components and identify aspects that individually explain learning outcomes. Indeed, Marsh et al. (2006) showed that of the many self-concept and control beliefs used to predict performance on the PISA tests, almost all variables have relations to achievement of less than r = 0.35. Instead, interactions among motivation, beliefs about learning, intelligence, assessment, the self, and attitudes with and toward others, subjects, and behaviours all matter to performance. Aspects that create growth-oriented pathways (Boekaerts and Niemivirta, 2000) and strategies include, inter alia, mastery goals (Deci and Ryan, 2000), deep learning beliefs (Biggs et al., 2001), malleable intelligence beliefs (Duckworth et al., 2011), improvement-oriented beliefs about assessment (Brown, 2011), internal, controllable attributions (Weiner, 1985), effort (Wise and DeMars, 2005), avoiding dishonesty (Murdock et al., 2016), trusting one’s peers (Panadero, 2016), and realism in evaluating one’s own work (Brown and Harris, 2014). All these adaptive aspects of learning stand in contrast to deactivating and maladaptive beliefs, strategies, and attitudes that serve to protect the ego and undermine learning. What this tells us is that psychological research matters to understanding the results of assessment and that no single psychological construct is sufficient to explain very much of the variance in student achievement. However, it seems we are as yet unable to identify which specific processes matter most to better performance for all students across the ability spectrum, given that almost all the constructs that have been reported in educational psychology seem to have a positive contribution to better performance. Here is the challenge for educational psychology within an assessment setting: which constructs are most important and effectual before, during, and after any assessment process (Mcmillan, 2016), and how should they be operationalised?

A current enthusiasm is to use ‘big data’ from computer-based assessments to examine in more detail how students carry out the process of responding to tasks. Many large-scale computer-based testing programs collect, utilize, and report on test-taker engagement as part of their process data collection (e.g., the United States National Assessment of Educational Progress 4 ). These test systems provide data about what options were clicked on, in what order, what pages were viewed, and the timings of these actions. Several challenges to using big data in educational assessment exist. First, computerised assessments need to capture the processes and products we care about. That means we need a clear theoretical model of the underlying cognitive mechanisms or processes that generate the process data itself (Zumbo et al., in press). Second, we need to be reminded that data do not explain themselves; theory and insight about process are needed to understand data (Pearl and Mackenzie, 2018). Examination of log files can give some insight into effective vs. ineffective strategies, once the data are analysed using theory to create a model of how a problem should be solved (Greiff et al., 2015). Access to data logs that show effort and persistence on a difficult task can reveal that, despite failure to successfully resolve a problem, such persistence is related to overall performance (Lundgren and Eklöf, 2020). But data by themselves will not tell us how and why students are successful, or what instruction might need to do to encourage students to use the scientific method of manipulating one variable at a time or to not give up quickly.

Psychometric analyses of assessments can only statistically model item difficulty, item discrimination, and item chance parameters to estimate person ability ( Embretson and Reise, 2000 ). None of the other psychological features of how learners relate to themselves and their environment are included in score estimation. In real classroom contexts, teachers make their best efforts to account for individual motivation, affect, and cognition to provide appropriate instruction, feedback, support, and questioning. However, the nature of these factors varies across time (cohorts), locations (cultures and societies), policy priorities for schooling and assessment, and family values ( Brown and Harris, 2009 ). This means that what constitutes a useful assessment to inform instruction in a classroom context (i.e., identify to the teacher who needs to be taught what next) needs to constantly evolve and be incredibly sensitive to individual and contextual factors. This is difficult if we keep psychology, curriculum, psychometrics, and technology in separate silos. It seems highly desirable that these different disciplines interact, but it is not guaranteed that the technology for psychometric testing developments will cross-pollinate with classroom contexts where teachers have to relate to and monitor student learning across all important curricular domains.
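
For readers unfamiliar with these parameters, one standard formulation in which they appear is the three-parameter logistic item response model (as treated in texts such as Embretson and Reise, 2000), in which the probability that person \(j\) answers item \(i\) correctly depends only on the person’s ability \(\theta_j\) and the item’s difficulty \(b_i\), discrimination \(a_i\), and chance (guessing) parameter \(c_i\):

\[ P(X_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta_j - b_i)}} \]

Nothing else about the learner (motivation, affect, context) enters the model, which is exactly the limitation described above.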

It is common to treat what happens in the minds and emotions of students when they are assessed as a kind of ‘black box’, implying that the processes are opaque or unknowable. This is an approach I have taken previously in examining what students do when asked to self-assess (Yan and Brown, 2017). However, the meaning of a black box is quite different in engineering. In aeronautics, the essential constructs related to flight (e.g., engine power, aileron settings, pitch and yaw positions, etc.) are known very deeply, otherwise flight would not happen. The black box in an airplane records the values of those important variables, and the only thing unknown (i.e., black) is what the values were at the point of interest. If we are to continue to use this metaphor as a way of understanding what happens when students are assessed or assess, then we need to agree on what the essential constructs are that underlie learning and achievement. Our current situation seems to be satisfied with the view that everything is correlated and everything matters. It may be that data science will help us sort through the chaff for the wheat, provided we design and implement sensors appropriate to the constructs we consider hypothetically most important. It may be that measuring the timing of mouse clicks and eye tracking do connect to important underlying mechanisms, but at this stage data science in testing seems largely a case of crunching the ‘easy to get’ numbers and hoping that the data mean something.

To address this concern, we need to develop, for education’s sake, assessments that have strong alignment with curricular ambitions and values and which have applicability to classroom contexts and processes (Bennett, 2018). This will mean technology that supports what humans must do in schooling rather than replacing them with teaching/testing machines. Fortunately, some examples of assessment technology for learning do exist. One supportive technology is PeerWise (Denny et al., 2008; Hancock et al., 2018), in which students create course-related multiple-choice questions and use them as a self-testing learning strategy. A school-based technology is the e-asTTle computer assessment system, which produces a suite of diagnostic reports to support teachers’ planning and teaching in response to what the system indicates students need to be taught (Hattie and Brown, 2008; Brown and Hattie, 2012; Brown et al., 2018). What these technologies do is support, rather than supplant, the work that teachers and learners need to do to know what they need to study or teach and to monitor their progress. Most importantly, they are well connected to what students must learn and what teachers are teaching. Other detailed work uses organised learning models or dynamic learning maps to mark out routes for learners and teachers, using cognitive and curriculum insights with psychometric tools for measuring status and progress (Kingston et al., 2022). The work done by Wise (2019) shows that it is possible in a computer-assisted testing environment to monitor student effort based on their speed of responding and give prompts that support greater effort and less speed.

Assessment needs to exploit more deeply the insights educational psychology has given us into human behavior, attitudes, inter- and intra-personal relations, emotions, and so on. This was called for some 20 years ago (National Research Council, 2001), but the underlying disciplines that inform this integration seem to have grown away from each other. Nonetheless, the examples given above suggest that the gaps can be closed. But assessments still do not seem to consider and respond to these psychological determinants of achievement. Teachers have the capability of integrating curriculum, testing, psychology, and data at a superficial level, but with some considerable margin of error (Meissel et al., 2017). To overcome their own error, teachers need technologies that work with them in the classroom and support them in making useful and accurate interpretations of what students need to be taught next. As Bennett (2018) pointed out, more technology will happen, but perhaps not more tests on computers. This is the assessment that will help teachers rather than replace them and give us hope for a better future.

Author contributions

GB wrote this manuscript and is solely responsible for its content.

Support for the publication of this paper was received from the Publishing and Scholarly Services of the Umeå University Library.

Acknowledgments

A previous version of this paper was presented as a keynote address to the 2019 biennial meeting of the European Association for Research in Learning and Instruction, with the title Products, Processes, Psychology, and Technology: Quo Vadis Educational Assessment ?

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1. ^ https://www.hkeaa.edu.hk/en/sa_tsa/tsa/

2. ^ https://www.nationsreportcard.gov/

3. ^ https://nap.edu.au/

4. ^ https://www.nationsreportcard.gov/process_data/

Asimov, I. (1954). Oh the fun they had. Fantasy Sci. Fiction 6, 125–127.


Assessment Reform Group (2002). Assessment for Learning: 10 Principles. Research-based Principles to Guide Classroom Practice. Cambridge: Assessment Reform Group.

Autoimmunity Research Foundation. (2012). Differences between in vitro, in vivo, and in silico studies [online]. The Marshall Protocol Knowledge Base. Available at: http://mpkb.org/home/patients/assessing_literature/in_vitro_studies (Accessed November 12, 2015).

Bennett, R. E. (2018). Educational assessment: what to watch in a rapidly changing world. Educ. Meas. Issues Pract. 37, 7–15. doi: 10.1111/emip.12231


Biggs, J., Kember, D., and Leung, D. Y. (2001). The revised two-factor study process questionnaire: R-SPQ-2F. Br. J. Educ. Psychol. 71, 133–149. doi: 10.1348/000709901158433


Bloom, B. S. (1976). Human Characteristics and School Learning . New York: McGraw-Hill.

Bloom, B., Hastings, J., and Madaus, G. (1971). Handbook on Formative and Summative Evaluation of Student Learning . New York: McGraw-Hill.

Boekaerts, M., and Niemivirta, M. (2000). “Self-regulated learning: finding a balance between learning goals and ego-protective goals,” in Handbook of Self-regulation . eds. M. Boekaerts, P. R. Pintrich, and M. Zeidner (San Diego, CA: Academic Press).

Bourdieu, P. (1974). “The school as a conservative force: scholastic and cultural inequalities,” in Contemporary Research in the Sociology of Education . ed. J. Eggleston (London: Methuen).

Brown, G. T. L. (2008). Conceptions of Assessment: Understanding what Assessment Means to Teachers and Students . New York: Nova Science Publishers.

Brown, G. T. L. (2011). Self-regulation of assessment beliefs and attitudes: a review of the Students' conceptions of assessment inventory. Educ. Psychol. 31, 731–748. doi: 10.1080/01443410.2011.599836

Brown, G. T. L. (2013). “Assessing assessment for learning: reconsidering the policy and practice,” in Making a Difference in Education and Social Policy . eds. M. East and S. May (Auckland, NZ: Pearson).

Brown, G. T. L. (2019). Is assessment for learning really assessment? Front. Educ. 4:64. doi: 10.3389/feduc.2019.00064

Brown, G. T. L. (2020a). Responding to assessment for learning: a pedagogical method, not assessment. N. Z. Annu. Rev. Educ. 26, 18–28. doi: 10.26686/nzaroe.v26.6854

Brown, G. T. L. (2020b). Schooling beyond COVID-19: an unevenly distributed future. Front. Educ. 5:82. doi: 10.3389/feduc.2020.00082

Brown, G. T. L., and Harris, L. R. (2009). Unintended consequences of using tests to improve learning: how improvement-oriented resources heighten conceptions of assessment as school accountability. J. MultiDisciplinary Eval. 6, 68–91.

Brown, G. T. L., and Harris, L. R. (2014). The future of self-assessment in classroom practice: reframing self-assessment as a core competency. Frontline Learn. Res. 3, 22–30. doi: 10.14786/flr.v2i1.24

Brown, G. T. L., O'leary, T. M., and Hattie, J. A. C. (2018). “Effective reporting for formative assessment: the asTTle case example,” in Score Reporting: Research and Applications . ed. D. Zapata-Rivera (New York: Routledge).

Brown, G. T., and Hattie, J. (2012). “The benefits of regular standardized assessment in childhood education: guiding improved instruction and learning,” in Contemporary Educational Debates in Childhood Education and Development . eds. S. Suggate and E. Reese (New York: Routledge).

Buckendahl, C. W. (2016). “Public perceptions about assessment in education,” in Handbook of Human and Social Conditions in Assessment . eds. G. T. L. Brown and L. R. Harris (New York: Routledge).

China Civilisation Centre (2007). China: Five Thousand Years of History and Civilization . Hong Kong: City University of Hong Kong Press.

Clauser, B. E. (2022). “A history of classical test theory,” in The History of Educational Measurement: Key Advancements in Theory, Policy, and Practice . eds. B. E. Clauser and M. B. Bunch (New York: Routledge).

Cole, M., Gay, J., Glick, J., and Sharp, D. (1971). The Cultural Context of Learning and Thinking: An Exploration in Experimental Anthropology . New York: Basic Books.

Croft, M., and Beard, J. J. (2022). “Development and evolution of the SAT and ACT,” in The History of Educational Measurement: Key Advancements in Theory, Policy, and Practice . eds. B. E. Clauser and M. B. Bunch (New York: Routledge).

Cronbach, L. J. (1954). Report on a psychometric mission to Clinicia. Psychometrika 19, 263–270. doi: 10.1007/BF02289226

Dawson, P. (2021). Defending Assessment Security in a Digital World: Preventing e-cheating and Supporting Academic Integrity in Higher Education . London: Routledge.

Deci, E. L., and Ryan, R. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 55, 68–78.

Denny, P., Hamer, J., Luxton-Reilly, A., and Purchase, H. (2008). PeerWise: students sharing their multiple choice questions. ICER '08: Proceedings of the Fourth International Workshop on Computing Education Research; September 6–7, 2008; Sydney, Australia, 51–58.

Duckworth, A. L., Quinn, P. D., and Tsukayama, E. (2011). What no child left behind leaves behind: the roles of IQ and self-control in predicting standardized achievement test scores and report card grades. J. Educ. Psychol. 104, 439–451. doi: 10.1037/a0026280

Elman, B. A. (2013). Civil Examinations and Meritocracy in Late Imperial China . Cambridge: Harvard University Press.

Embretson, S. E., and Reise, S. P. (2000). Item Response Theory for Psychologists . Mahwah: LEA.

Encyclopedia Brittanica (2010a). Europe in the middle ages: the background of early Christian education. Encyclopedia Britannica.

Encyclopedia Brittanica (2010b). Western education in the 19th century. Encyclopedia Britannica.

Feng, Y. (1995). From the imperial examination to the national college entrance examination: the dynamics of political centralism in China's educational enterprise. J. Contemp. China 4, 28–56. doi: 10.1080/10670569508724213

Gierl, M. J., and Lai, H. (2016). A process for reviewing and evaluating generated test items. Educ. Meas. Issues Pract. 35, 6–20. doi: 10.1111/emip.12129

Gipps, C. V. (1994). Beyond Testing: Towards a Theory of Educational Assessment . London: Falmer Press.

Greiff, S., Wüstenberg, S., and Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Comput. Educ. 91, 92–105. doi: 10.1016/j.compedu.2015.10.018

Hancock, D., Hare, N., Denny, P., and Denyer, G. (2018). Improving large class performance and engagement through student-generated question banks. Biochem. Mol. Biol. Educ. 46, 306–317. doi: 10.1002/bmb.21119

Harris, L. R., and Brown, G. T. L. (2016). “Assessment and parents,” in Encyclopedia of Educational Philosophy And theory . ed. M. A. Peters (Springer: Singapore).

Hattie, J. (2004). Models of self-concept that are neither top-down or bottom-up: the ROPE model of self-concept. 3rd International Biennial Self Research Conference; July 2004; Berlin, DE.

Hattie, J. A., and Brown, G. T. L. (2008). Technology for school-based assessment and assessment for learning: development principles from New Zealand. J. Educ. Technol. Syst. 36, 189–201. doi: 10.2190/ET.36.2.g

Inhelder, B., and Piaget, J. (1958). The Growth of Logical Thinking from Childhood to Adolescence . New York: Basic Books.

Kingston, N. M., Alonzo, A. C., Long, H., and Swinburne Romine, R. (2022). Editorial: the use of organized learning models in assessment. Front. Education 7:446. doi: 10.3389/feduc.2022.1009446

Kline, R. B. (2020). “Psychometrics,” in SAGE Research Methods Foundations . eds. P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, and R. A. Williams (London: Sage).





Test Construction


Most tests are a form of summative assessment; that is, they measure students’ performance on a given task. (For more information on summative assessment, see the CITL resource on  formative and summative assessment .) McKeachie (2010) only half-jokes that “Unfortunately, it appears to be generally true that the examinations that are the easiest to construct are the most difficult to grade.” The inverse is also true: time spent constructing a clear exam will save time in the grading of it.

Closed-answer or “objective” tests

By “objective,” this handbook refers to tests made up of multiple choice (or “multi-op”), matching, true/false, fill-in-the-blank, or short-answer items. Objective tests have the advantage of allowing an instructor to assess a large and potentially representative sample of course material, and they allow for reliable and efficient scoring. The disadvantages of objective tests include a tendency to emphasize only “recognition” skills, the ease with which correct answers can be guessed on many item types, and the inability to measure students’ organization and synthesis of material.

Since the practical arguments for giving objective exams are compelling, we offer a few suggestions for writing multiple-choice items. The first is to find and adapt existing test items. Teachers’ manuals containing collections of items accompany many textbooks. However, the general rule is “adapt rather than adopt.” Existing items will rarely fit your specific needs; you should tailor them to more adequately reflect your objectives.

Objective-answer tests can be constructed to require students to apply concepts, or to synthesize and analyze data and text. Consider using small “case studies,” problems, or situations. Provide a small collection of data, such as a description of a situation, a series of graphs, quotes, or a paragraph: any cluster of the kinds of raw information that might be appropriate material for the activities of your discipline. Then develop a series of questions based on that material whose answers require students to process and think through the material carefully before answering.

Here are a few additional guidelines to keep in mind when writing multiple-choice tests:

  • As much of the question as possible should be included in the stem.
  • Make sure there is only one clearly correct answer (unless you are instructing students to select more than one).
  • Make sure the correct answer is not given away by its being noticeably shorter, longer, or more complex than the distractors.
  • Make the wording in the response choices consistent with the item stem.
  • Beware of using answers such as “none of these” or “all of the above.”
  • Use negatives sparingly in the question or stem; do not use double negatives.
  • Beware of using sets of opposite answers unless more than one pair is presented (e.g., go to work, not go to work).

Essay exams

Conventional wisdom accurately portrays short-answer and essay examinations as the easiest to write and the most difficult to grade, particularly if they are graded well. You should give students an exam question for each crucial concept that they must understand.

If you want students to study in both depth and breadth, don't give them a choice among topics; a choice lets them avoid questions about the material they didn’t study. Instructors generally expect a great deal from students, but remember that their mastery of a subject depends as much on prior preparation and experience as it does on diligence and intelligence; even at the end of the semester some students will still be struggling to understand the material. Design your questions so that all students can answer at their own levels.

The following are some suggestions that may enhance the quality of the essay tests that you produce:

  • Have in mind the processes that you want measured (e.g., analysis, synthesis).
  • Start questions with words such as “compare,” “contrast,” “explain why.” Don’t use “what,” “when,” or “list.” (These latter types are better measured with objective-type items).
  • Write items that define the parameters of expected answers as clearly as possible.
  • Make sure that the essay question is specific enough to invite the level of detail you expect in the answer. A question such as “Discuss the causes of the American Civil War,” might get a wide range of answers, and therefore be impossible to grade reliably. A more controlled question would be, “Explain how the differing economic systems of the North and South contributed to the conflicts that led to the Civil War.”
  • Design the question to prompt students’ organization of the answer. For example, a question like “Which three economic factors were most influential in the formation of the League of Nations?”
  • Don’t have too many questions for the time available.
  • For take-home exams, indicate whether or not students may collaborate and whether the help of a Writing Tutorial Services tutor is permissible.

Grading essay exams

A more detailed discussion of grading student work is offered in  evaluating student written work  and applies to handling essay exams as well.

However, unlike formal essays, essay exams are usually written in class under a time limit, and they often fall at particularly busy times of the year, such as mid-term and finals week. Consequently, they are stressful for students in different ways, and you may encounter errors and oversights that do not appear in formal essays. Similarly, it is not unusual to find essay answers that do not provide the responses you had anticipated.

Your grading changes in response. Adjustments to the grading scale may be necessary in light of exam essays that provide answers you had not anticipated. Comments may be briefer, and focused primarily on the product students have produced; that is, exams do not require suggestions for revision.


What Are Education Tests For, Anyway?

Anya Kamenetz


The all-too-familiar No. 2 pencil. (Josh Davis/Flickr)

Pay attention to this piece. There's going to be a test at the end.

Did that trigger scary memories of the 10th grade? Or are you just curious how you'll measure up?

If the answer is "C: Either of the above," keep reading.

Tests have existed throughout the history of education. Today they're being used more than ever before — but not necessarily as designed.

Different types of tests are best for different purposes. Some help students learn better. Some are there to sort individuals. Others help us understand how a whole population is doing.

But these types of tests are easily confused, and more easily misused. As the U.S. engages in another debate over how — and how much — we test kids, it might be helpful to do a little anatomy of assessment, or a taxonomy of tests.

Teachers divide tests into two big categories: formative and summative.

Formative assessment, aka formative feedback, is the name given to the steady little nudges that happen throughout the school day — when the teacher calls on someone, or sends a student up to the board to solve a problem, or pops a quiz to make sure you did the reading.

Any test given for purely diagnostic reasons can also be formative. Say a new student comes to school and teachers need to see what math class she should be in. What distinguishes formative assessments is that they're not there to judge you as a success or failure. The primary purpose is to guide both student and teacher.


Nobody really argues against formative tests, so let's forget about them for now.

Summative assessment, on the other hand, sums up all your learning on one big day: the unit test, the research paper, the final exam, the exhibition.

When it comes to summative tests, U.S. schools really love a particular subcategory of them: psychometrically validated and standardized tests. Psychometrics literally means "mind measurement" — the discipline of test-making. It's a statistical pursuit, which means it's mostly math. Giant chunks of social science are based on the work of 19th century psychometricians, who came up with tools and concepts like correlation and regression analysis.

But the most famous of those tools is the bell curve. Almost any aspect of the human condition, when plotted on a graph, tends to assume this famous shape: crime, poverty, disease, marriage, suicide, weight, height, births, deaths. And, of course, when Alfred Binet developed the first widely used intelligence tests in the early 1900s, he made sure that the results conformed to that same bell curve.

Why does it matter that most of our tests are written by specialists in statistics? Well, when psychometricians write a test, they spend a lot of time ensuring standardization and reliability.

Reliability means if you give the same test to the same person on two different occasions, her scores should not be wildly different. And standardization means that, across a broad population, the results of the test will conform to an expected distribution — that bell curve, or something like it. If you give the same test to 20,000 people and they all score a 75, that's not a very useful test.
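To make those two properties concrete, here is a minimal Python sketch (not from the article; all names and scores are invented) that computes a test-retest reliability coefficient for two administrations of the same test and summarizes the score spread that standardization is concerned with.

```python
# Minimal sketch: test-retest reliability and score spread, on invented data.
from statistics import mean, stdev

first_sitting  = [72, 85, 64, 90, 77, 58, 81, 69, 74, 88]  # hypothetical scores
second_sitting = [75, 83, 61, 92, 74, 60, 84, 66, 78, 86]  # same ten students, later sitting

def pearson_r(x, y):
    """Correlation between two administrations: the test-retest reliability."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((len(x) - 1) * stdev(x) * stdev(y))

print(f"test-retest reliability r = {pearson_r(first_sitting, second_sitting):.2f}")

# Standardization concerns the distribution of scores across a large population;
# here we just summarize the tiny invented sample with a mean and standard deviation.
print(f"mean = {mean(first_sitting):.1f}, sd = {stdev(first_sitting):.1f}")
```

A correlation near 1 indicates that the two sittings rank students very similarly; a spread of scores, rather than everyone piling up on one value, is what designers of a standardized test expect to see.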

These rules are the reason that 4 million U.S. students are taking extra tests this year. Not for their own practice, but to test the tests themselves. These are the newly developed tests designed to align with the Common Core State Standards. Large field tests are required to establish their standardization and reliability.

A psychometric test is historically grounded, mathematically precise and suitable for ranking large human populations. But those strengths can also be weaknesses.

  • A reliable test doesn't change much from year to year. That can make it easier to coach for.
  • The need for reliable scoring often drives designers to use multiple choice questions, to avoid ambiguity. But that format has a hard time measuring a whole range of crucial human abilities: creativity, problem-solving, communication, teamwork and leadership, to name a few.
  • The multiple-choice format and the need for predictability mean psychometric tests, whether a state third-grade reading test or an SAT, all somewhat resemble each other. And so, they can end up testing a student's test-taking ability more than actual subject knowledge.

Reliability and standardization can be at odds with the third key concern in psychometrics: validity. That is, does this test actually tell us anything important? Especially, is it predictive of future performance in the real world? Validity is ideally established by comparing students’ test scores with some sort of ground truth, such as grades in school or later success in college. But that takes a long time and a lot of number crunching. And in practice the process is often pretty circular: the validity of test results tends to be based on their correlation with other test results.

So, those are the keys to how test makers see the test. But in the world of education, it's not just how they're written, but how they're used.

That brings us to the tests that so many Americans love to hate: high-stakes tests. The ones that decide whether our kids move up to the 4th grade, get a full-ride scholarship, or someday, a job.

In practice, we accept one kind of high-stakes test: the standalone gatekeeper test. Everyone wants a pilot who passed her licensing exam, or a lawyer who passed the bar. We like transparent, objective standards, especially when it's other people who have to meet them.

No, it's the other kind of high-stakes test that draws the most ire: accountability tests. They get this name because they are given to judge the performance of schools, teachers and states, not just students. Accountability tests determine school reorganization and closure decisions, teacher evaluations and state funding.

So, got all that? Good. Now here's your essay question:

Under the federal No Child Left Behind law, passed in 2001, public school accountability has rested largely on the results of psychometrically validated and standardized, largely multiple-choice summative assessments covering math and English only, given annually in 3rd through 12th grades. Given what you've just read about the strengths and weaknesses of this test format, is it wise to attach so many consequences to their results? State the reasons for your response.


Formative, Summative, and More Types of Assessments in Education

All the best ways to evaluate learning before, during, and after it happens.


When you hear the word assessment, do you automatically think “tests”? While it’s true that tests are one kind of assessment, they’re not the only way teachers evaluate student progress. Learn more about the types of assessments used in education, and find out how and when to use them.


What is assessment?

In simplest terms, assessment means gathering data to help understand progress and effectiveness. In education, we gather data about student learning in a variety of ways, then use it to assess both students’ progress and the effectiveness of our teaching programs. This helps educators know what’s working well and where they need to make changes.


There are three broad types of assessments: diagnostic, formative, and summative. These take place throughout the learning process, helping students and teachers gauge learning. Within those three broad categories, you’ll find other types of assessment, such as ipsative, norm-referenced, and criterion-referenced.

What’s the purpose of assessment in education?

In education, we can group assessments under three main purposes:

  • Of learning
  • For learning
  • As learning

Assessment of learning is student-based and one of the most familiar, encompassing tests, reports, essays, and other ways of determining what students have learned. These are usually summative assessments, and they are used to gauge progress for individuals and groups so educators can determine who has mastered the material and who needs more assistance.

When we talk about assessment for learning, we’re referring to the constant evaluations teachers perform as they teach. These quick assessments—such as in-class discussions or quick pop quizzes—give educators the chance to see if their teaching strategies are working. This allows them to make adjustments in action, tailoring their lessons and activities to student needs. Assessment for learning usually includes the formative and diagnostic types.

Assessment can also be a part of the learning process itself. When students use self-evaluations, flash cards, or rubrics, they’re using assessments to help them learn.

Let’s take a closer look at the various types of assessments used in education.

Diagnostic Assessments

Diagnostic assessments are used before learning to determine what students already do and do not know. This often refers to pre-tests and other activities students attempt at the beginning of a unit.

How To Use Diagnostic Assessments

When giving diagnostic assessments, it’s important to remind students these won’t affect their overall grade. Instead, it’s a way for them to find out what they’ll be learning in an upcoming lesson or unit. It can also help them understand their own strengths and weaknesses, so they can ask for help when they need it.

Teachers can use results to understand what students already know and adapt their lesson plans accordingly. There’s no point in over-teaching a concept students have already mastered. On the other hand, a diagnostic assessment can also help highlight expected pre-knowledge that may be missing.

For instance, a teacher might assume students already know certain vocabulary words that are important for an upcoming lesson. If the diagnostic assessment indicates differently, the teacher knows they’ll need to take a step back and do a little pre-teaching before getting to their actual lesson plans.

Examples of Diagnostic Assessments

  • Pre-test: This includes the same questions (or types of questions) that will appear on a final test, and it’s an excellent way to compare results.
  • Blind Kahoot: Teachers and kids already love using Kahoot for test review, but it’s also the perfect way to introduce a new topic. Learn how Blind Kahoots work here.
  • Survey or questionnaire: Ask students to rate their knowledge on a topic with a series of low-stakes questions.
  • Checklist: Create a list of skills and knowledge students will build throughout a unit, and have them start by checking off any they already feel they’ve mastered. Revisit the list frequently as part of formative assessment.

Formative Assessments

Formative assessments take place during instruction. They’re used throughout the learning process and help teachers make on-the-go adjustments to instruction and activities as needed. These assessments aren’t used in calculating student grades, but they are planned as part of a lesson or activity. Learn much more about formative assessments here.

How To Use Formative Assessments

As you’re building a lesson plan, be sure to include formative assessments at logical points. These types of assessments might be used at the end of a class period, after finishing a hands-on activity, or once you’re through with a unit section or learning objective.

Once you have the results, use that feedback to determine student progress, both overall and as individuals. If the majority of a class is struggling with a specific concept, you might need to find different ways to teach it. Or you might discover that one student is especially falling behind and arrange to offer extra assistance to help them out.

While kids may grumble, standard homework review assignments can actually be a pretty valuable type of formative assessment . They give kids a chance to practice, while teachers can evaluate their progress by checking the answers. Just remember that homework review assignments are only one type of formative assessment, and not all kids have access to a safe and dedicated learning space outside of school.

Examples of Formative Assessments

  • Exit tickets : At the end of a lesson or class, pose a question for students to answer before they leave. They can answer using a sticky note, online form, or digital tool.
  • Kahoot quizzes : Kids enjoy the gamified fun, while teachers appreciate the ability to analyze the data later to see which topics students understand well and which need more time.
  • Flip (formerly Flipgrid): We love Flip for helping teachers connect with students who hate speaking up in class. This innovative (and free!) tech tool lets students post selfie videos in response to teacher prompts. Kids can view each other’s videos, commenting and continuing the conversation in a low-key way.
  • Self-evaluation: Encourage students to use formative assessments to gauge their own progress too. If they struggle with review questions or example problems, they know they’ll need to spend more time studying. This way, they’re not surprised when they don’t do well on a more formal test.

Find a big list of 25 creative and effective formative assessment options here.

Summative Assessments

Summative assessments are used at the end of a unit or lesson to determine what students have learned. By comparing diagnostic and summative assessments, teachers and learners can get a clearer picture of how much progress they’ve made. Summative assessments are often tests or exams but also include options like essays, projects, and presentations.

How To Use Summative Assessments

The goal of a summative assessment is to find out what students have learned and if their learning matches the goals for a unit or activity. Ensure you match your test questions or assessment activities with specific learning objectives to make the best use of summative assessments.

When possible, use an array of summative assessment options to give all types of learners a chance to demonstrate their knowledge. For instance, some students suffer from severe test anxiety but may still have mastered the skills and concepts and just need another way to show their achievement. Consider ditching the test paper and having a conversation with the student about the topic instead, covering the same basic objectives but without the high-pressure test environment.

Summative assessments are often used for grades, but they’re really about so much more. Encourage students to revisit their tests and exams, finding the right answers to any they originally missed. Think about allowing retakes for those who show dedication to improving on their learning. Drive home the idea that learning is about more than just a grade on a report card.

Examples of Summative Assessments

  • Traditional tests: These might include multiple-choice, matching, and short-answer questions.
  • Essays and research papers: This is another traditional form of summative assessment, typically involving drafts (which are really formative assessments in disguise) and edits before a final copy.
  • Presentations: From oral book reports to persuasive speeches and beyond, presentations are another time-honored form of summative assessment.

Find 25 of our favorite alternative assessments here.

More Types of Assessments

Now that you know the three basic types of assessments, let’s take a look at some of the more specific and advanced terms you’re likely to hear in professional development books and sessions. These assessments may fit into some or all of the broader categories, depending on how they’re used. Here’s what teachers need to know.

Criterion-Referenced Assessments

In this common type of assessment, a student’s knowledge is compared to a standard learning objective. Most summative assessments are designed to measure student mastery of specific learning objectives. The important thing to remember about this type of assessment is that it only compares a student to the expected learning objectives themselves, not to other students.


Many standardized tests are criterion-referenced assessments. A governing board determines the learning objectives for a specific group of students. Then, all students take a standardized test to see if they’ve achieved those objectives.

Find out more about criterion-referenced assessments here.

Norm-Referenced Assessments

These types of assessments do compare student achievement with that of their peers. Students receive a ranking based on their score and potentially on other factors as well. Norm-referenced assessments usually rank on a bell curve, establishing an “average” as well as high performers and low performers.

These assessments can be used as screening for those at risk for poor performance (such as those with learning disabilities) or to identify high-level learners who would thrive on additional challenges. They may also help rank students for college entrance or scholarships, or determine whether a student is ready for a new experience like preschool.

Learn more about norm-referenced assessments here.
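To make the contrast concrete, the short Python sketch below (the names, scores, and cutoff are all hypothetical, not from this article) interprets the same set of scores both ways: against a fixed criterion and as a ranking within the group.

```python
# Minimal sketch: criterion-referenced vs. norm-referenced views of the same scores.

scores = {"Ana": 92, "Ben": 78, "Chen": 85, "Dee": 67, "Eli": 73}  # invented scores
CUTOFF = 80  # hypothetical "meets the learning objective" standard

def percentile_rank(score, all_scores):
    """Percent of the group scoring at or below this score (norm-referenced view)."""
    return 100 * sum(s <= score for s in all_scores) / len(all_scores)

for name, score in scores.items():
    criterion = "meets standard" if score >= CUTOFF else "below standard"
    norm = percentile_rank(score, list(scores.values()))
    print(f"{name}: {score:3d} -> {criterion:14s} | percentile rank {norm:.0f}")
```

The criterion-referenced column depends only on the fixed cutoff, while the percentile rank changes whenever the rest of the group changes, which is exactly the difference the two sections above describe.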

Ipsative Assessments

In education, ipsative assessments compare a learner’s present performance to their own past performance, to chart achievement over time. Many educators consider ipsative assessment to be the most important of all, since it helps students and parents truly understand what they’ve accomplished—and sometimes, what they haven’t. It’s all about measuring personal growth.

Comparing the results of pre-tests with final exams is one type of ipsative assessment. Some schools use curriculum-based measurement to track ipsative performance. Kids take regular quick assessments (often weekly) to show their current skill/knowledge level in reading, writing, math, and other basics. Their results are charted, showing their progress over time.

Learn more about ipsative assessment in education here.
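As a minimal illustration of the ipsative idea, the sketch below (invented pre-test and final-exam scores) compares each student only with their own earlier result rather than with peers or a fixed standard.

```python
# Minimal sketch: ipsative view, comparing each student only to their own past score.

pre_test = {"Ana": 55, "Ben": 70, "Chen": 62}  # hypothetical pre-test scores
final    = {"Ana": 78, "Ben": 74, "Chen": 85}  # same students at the end of the unit

for student in pre_test:
    gain = final[student] - pre_test[student]
    print(f"{student}: {pre_test[student]} -> {final[student]} (gain {gain:+d})")
```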

Have more questions about the best types of assessments to use with your students? Come ask for advice in the We Are Teachers HELPLINE group on Facebook.

Plus, check out creative ways to check for understanding.


Created by the Great Schools Partnership, the Glossary of Education Reform is a comprehensive online resource that describes widely used school-improvement terms, concepts, and strategies for journalists, parents, and community members.


Summative Assessment

Summative assessments are used to evaluate student learning, skill acquisition, and academic achievement at the conclusion of a defined instructional period—typically at the end of a project, unit, course, semester, program, or school year. Generally speaking, summative assessments are defined by three major criteria:

  • The tests, assignments, or projects are used to determine whether students have learned what they were expected to learn. In other words, what makes an assessment “summative” is not the design of the test, assignment, or self-evaluation, per se, but the way it is used—i.e., to determine whether and to what degree students have learned the material they have been taught.
  • Summative assessments are given at the conclusion of a specific instructional period, and therefore they are generally evaluative, rather than diagnostic—i.e., they are more appropriately used to determine learning progress and achievement, evaluate the effectiveness of educational programs, measure progress toward improvement goals, or make course-placement decisions, among other possible applications.
  • Summative-assessment results are often recorded as scores or grades that are then factored into a student’s permanent academic record, whether they end up as letter grades on a report card or test scores used in the college-admissions process. While summative assessments are typically a major component of the grading process in most districts, schools, and courses, not all assessments considered to be summative are graded.
Summative assessments are commonly contrasted with formative assessments, which collect detailed information that educators can use to improve instruction and student learning while it’s happening. In other words, formative assessments are often said to be for learning, while summative assessments are of learning. Or as assessment expert Paul Black put it, “When the cook tastes the soup, that’s formative assessment. When the customer tastes the soup, that’s summative assessment.” It should be noted, however, that the distinction between formative and summative is often fuzzy in practice, and educators may have divergent interpretations and opinions on the subject.

Some of the most well-known and widely discussed examples of summative assessments are the standardized tests administered by states and testing organizations, usually in math, reading, writing, and science. Other examples of summative assessments include:

  • End-of-unit or chapter tests.
  • End-of-term or semester tests.
  • Standardized tests that are used for the purposes of school accountability, college admissions (e.g., the SAT or ACT), or end-of-course evaluation (e.g., Advanced Placement or International Baccalaureate exams).
  • Culminating demonstrations of learning or other forms of “performance assessment,” such as portfolios of student work that are collected over time and evaluated by teachers or capstone projects that students work on over extended periods of time and that they present and defend at the conclusion of a school year or their high school education.

While most summative assessments are given at the conclusion of an instructional period, some summative assessments can still be used diagnostically. For example, the growing availability of student data, made possible by online grading systems and databases, can give teachers access to assessment results from previous years or other courses. By reviewing this data, teachers may be able to identify students more likely to struggle academically in certain subject areas or with certain concepts. In addition, students may be allowed to take some summative tests multiple times, and teachers might use the results to help prepare students for future administrations of the test.

It should also be noted that districts and schools may use “interim” or “benchmark” tests to monitor the academic progress of students and determine whether they are on track to mastering the material that will be evaluated on end-of-course tests or standardized tests. Some educators consider interim tests to be formative, since they are often used diagnostically to inform instructional modifications, but others may consider them to be summative. There is ongoing debate in the education community about this distinction, and interim assessments may be defined differently from place to place. See formative assessment for a more detailed discussion.

While educators have arguably been using “summative assessments” in various forms since the invention of schools and teaching, summative assessments have in recent decades become components of larger school-improvement efforts. As they always have, summative assessments can help teachers determine whether students are making adequate academic progress or meeting expected learning standards, and results may be used to inform modifications to instructional techniques, lesson designs, or teaching materials the next time a course, unit, or lesson is taught. Yet perhaps the biggest changes in the use of summative assessments have resulted from state and federal policies aimed at improving public education—specifically, standardized high-stakes tests used to make important decisions about schools, teachers, and students.

While there is little disagreement among educators about the need for or utility of summative assessments, debates and disagreements tend to center on issues of fairness and effectiveness, especially when summative-assessment results are used for high-stakes purposes. In these cases, educators, experts, reformers, policy makers, and others may debate whether assessments are being designed and used appropriately, or whether high-stakes tests are either beneficial or harmful to the educational process. For more detailed discussions of these issues, see high-stakes test , measurement error , test accommodations , test bias , score inflation , standardized test , and value-added measures .


What Does the Research Say About Testing?

There’s too much testing in schools, most teachers agree, but well-designed classroom tests and quizzes can improve student recall and retention.

For many teachers, the image of students sitting in silence filling out bubbles, computing mathematical equations, or writing timed essays causes an intensely negative reaction.

Since the passage of the No Child Left Behind Act (NCLB) in 2002 and its 2015 update, the Every Student Succeeds Act (ESSA), every third through eighth grader in U.S. public schools now takes tests calibrated to state standards, with the aggregate results made public. In a study of the nation’s largest urban school districts, students took an average of 112 standardized tests between pre-K and grade 12.

This annual testing ritual can take time from genuine learning, say many educators , and puts pressure on the least advantaged districts to focus on test prep—not to mention adding airless, stultifying hours of proctoring to teachers’ lives. “Tests don’t explicitly teach anything. Teachers do,” writes Jose Vilson , a middle school math teacher in New York City. Instead of standardized tests, students “should have tests created by teachers with the goal of learning more about the students’ abilities and interests,” echoes Meena Negandhi, math coordinator at the French American Academy in Jersey City, New Jersey.

The pushback on high-stakes testing has also accelerated a national conversation about how students truly learn and retain information. Over the past decade and a half, educators have been moving away from traditional testing —particularly multiple choice tests—and turning to hands-on projects and competency-based assessments that focus on goals such as critical thinking and mastery rather than rote memorization.

But educators shouldn’t give up on traditional classroom tests so quickly. Research has found that tests can be valuable tools to help students learn, if designed and administered with format, timing, and content in mind—and a clear purpose to improve student learning.

Not All Tests Are Bad

One of the most useful kinds of tests is also the least time-consuming: quick, easy practice quizzes on recently taught content. Tests can be especially beneficial if they are given frequently and provide near-immediate feedback to help students improve. This retrieval practice can be as simple as asking students to write down two to four facts from the prior day or giving them a brief quiz on a previous class lesson.

Retrieval practice works because it helps students retain information better than simply studying the material, according to research. While reviewing concepts can help students become more familiar with a topic, information is quickly forgotten without more active learning strategies like frequent practice quizzes.

But to reduce anxiety and stereotype threat—the fear of conforming to a negative stereotype about a group that one belongs to—retrieval-type practice tests also need to be low-stakes (with minor to no grades) and administered up to three times before a final summative effort to be most effective.

Timing also matters. Students are able to do fine on high-stakes assessment tests if they take them shortly after they study. But a week or more after studying, students retain much less information and will do much worse on major assessments—especially if they’ve had no practice tests in between.

A 2006 study found that students who had brief retrieval tests before a high-stakes test remembered 60 percent of material, while those who only studied remembered 40 percent. Additionally, in a 2009 study, eighth graders who took a practice test halfway through the year remembered 10 percent more facts on a U.S. history final at the end of the year than peers who studied but took no practice test.

Short, low-stakes tests also help teachers gauge how well students understand the material and what they need to reteach. This is effective when tests are formative—that is, designed for immediate feedback so that students and teachers can see students’ areas of strength and weakness and address areas for growth. Summative tests, such as a final exam that measures how much was learned but offers no opportunities for a student to improve, have been found to be less effective.

Testing Format Matters

Teachers should tread carefully with test design, however, as not all tests help students retain information. Though multiple choice tests are relatively easy to create, they can contain misleading answer choices that are either ambiguous or vague, or offer the infamous all-, some-, or none-of-the-above choices, which tend to encourage guessing.


While educators often rely on open-ended questions, such as short-answer questions, because they seem to offer a genuine window into student thinking, research shows that there is no difference between multiple choice and constructed response questions in terms of demonstrating what students have learned.

In the end, well-constructed multiple choice tests, with clear questions and plausible answers (and no all- or none-of-the-above choices), can be a useful way to assess students’ understanding of material, particularly if the answers are quickly reviewed by the teacher.

All students do not do equally well on multiple choice tests, however. Girls tend to do less well than boys on multiple-choice questions and perform better on questions with open-ended answers, according to a 2018 study by Stanford University’s Sean Reardon, which found that test format alone accounts for 25 percent of the gender difference in performance in both reading and math. Researchers hypothesize that one explanation for the gender difference on high-stakes tests is risk aversion, meaning girls tend to guess less.

Giving more time for fewer, more complex or richer testing questions can also increase performance, in part because it reduces anxiety. Research shows that simply introducing a time limit on a test can cause students to experience stress, so instead of emphasizing speed, teachers should encourage students to think deeply about the problems they’re solving.

Setting the Right Testing Conditions

Test achievement often reflects outside conditions, and how students do on tests can be shifted substantially by comments they hear and what they receive as feedback from teachers.

When teachers tell disadvantaged high school students that an upcoming assessment may be a challenge and that challenge helps the brain grow, students persist more, leading to higher grades, according to 2015 research from Stanford professor David Paunesku. Conversely, simply saying that some students are good at a task without including a growth-mindset message or the explanation that it’s because they are smart harms children’s performance —even when the task is as simple as drawing shapes.

Also harmful to student motivation are data walls displaying student scores or assessments. While data walls might be useful for educators, a 2014 study found that displaying them in classrooms led students to compare status rather than improve work.

The most positive impact on testing comes from peer or instructor comments that give the student the ability to revise or correct. For example, questions like, “Can you tell me more about what you mean?” or “Can you find evidence for that?” can encourage students to improve engagement with their work. Perhaps not surprisingly, students do well when given multiple chances to learn and improve—and when they’re encouraged to believe that they can.


Center for the Advancement of Teaching Excellence

Summative Assessments

Nicole Messier, CATE Instructional Designer | February 7th, 2022

WHAT?

Summative assessments are used to measure learning when instruction is over and thus may occur at the end of a learning unit, module, or the entire course.

Summative assessments are usually graded, are weighted more heavily than other course assignments or comprise a substantial percentage of a student’s overall grade (and so are often considered “high stakes” assessments relative to other, “lower stakes” assessments in a course), and are required for the completion of a course.
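As a simple illustration of that weighting (the categories, weights, and scores below are hypothetical, not drawn from this guide), a final course grade is often a weighted average in which the summative midterm and final exam carry most of the weight.

```python
# Minimal sketch: a weighted course grade in which summative assessments dominate.

weights = {"homework": 0.10, "quizzes": 0.15, "midterm": 0.30, "final_exam": 0.45}
scores  = {"homework": 92,   "quizzes": 84,   "midterm": 78,   "final_exam": 88}

assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights should total 100%

course_grade = sum(weights[k] * scores[k] for k in weights)
print(f"weighted course grade: {course_grade:.1f}")  # midterm + final carry 75% of the grade
```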

Summative assessments can be viewed through two broad assessment strategies: assessments of learning and assessments as learning.

  • Assessment of learning (AoL) provides data to confirm course outcomes and students the opportunity to demonstrate proficiency in the learning objectives.
  • Assessment as learning (AaL) provides student ownership of learning by utilizing evidence-based learning strategies, promoting self-regulation, and providing reflective learning.

A summative assessment can be designed to provide both assessment of learning (AoL) and assessment as learning (AaL). The goal of designing for AaL and AoL is to create a summative assessment as a learning experience while ensuring that the data collected is valid and reliable.


Want to learn more about these assessment strategies? Please visit the  Resources Section – CATE website to review resources, teaching guides, and more.

Summative Assessments (AoL)

  • Written assignments – such as papers or authentic assessments like projects or portfolios of creative work
  • Mid-term exam
  • Performances

Although exams are typically used to measure student knowledge and skills at the end of a learning unit, module, or an entire course, they can also be incorporated into learning opportunities for students.

Example 1 - Exam

An instructor decides to analyze their current multiple-choice and short-answer final exam for alignment to the learning objectives. The instructor discovers that the questions cover the content in the learning objectives; however, some questions are not at the same cognitive levels as the learning objectives. The instructor determines that they need to create some scenario questions where students are asked to analyze a situation and apply knowledge to be aligned with a particular learning objective.

The instructor also realizes that this new type of question format will be challenging for students if the exam is the only opportunity provided to students. The instructor decides to create a study guide for students on scenarios (not used in the exam) for students to practice and self-assess their learning. The instructor plans to make future changes to the quizzes and non-graded formative questions to include higher-level cognitive questions to ensure that learning objectives are being assessed as well as to support student success in the summative assessment.

This example demonstrates assessment of learning with an emphasis on improving the validity of the results, as well as assessment as learning by providing students with opportunities to self-assess and reflect on their learning.

Written assignments in any form (authentic, project, or problem-based) can also be designed to collect data and measure student learning, as well as provide opportunities for self-regulation and reflective learning. Instructors should consider using a type of grading rubric (analytic, holistic, or single point) for written assignments to ensure that the data collected is valid and reliable.
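As one way to picture that last suggestion, here is a small Python sketch of an analytic rubric (the criteria, weights, and level descriptors are invented, not CATE's): because every submission is scored against the same weighted criteria and levels, the grading data collected are more consistent from student to student.

```python
# Minimal sketch: an analytic rubric as a data structure plus a scoring function.
# Each criterion has a weight and a fixed set of levels (4 = highest, 1 = lowest);
# the level descriptors document what each numeric rating means in feedback.

rubric = {
    "thesis":       {"weight": 0.30, "levels": {4: "insightful", 3: "clear", 2: "vague", 1: "missing"}},
    "evidence":     {"weight": 0.40, "levels": {4: "strong", 3: "adequate", 2: "thin", 1: "none"}},
    "organization": {"weight": 0.30, "levels": {4: "cohesive", 3: "mostly clear", 2: "choppy", 1: "unclear"}},
}

def score_submission(ratings):
    """Weighted rubric score on a 0-4 scale; ratings maps criterion -> level chosen."""
    return sum(rubric[criterion]["weight"] * level for criterion, level in ratings.items())

print(score_submission({"thesis": 3, "evidence": 4, "organization": 3}))  # -> 3.4
```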

Summative Assessments (AaL)

  • Authentic assessments – an assessment that involves a real-world task or application of knowledge instead of a traditional paper; could involve a situation or scenario specific to a future career.
  • Project-based learning – an assessment that involves student choice in designing and addressing a problem, need, or question.
  • Problem-based learning – similar to project-based learning but focused on solutions to problems.
  • Self-critique or peer assessment

Example 2 - Authentic Assessment

An instructor has traditionally used a research paper as the final summative assessment in their course. After attending a conference session on authentic assessments, the instructor decides to change this summative assessment to an authentic assessment that allows for student choice and increased interaction, feedback, and ownership.

First, the instructor introduced the summative project during the first week of class. The summative project instructions asked students to select a problem that could be addressed by one of the themes from the course. Students were provided with a list of authentic products that they could choose from, or they could request permission to submit a different product. Students were also provided with a rubric aligned to the learning objectives.

Next, the instructor created small groups (three to four students) with discussion forums for students to begin brainstorming problems, themes, and ideas for their summative project. These groups were also required to use the rubric to provide feedback to their peers at two separate time points in the course. Students were required to submit their final product, references, self-assessment using the rubric, and a reflection on the peer interaction and review.

This example demonstrates an authentic assessment as well as an assessment of learning (AoL) and assessment as learning (AaL). The validity and reliability of this summative assessment are ensured using a rubric that is focused on the learning objectives of the course and consistently utilized for the grading and feedback of the summative project. Data collected from the use of grading criteria in a rubric can be used to improve the summative project as well as the instruction and materials in the course. This summative project allows for reflective learning and provides opportunities for students to develop self-regulation skills as well as apply knowledge gained in an authentic and meaningful product.

Another way to create a summative assessment as a learning opportunity is to break it down into smaller manageable parts. These smaller parts will guide students’ understanding of expectations, provide them with opportunities to receive and apply feedback, as well as support their executive functioning and self-regulation skills.

WHY?

We know that summative assessments are vital to the curriculum planning cycle to measure student outcomes and implement continuous improvements. But how do we ensure our summative assessments are effective and equitable? Well, the answer is in the research.

Validity, Reliability, and Manageability

Critical components for the effectiveness of summative assessments are the validity, reliability, and manageability of the assessment (Khaled, 2020).

  • Validity of the assessment refers to the alignment to course learning objectives. In other words, are the assessments in your course measuring the learning objectives?
  • Reliability of the assessment refers to the consistency or accuracy of the assessment used. Are the assessment practices consistent from student to student and semester to semester?
  • Manageability of the assessment refers to the workload for both faculty and students. For faculty, is the type of summative assessment causing a delay in timely grading and feedback to the learner? For students, is the summative assessment attainable and are the expectations realistic?

As you begin to design a summative assessment, determine how you will ensure the assessment is valid, reliable, and manageable.

Feedback & Summative Assessments

Attributes of academic feedback that improve the impact of the summative assessment on student learning (Daka, 2021; Harrison 2017) include:

  • Provide feedback without or before grades.
  • Once the grade is given, then explain the grading criteria and score (e.g., using a rubric to explain grading criteria and scoring).
  •  Identify specific qualities in students’ work.
  • Describe actionable steps on what and how to improve.
  • Motivate and encourage students by providing opportunities to submit revisions or earn partial credit for submitting revised responses to incorrect answers on exams.
  • Allow students to monitor, evaluate, and regulate their learning.

Additional recommendations for feedback include that feedback should be timely, frequent, constructive (what and how), and should help infuse a sense of professional identity for students (why). The alignment of learning objectives, learning activities, and summative assessments is critical to student success and will ensure that assessments are valid. And lastly, the tasks in assessments should match the cognitive levels of the course learning objectives to challenge the highest performing students while elevating lower-achieving students (Daka, 2021).

HOW?

How do you start designing summative assessments?

Summative assessments can help measure student achievement of course learning objectives as well as provide the instructor with data to make pedagogical decisions on future teaching and instruction. Summative assessments can also provide learning opportunities as students reflect and take ownership of their learning.

So how do you determine what type of summative assessment to design? And how do you ensure that summative assessment will be valid, reliable, and manageable? Let’s dive into some of the elements that might impact your design decisions, including class size, discipline, modality, and EdTech tools.

Class Size and Modality

The manageability of summative assessments can be impacted by the class size and modality of the course. In smaller classes, instructors might be able to implement more opportunities for authentic summative assessments that provide student ownership and allow for more reflective learning (students think about their learning and make connections to their experiences). Larger class sizes might require instructors to consider implementing an EdTech tool to improve the manageability of summative assessments.

The course modality can also influence the design decisions of summative assessments. Courses with synchronous class sessions can require students to take summative assessments simultaneously through an in-person paper exam or an online exam using an EdTech tool, like Gradescope or Blackboard Tests, Pools, and Surveys. Courses can also create opportunities for students to share their authentic assessments asynchronously using an EdTech tool like VoiceThread.

Major Coursework

When designing a summative assessment as a learning opportunity for major coursework, instructors should reflect on the learning objectives to be assessed and the possible real-world applications of those objectives. In place of multiple-choice or short-answer questions that focus on content memorization, instructors might consider creating scenarios or situational questions that give students opportunities to analyze and apply the knowledge they have gained. In major coursework, instructors should also consider authentic assessments that allow for student choice, transfer of knowledge, and the development of professional skills in place of a traditional paper or essay.

Undergraduate General Education Coursework

In undergraduate general education coursework, instructors should consider the use of authentic assessments to make connections to students’ experiences, goals, and future careers. Simple adjustments to assignment instructions to allow for student choice can help increase student engagement and motivation. Designing authentic summative assessments can help connect students to the real-world application of the content and create buy-in on the importance of the summative assessment.

Summative Assessment Tools

EdTech tools can help to reduce faculty workload by providing a delivery system for students to submit work as well as tools to support academic integrity.

Below are EdTech tools that are available to UIC faculty to create and/or grade summative assessments as and of learning.

Assessment Creation and Grading Tools


  • Blackboard assignments drop box and rubrics
  • Blackboard quizzes and exams

Assessment creation and grading tools can help support instructors in designing valid and reliable summative assessments. Gradescope can be used as a grading tool for in-person paper-and-pencil midterm and final exams, as well as a tool to create digital summative assessments. Within Gradescope, instructors can use AI-assisted grading to improve the manageability of summative assessments and rubrics to improve the reliability of grading.

In the Blackboard learning management system, instructors can create pools of questions for both formative and summative assessments as well as create authentic assessment drop boxes and rubrics aligned to learning objectives for valid and reliable data collection.
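
The pooling idea is tool-agnostic: a bank of questions tagged by learning objective can be sampled so that each student (or each semester) sees a different but comparable set. The sketch below is a generic illustration of that workflow, not Blackboard’s or Gradescope’s actual API; all question text, tags, and names are hypothetical.

```python
# Generic illustration of a question pool: tag items by learning objective,
# then draw a fixed number of questions per objective for each exam.
# This is not Blackboard's API; questions and tags are hypothetical.
import random

QUESTION_POOL = [
    {"id": "q1", "objective": "LO1", "text": "Define validity in assessment."},
    {"id": "q2", "objective": "LO1", "text": "Give an example of a valid test item."},
    {"id": "q3", "objective": "LO2", "text": "Explain reliability in grading."},
    {"id": "q4", "objective": "LO2", "text": "Describe one threat to reliability."},
    {"id": "q5", "objective": "LO3", "text": "What makes an assessment manageable?"},
    {"id": "q6", "objective": "LO3", "text": "Suggest one way to reduce grading workload."},
]

def build_exam(pool, per_objective=1, seed=None):
    """Draw `per_objective` questions for each learning objective in the pool."""
    rng = random.Random(seed)
    objectives = sorted({q["objective"] for q in pool})
    exam = []
    for lo in objectives:
        candidates = [q for q in pool if q["objective"] == lo]
        exam.extend(rng.sample(candidates, per_objective))
    return exam

# A different seed (e.g. a student ID) yields a different but comparable exam.
for q in build_exam(QUESTION_POOL, per_objective=1, seed=42):
    print(q["objective"], "-", q["text"])
```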

Academic Integrity Tools

  • SafeAssign (undergraduate)
  • iThenticate (graduate)
  • Respondus LockDown Browser and Monitoring

Academic integrity tools can help ensure that students meet academic expectations: SafeAssign and iThenticate support integrity in research and writing, while Respondus LockDown Browser and Monitoring support integrity during online tests and exams.

Want to learn more about these summative assessment tools? Visit the EdTech section on the CATE website to learn more.

Exam Guidance

Additional guidance on online exams is available in Section III: Best Practices for Online (Remote Proctored, Synchronous) Exams in the Guidelines for Assessment in Online Environments Report, which outlines steps for equitable exam design, accessible exam technology, and effective communication for student success. The framing questions in the report are designed to guide instructors with suggestions, examples, and best practices (Academic Planning Task Force, 2020), which include:

  • “What steps should be taken to ensure that all students have the necessary hardware, software, and internet capabilities to complete a remote, proctored exam?
  • What practices should be implemented to make remote proctored exams accessible to all students, and in particular, for students with disabilities?
  • How can creating an ethos of academic integrity be leveraged to curb cheating in remote proctored exams?
  • What are exam design strategies to minimize cheating in an online environment?
  • What tools can help to disincentive cheating during a remote proctored exam?
  • How might feedback and grading strategies be adjusted to deter academic misconduct on exams?”

GETTING STARTED


The following steps will support you as you examine current summative assessment practices through the lens of assessment of learning (AoL) and assessment as learning (AaL) and develop new or adapt existing summative assessments.

  • The first step is to utilize backward design principles by aligning the summative assessments to the learning objectives.
  • The second step is to identify the purpose of the assessment:
      • To collect valid and reliable data to confirm student outcomes (AoL).
      • To promote self-regulation and reflective learning by students (AaL).
  • The third step is to determine the design of the assessment:
      • Format: exam, written assignment, portfolio, performance, project, etc.
      • Delivery: paper and pencil, Blackboard, EdTech tool, etc.
      • Feedback: general (how to improve performance), personalized (student-specific), etc.
      • Scoring: automatically graded by Blackboard and/or an EdTech tool, or graded manually using a rubric in Blackboard.
  • The fourth step is to review the data collected from the summative assessment(s) and reflect on the implementation of the summative assessment(s) through the lens of validity, reliability, and manageability to inform continuous improvements for equitable student outcomes (a simple item-analysis sketch follows this list).
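
For the review step, classical item analysis is one simple way to look at the collected data. The sketch below is an illustration only, with hypothetical data and names: it computes each item’s difficulty (proportion correct) and a discrimination index comparing the top- and bottom-scoring groups.

```python
# Illustrative item analysis for the review step:
#   difficulty     = proportion of all students answering the item correctly
#   discrimination = difficulty in the top-scoring group minus the bottom-scoring group
# The 0/1 response matrix below is hypothetical.

def item_analysis(scores, group_fraction=0.27):
    n_students = len(scores)
    n_items = len(scores[0])
    ranked = sorted(scores, key=sum, reverse=True)   # highest totals first
    k = max(1, int(round(group_fraction * n_students)))
    top, bottom = ranked[:k], ranked[-k:]
    report = []
    for i in range(n_items):
        difficulty = sum(row[i] for row in scores) / n_students
        discrimination = (sum(row[i] for row in top) / k
                          - sum(row[i] for row in bottom) / k)
        report.append((i + 1, round(difficulty, 2), round(discrimination, 2)))
    return report

scores = [  # 6 students x 3 items, hypothetical
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for item, difficulty, discrimination in item_analysis(scores):
    print(f"Item {item}: difficulty={difficulty}, discrimination={discrimination}")
```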

CITING THIS GUIDE


Messier, N. (2022). “Summative assessments.” Center for the Advancement of Teaching Excellence at the University of Illinois Chicago. Retrieved [today’s date] from https://teaching.uic.edu/resources/teaching-guides/assessment-grading-practices/summative-assessments/

ADDITIONAL RESOURCES

Academic Planning Task Force. (2020). Guidelines for Assessment in Online Learning Environments.

McLaughlin, L., Ricevuto, J. (2021). Assessments in a Virtual Environment: You Won’t Need that Lockdown Browser! Faculty Focus.

Moore, E. (2020). Assessments by Design: Rethinking Assessment for Learner Variability. Faculty Focus.

Websites and Journals

Association for the Assessment of Learning in Higher Education website 

Assessment & Evaluation in Higher Education. Taylor & Francis Online Journals

Journal of Assessment in Higher Education

REFERENCES

Daka, H., Mulenga-Hagane, M., Mukalula-Kalumbi, M., & Lisulo, S. (2021). Making summative assessment effective. 5, 224–237.

Earl, L.M., Katz, S. (2006). Rethinking classroom assessment with purpose in mind — Assessment for learning, assessment as learning, assessment of learning. Winnipeg, Manitoba: Crown in Right of Manitoba.

Galletly, R., & Carciofo, R. (2020). Using an online discussion forum in a summative coursework assignment. Journal of Educators Online, 17(2).

Harrison, C., Könings, K., Schuwirth, L., Wass, V., & Van der Vleuten, C. (2017). Changing the culture of assessment: The dominance of the summative assessment paradigm. BMC Medical Education, 17. https://doi.org/10.1186/s12909-017-0912-5

Khaled, S., & El Khatib, S. (2020). Summative assessment in higher education: Feedback for better learning outcomes.

Types of Written Tests

This article throws light upon the two types of written tests which are carried out to determine the performance of students. The types are: 1. Objective Type Tests 2. Essay Type Tests. 

Type # 1. Objective Type Tests:

Objective type test items are highly structured test items. They require the pupils to supply a word or two, or to select the correct answer from a number of alternatives. The answer to each item is fixed. Objective type items are efficient for measuring a variety of instructional objectives.

Objective type tests are also called ‘new type’ tests. They are designed to overcome some of the major limitations of traditional essay type tests.

Objective type tests have proved their usefulness in the following ways:

a. They are more comprehensive. They cover a wide range of the syllabus because they include a large number of items.

b. They possess objectivity of scoring. The answer to each item is fixed and predetermined, so different people scoring the answer scripts arrive at the same result.

c. They are easy to score. Scoring is done with the help of a scoring key or a scoring stencil, so even a clerk can do the job (a minimal scoring-key sketch follows this list).

d. They are easy to administer.

e. They can be standardized.

f. They are time saving.

g. They can measure a wide range of instructional objectives.

h. They are highly reliable.

i. They are very economical.
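
The scoring-key idea in point (c) above is easy to picture: because each objective item has exactly one predetermined answer, marking reduces to comparing responses against a key, and every scorer applying the same key reaches the same result. Below is a minimal sketch of that idea with hypothetical items and answers; it is an illustration, not a prescribed procedure.

```python
# Minimal sketch of scoring objective items against a fixed answer key.
# Items, answers, and responses are hypothetical; anyone (or any program)
# applying the same key arrives at the same score.

ANSWER_KEY = {"Q1": "B", "Q2": "True", "Q3": "1526", "Q4": "C"}

def score(responses, key=ANSWER_KEY):
    """Return (marks obtained, total marks) for one student's responses."""
    correct = sum(1 for q, answer in key.items()
                  if responses.get(q, "").strip().lower() == answer.lower())
    return correct, len(key)

student = {"Q1": "B", "Q2": "true", "Q3": "1526", "Q4": "A"}
obtained, total = score(student)
print(f"{obtained}/{total}")  # prints 3/4 for this example
```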


Objective type tests can be classified into two broad categories according to the nature of responses required by them:

(a) Supply/Recall Type

(b) Selection/Recognition Type

(a) Supply/Recall Type:

Supply type items are those in which answers are not given in the question. The students supply their answer in the form of a word, phrase, number or symbol. These items are also called ‘free response’ type items.

According to the method of presentation of the problem, these items can be divided into two types, viz.,

(1) Short answer type

(2) Completion type

1. Short answer type:

In which year was the first battle of Panipat fought? (1526 A.D.)

2. Completion type:

The first battle of Panipat was fought in the year ________. (1526 A.D.)

In the first case the pupil has to recall a response from his past experience to answer a direct question. These types of questions are useful in mathematics and the physical sciences. In the second case the pupil is asked to supply a word or words missing from a sentence. So in the completion type, a series of statements is given in which certain important words or phrases have been omitted, and blanks are supplied for the pupils to fill in.

Principles of Constructing Recall Type Items:

If recall type items are constructed according to the following principles, they will be more effective and will function as intended.

1. The statement of the item should be so worded that the answer will be brief and specific:

The statement of the problem should be such that it conveys directly and specifically what answer is intended from the student.

Where was Gandhiji born?

Name the town where Gandhiji was born.

2. The statement of the item should not be taken directly from the text books:

Sometimes when direct statements from text books are taken to prepare a recall type item, the item becomes general and ambiguous.

3. While presenting a problem, preference should be given to a direct question rather than an incomplete statement:

A direct question is less ambiguous and more natural than an incomplete statement.

The battle of Plassey was fought in……….

In which year was the battle of Plassey fought?

4. When the answer is a numerical unit the type of answer wanted should be indicated:

When learning outcomes such as knowing the proper unit or the proper amount are expected, it must be clearly stated in which unit the pupils should express their answer. Especially in arithmetical computations, the units in which the answer is to be expressed must be indicated.

The normal body temperature of a human being is ——— (98.6 ºF)

The normal body temperature of a human being is ………… Fahrenheit. (98.6)

If one chocolate costs 25 paise, what is the cost of 5 chocolates? (Rs. 1 Ps. 25)

If one chocolate costs 25 paise, what is the cost of 5 chocolates? Rs. …… Paise …… (Rs. 1 Ps. 25)

5. The blanks for answers should be equal in length and placed in a column to the right of the question:

If the lengths of the blanks vary according to the length of the answer, this provides clues that help pupils guess the answer. Therefore, blanks of equal size should be placed at the right-hand margin of the test paper.

Total number of chromosomes in a human cell is — (46)

The power house of the cell is known as — (Mitochondria)

Total number of chromosomes in a human cell is            ________

The power house of the cell is known as                   ________

6. One completion type item should include only one blank:

Sometimes too many blanks affect the meaning of the statement and make it ambiguous. Therefore, too many blanks should not be included in completion type items.

The animals that have — (feathers) and lay — (eggs) are known as — (aves).

The animals that have feathers and lay eggs are called ________.

Uses of recall type Items:

Several learning outcomes can be measured by the recall type items.

Some common uses of recall type items are as follows:

a. It is useful to measure the knowledge of terminology.

b. It is useful to measure the knowledge of specific facts.

c. It is useful to measure the knowledge of principles.

d. It is useful to measure the knowledge of methods and procedures.

e. It is useful to measure the ability to interpret simple data.

f. It is useful to measure the ability to solve numerical problems.

Advantages of recall type Items:

a. It is easy to construct.

b. Students are familiar with recall type items from day-to-day classroom situations.

c. Recall type items have high discriminating value.

d. In well prepared recall type items guessing factors are minimized.

Limitations of recall type Items:

a. These items are not suitable to measure complex learning outcomes.

b. Unless care is exercised in constructing the recall items, the scoring is apt to be subjective.

c. It is difficult to measure complete understanding with the simple recall and completion type items.

d. The student may know the material being tested but have difficulty in recalling the exact word needed to fill in the blank.

e. Sometimes misspelt words make it difficult for the teacher to judge whether the pupil has responded to the item correctly or not.

f. Simple recall items tend to over-emphasize verbal facility and the memorization of facts.

(b) Selection/Recognition Type:

In recognition type items the answer is supplied to the examinee along with some distractors, and the examinee has to choose the correct answer from among them. Hence these tests are known as ‘selection type’ items. As the answer is fixed and given, some call them ‘fixed response type’ items.

The recognition type test items are further classified into following types:

(i) True-False/Alternate Response Type

(ii) Matching Type

(iii) Multiple Choice Type

(iv) Classification or Rearrangement Type.

(i) True False Items:

True-false items, otherwise known as alternate response items, consist of a declaratory statement or a situation that the pupil is asked to mark true or false, right or wrong, correct or incorrect, yes or no, agree or disagree, etc. Only two possible choices are given to pupils. These items measure the ability of the pupil to identify correct statements of fact, definitions of terms, statements of principles and the like.

Principles of Constructing True False Items:

While formulating the statements of true-false items, the following principles should be followed so that the items are free from ambiguity and unintentional clues.

1. Determiners that are likely to be associated with a true or false statement must be avoided:

Broad general qualifiers like usually, generally, often and sometimes give a clue that the statement may be true. Words like always, never, all, none and only, which generally appear in false statements, give a clue to the students when responding.

T F = Usually the Prime Minister of India holds office for five years.

T F = The Prime Minister of India always holds office for five years.

2. Those statements having little learning significance should be avoided:

The statements having little significance sometimes compel the students to remember minute facts at the expense of more important knowledge and understanding.

3. The statements should be simple in structure:

While preparing statements for true-false items, long and complex sentences should be avoided because they act as an extraneous factor that interferes with measuring knowledge or understanding.

The sex cell spermatozoon, which is a male sex cell, consists of two types of chromosomes, the X and the Y chromosome. (T, F)

The male sex cell (spermatozoon) consists of X and Y chromosomes. (T, F)

4. Negative statements, especially double negative statements should not be used:

Double negative statements make the item very ambiguous. Sometimes it is found that the students overlook the negative words.

The angles of an equilateral triangle are unequal (T, F)

The angles of an equilateral triangle are equal (T, F)

5. The item should be based on a single idea:

One item should include only one idea. We can obtain an efficient and accurate measurement of students’ achievement by testing each idea separately.

The son of Humayun, Akbar, who wrote the Ain-e-Akbari, preached a religion known as Din-i-Ilahi. (T, F)

Akbar preached a religion known as Din-i-Ilahi. (T, F)

6. There should be more false statements than true statements:

Pupils are more inclined to accept than to challenge a statement; therefore, by including more false statements we can increase the discriminating power of the test and reduce guessing.

7. The true statements and false statements should be of equal length:

Uses of True False Items:

True false items are useful to measure varied instructional objectives. Some of the common uses of true false items are given below:

1. They are used to measure the ability to identify the correctness of statements, facts, definitions of terms, etc. True false items are useful in measuring the ability to distinguish facts from opinion.

2. They are useful to measure knowledge concerning the beliefs held by an individual or the values supported by an organization or institution.

3. True false items are useful to measure the understanding of cause and effect relationships.

4. They are useful to measure the ability of the students for logical analysis.

Advantages of True-false Items:

a. True-false items provide a simple and direct means of measuring essential outcomes.

b. Like other objective type items, true-false items can test all the important learning outcomes equally well.

c. The probability of an examinee achieving a high score on a true-false test by guessing blindly is extremely low (a short calculation follows this list).

d. They use few statements directly from text books.

e. They possess strong discriminating power.

f. They are easy to construct.
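
Point (c) above can be made concrete with a quick calculation: although each item gives a 50% chance of a lucky guess, the chance of reaching a high total score by blind guessing falls off rapidly as the number of items grows. The short sketch below is an illustration added here (not part of the original article) using the binomial distribution; the pass mark and test length are arbitrary examples.

```python
# Probability of scoring at least `passing` out of `n` true-false items
# by blind guessing (p = 0.5 per item), using the binomial distribution.
from math import comb

def p_at_least(passing, n, p=0.5):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(passing, n + 1))

# Guessing 80% or more (40 of 50 items) on a 50-item true-false test:
print(f"{p_at_least(40, 50):.6f}")  # ~0.000012, i.e. about 1 chance in 84,000
```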

Limitations of True-False Items:

a. As there are only two alternatives, they encourage guessing.

b. Many of the learning outcomes measured by true-false items can be measured more efficiently by other item types.

c. A true-false test is likely to be low in reliability when the number of items is small.

d. The validity of these items is questionable, as students may consistently guess ‘true’ or ‘false’ for items they are uncertain about.

e. They do not possess any diagnostic value.

(ii) Matching Items:

Matching items occur in two columns along with a direction on the basis of which the two columns are to be matched. They consist of “two parallel columns with each word, number or symbol in one column being matched to a word, sentence or phrase in the other column.” The entries in the first column, for which matches are sought, are called ‘premises’, and the entries in the second column, from which the selections are made, are called ‘responses’. The basis on which the matching is to be made is described in the ‘directions’. The students may be asked to match states with their respective capitals, historical events with dates, kings with their achievements, etc.

Match the dates in column ‘B’ with the respective events in column ‘A’ by writing the number of the item in ‘B’ in the space provided. Each date in column ‘B’ may be used once, more than once or not at all.

Designing and Rating Academic Writing Tests: Implications for Test Specifications


Based on local teaching and assessment considerations, this study investigated academic writing teachers’ design practices as specification writers and writing test raters, as well as test takers’ conceptions of writing assessment and score interpretation. The aim was to capture a comprehensive view of writing assessment in an EFL context. To this end, a rating scale questionnaire was administered to 10 academic writing teachers in different Tunisian universities, and another rating scale questionnaire was administered to 25 third-year English students. The study dealt with theoretical and operational definitions of the writing construct, as enacted by test designers throughout test development and as experienced by test takers when sitting for the test. Students’ test scores were obtained to investigate social aspects of writing assessment in the Tunisian setting. The quantitative data, analysed using SPSS, indicated a gap between teachers’ and students’ views of the writing construct and what they endorsed and represented concretely. Both questionnaires, along with the test scores, indicated a remarkable focus on writing as a linguistic competence, while other (social, pragmatic, and communicative) competences are overlooked in the assessment process. The study findings and their theoretical, pedagogical and methodological implications for the local writing assessment context are then discussed.


Mejri, A. (2018). Designing and Rating Academic Writing Tests: Implications for Test Specifications. In S. Hidri (Ed.), Revisiting the Assessment of Second Language Abilities: From Theory to Practice (Second Language Learning and Teaching). Springer, Cham. https://doi.org/10.1007/978-3-319-62884-4_7


What is an Essay Type Test?


The word essay is derived from the French word ‘essayer’, which means ‘to try’ or ‘to attempt’.

Definition of Essay Type Test

“Essay test is a test that requires the student to structure a rather long written response up to several paragraphs.” i.e. “the essay test refers to any written test that requires the examinee to write a sentence, a paragraph or longer passages.”

Characteristics of essay type test

  • The length of the required responses varies with the marks allotted and the time allowed. For example, in B.Ed. papers there are 10-mark and 3-mark questions, so the length of the answers changes accordingly: a 10-mark answer should be finished within 15-20 minutes, while for each 3-mark question about 5 minutes is the maximum. The length of replies therefore varies with regard to time.
  • It necessitates subjective judgement: ‘judgement’ refers to making a decision, and ‘subjective’ means that the decision varies from person to person. For example, if students are asked to state criteria for drafting a statement of specification along with examples, some may write only the criteria while others may also provide examples; marks or grades are therefore assigned based on the degree of quality, accuracy, and completeness of the responses.
  • Most widely known and commonly used: The essay has become an important aspect of formal education. A structured essay format is taught to secondary students in order to improve their writing skills. Many of the same types of essays are used in magazine or newspaper articles as in academic writing. Employment essays outlining one’s experience in specific occupational domains are also required when applying for some positions, particularly government ones. As a result, the essay is the most well-known and commonly used format.

Essay questions are of two types:

Restricted answer questions:

These questions typically limit both the substance and the response. The breadth of the issue to be discussed usually limits the substance, and constraints on the method of answer are frequently mentioned in the question. Another technique to limit replies in essay assessments is to ask questions about specific topics. To that end, introductory information similar to that utilised in interpretative exercises might be offered. The sole difference between these items and objective interpretive exercises is that essay questions are utilised instead of multiple choice or true or false answers. Because the restricted answer question is more organised, it is best suited for assessing learning outcomes that need the interpretation and application of data in a specific area.

E.g.: State any five definitions of Sociology.

Write a life sketch of Hitler in 200 words.

Extended response questions:

Students are not restricted in terms of the topics they will address or the style of structure they will utilise. Teachers should give students as much leeway as possible in determining the nature and breadth of their responses, and responses to these sorts of questions should be timely and relevant. The student may choose the points he believes are most significant, pertinent, and relevant to his views, and order and organise the answer in whatever way he sees fit. As a result, these are also known as free response questions.

The instructor can then assess the student’s ability to organise, integrate, understand, and express ideas in his or her own words. It also allows the instructor to comment on or investigate students’ development, the quality of their thinking, the depth of their learning, problem-solving abilities, and any issues they may be experiencing. These abilities interact with each other as well as with the information and comprehension required by the situation. Thus, this form of question contributes the most at the levels of synthesis and evaluation of writing skills.

  • E.g.: 1. Describe at length the defects of the present day examination system in the state of Maharashtra. Suggest ways and means of improving the examination system.
  • 2. Describe the character of Hamlet.
  • 3. Global warming is the next step to disaster.

In the extended response (free response) type, the student is not limited in the extent to which he discusses the issues raised or the question asked. He is expected to:

  • Plan and organise his ideas in order to provide a response.
  • Put his views across by expressing himself freely, exactly, and clearly, using his own words.
  • Discuss the question in depth, presenting various facets of his understanding of the matter or issue mentioned.

