Generative AI in education and the implications for assessment

Lecturer giving an explanation to students using a laptop

What is the impact of generative AI (genAI) on education, and what does it mean for teachers and courses? This article offers insight into these questions. In addition, you will find an overview of possible adjustments that can be made within a program or course, with examples from writing education.

The text below was written without using genAI and translated from Dutch to English using DeepL (the free version).

It has now been over a year and a half since OpenAI launched ChatGPT, and developments are moving fast. It is becoming increasingly clear that this new generative AI can and will be used in a variety of ways. In the field of assessment, we see the impact of genAI playing out at two levels: the teacher level and the program level. Both levels have overlapping but also distinct interests, which makes this a complex issue.

Teacher level

At the teacher level, the questions surrounding assessment are numerous: Can I still use my writing assignments? Should I start assessing everything in a controlled setting? What can I do if I suspect fraud? Teachers also have their own interests here: after all, it is not pleasant to work from a position of distrust, and transparency and open communication are very important.

Program level

Questions also arise at the program level. Examination committees want tools to help them make decisions, and program directors want to know how to support their teaching teams. Other interests are at play here as well. From an assessment perspective, the concern is ensuring the quality of the program: does a student master the learning outcomes upon completion of the program? This is a crucial question, because an affirmative answer is required before a diploma may be issued.

Generative AI: is this really so different from the scientific calculator?

"We also said this when the Internet emerged, and when the technical calculator was introduced." Indeed, the fear that young people and students would no longer be able to look up information independently, and would no longer learn math, also played out at the time. We have since seen that both are still learnable and necessary, but at the same time that our attainment standards have moved with the new reality that this brings. A similar shift will come about with genAI, once the field embraces this technology. 

What sets genAI apart from the educational resources available before it is that genAI allows users to generate content, rather than merely check or edit it. It is also clear that students know how to find these opportunities and are already using genAI extensively. This increases the need to think carefully about what all of this can mean for our education system. The multitude of possibilities and applications of genAI has led to a large number of tools on the market with AI "under the hood". For text-to-text genAI, there are several providers at the moment. In addition, genAI can generate code, create images, and now audio and music, and the number of genAI tools available for education is growing exponentially.

The impact of genAI on macro, meso and micro levels of education

As mentioned above, genAI affects many facets of education. In this regard, King's College London distinguishes the macro, meso and micro levels.

Macro level

By the macro level, they mean the university-wide principles and policies surrounding genAI. Within the UU, this is also being considered. UU's current guideline can be found here. An update of the guidelines is currently in the works, so keep an eye on the intranet to stay up to date.

Meso level

The meso level concerns the implications of genAI at the department and program level.

Micro level

Finally, there is the micro level, which concerns the implications of genAI within courses and for the work of individual faculty members. This level is about using genAI to support student learning, or to perform teaching tasks more efficiently. Important questions arise here. For example, how do we ensure that students retain their learning moments? And if a teacher deploys genAI, legal questions (e.g., about privacy) and ethical questions also arise. Is it a good idea to use genAI to develop teaching activities? Or to provide feedback or grades?

When it comes to assessment, there are questions at both the meso and micro levels. These are interrelated, see Figure 1. As an examiner, you are responsible for determining whether the intended learning outcomes have been achieved at the course level (the green part in the illustration). The program director is responsible for determining whether the learning outcomes have been met at the program level (the blue part).

Figure 1a: A simplified example of an assessment plan within a program.

If there are vulnerabilities in assessment at the course level, the quality assurance of program-level learning outcomes is called into question. If an instructor allows students to use genAI in a course, this also makes it a program-level educational design issue (i.e., at the meso level).

Figure 1b: Example of the consequences of assessments that are vulnerable to genAI (in red)

The meso level: what needs to be done at the program level?

For a program, the first step is to identify which end terms are vulnerable if genAI is used by students during assessment tasks. 

How vulnerable is your program to genAI?

In other words, which end terms can no longer be reliably and validly assessed for student mastery? To get an idea of the vulnerability of your assessment, you can follow the roadmap set up by the Teaching and Learning Center at Southern Cross University.

If it appears that an end term is no longer covered by sufficiently valid and reliable assessments, you need to get to work. In most cases, there are multiple assessments, within multiple courses, across multiple years of a program that together determine whether an end term has been met. Figure 1b illustrates this in simplified form. As you can see, end term 1 is not adequately covered, because it is only assessed in course 1, by an assessment that is vulnerable to genAI use. There are multiple assessments vulnerable to genAI, but because the other end terms are assessed at multiple points, their coverage is not in question.

The consequence is that a change must be made to assessment 1 of course 1. For an overview of the options and trade-offs, see the section "Learning effect of genAI, so assessment without genAI: what possibilities are there?".
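The coverage check described above can be sketched in a few lines of code. The end terms, courses, and vulnerability flags below are purely illustrative (not taken from a real program); the point is only that an end term is at risk when every assessment that covers it is vulnerable to genAI use.

```python
# Map each end term to the assessments that cover it:
# (course, assessment, vulnerable-to-genAI?). Illustrative data only.
program = {
    "end term 1": [("course 1", "assessment 1", True)],
    "end term 2": [("course 1", "assessment 2", True),
                   ("course 2", "assessment 1", False)],
    "end term 3": [("course 2", "assessment 2", False),
                   ("course 3", "assessment 1", True)],
}

def uncovered_end_terms(program):
    """Return end terms covered only by genAI-vulnerable assessments."""
    return [term for term, assessments in program.items()
            if all(vulnerable for _, _, vulnerable in assessments)]

print(uncovered_end_terms(program))  # ['end term 1']
```

Here end term 1 is flagged, mirroring Figure 1b: its only assessment is vulnerable, while the other end terms are still covered by at least one non-vulnerable assessment.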

The micro-level design issue: effects of and impacts with genAI within a course

As mentioned, the question of whether or not to allow genAI in your course is primarily a program-level educational design issue. However, when designing, you iterate from program learning outcomes to course learning outcomes and back, to make everything fit together. In a course-level design issue, we look at the learning objectives, the teaching activities, and, of course, the assessments. If these three are well aligned, and genAI is incorporated into them consistently, you will have a course in which students can learn optimally (in other words, you are working on constructive alignment; Biggs & Tang, 2011).

Effects of and effects with the use of tools

Regarding the use of ICT tools in education, Salomon distinguished back in 1992 between the effect on achievement with the use of a tool and the effect on achievement of using a tool. A tool in this context was, for example, a calculator, a spell checker, reference software, or statistical software such as SPSS.

A learning effect with the use of a tool means using the tool to perform better, for example checking your spelling with the spelling checker in Word. The consequence is that the tool (e.g., genAI) may then also be used during the assessment, see Figure 2a.

Figure 2a: Constructive alignment in the case of an intended learning outcome with the effect of genAI

In contrast, there are the learning effects of using a tool. To say anything about these, you have to be able to separate the learning effects achieved with the tool from those achieved without it. In other words, you want the student to master a skill without genAI. To be sure of this, it is important that the assessment is administered without genAI, see Figure 2b. This distinction means that, as an educational designer, you have to start from your learning outcomes before you can make decisions about the assessments. Note that, as the figures show, genAI can be used in both scenarios as a teaching activity, to support student learning.

Figure 2b: Constructive alignment in the case of an intended learning outcome of the effect of genAI

Learning effect of genAI, so assessment without genAI: what possibilities are there? 

This section is relevant if you have a learning objective that you consider so important that you want students to eventually master it without using genAI. This means that during the assessment you must also ensure that genAI cannot be used. A logical step is then to administer the assessment in a controlled environment, where students do not have access to genAI (or the internet). For writing assignments, however, this is not a good solution, because it creates a validity problem. For example, an essay for which a student can think about the material over an extended period, returning to the text repeatedly, assesses something fundamentally different from an essay written under time pressure in an on-campus setting. Changing from an essay written at home to an essay in an exam room is therefore not a solution for properly assessing your learning outcomes. So what options do you have? Again, that depends on the type of learning outcome you want to assess: learning outcomes around writing skills, or learning outcomes around higher cognitive skills?

Learning outcomes around writing skills: set up the assessment process well

If your learning outcome revolves around writing itself, you might consider an assessment in a controlled environment where the student cannot access the internet (or certain websites). As mentioned, this raises questions about the validity of the assessment. A better option is to design the writing process differently. Currently, students often receive feedback once, when they submit a draft, and then submit the final text. The teacher thus has little insight into the writing process and into how the student is progressing. A different set-up of the assessment process allows the teacher to keep a closer eye on the writing process. This can be done, for example, by seeing and discussing several of a student's intermediate products, such as an argumentative outline, or an extensive reference list in which the student describes what is taken from each article. In addition, the teacher can discuss the student's written piece with the student orally. If the assessment process is set up in this way, the instructor can ultimately make an informed summative decision (Scheider et al., 2023).

The advantage of a clear assessment plan, in which the moments at which genAI may or may not be used are explicitly considered, is that you can also discuss this with your students. This could be a discussion about the goals of an assessment and the corresponding learning moments you have in mind for the students (box 2). Of course, the above is more time-intensive than the way the essay assessment process is currently set up in many courses. This time-intensive approach is not feasible in all courses, so it is important to make these trade-offs at the meso level. It is likely that this adjustment of the assessment and the use of multiple instructors will only be necessary in a few courses within a program.

Learning outcomes where a writing assignment is used to assess other skills

Often, written products are used to assess disciplinary skills and knowledge. The goal here is not always explicitly to teach students to write, but to test whether a student can think critically, analyze, or evaluate.

In that case, in addition to intensifying process guidance as described above, there are a few more options. In the short term, you can try to make your assessment less "AI-vulnerable". These tend to be minor adjustments to the question or case that make it harder for genAI to handle. For tips and tricks, see the various resources below. A note here is that genAI is a self-learning system.

Developments are rapid, so do not regard this as a long-term solution. Some frequently mentioned tips (see also box 3):

  • Specificity (ask students to think about specific cases, scenarios or incidents)
  • Use examples from the southern hemisphere; such data was less prevalent in the training data on which the model bases its answers
  • Use different ways of having information presented whenever possible
  • Avoid questions that test lower-order thinking skills
  • Use multi-step problems

In addition, it is wise to scrutinize your assessment formats. Is it necessary to assess this learning objective with a written paper? If not, another format is possible. The advantage is that other assessment formats cannot (at this time) be generated well with AI; an argument diagram or a presentation with an oral explanation are examples. Authentic assessment (which is always more specific and often involves several sequential steps) or oral assessment (or an oral explanation of the assessment) are also formats you could consider.

Learning effect with genAI, so allowing genAI use during the assessment: is it possible?

Suppose you have decided that students are allowed to use genAI during the assessment for a particular learning objective. You may have several reasons for this: perhaps you have seen that genAI is already being used in the field, or perhaps writing texts will never be the core business of your students. The consequence of a learning effect with genAI is that students may also use genAI during the assessment.

If you consider "scientific writing" as a learning objective, most teachers will indicate that this is something students should be able to do independently. However, a complex skill like scientific writing consists of many subskills, such as spelling, writing style, argumentation, structuring a text, and critical thinking and evaluation (Elander et al., 2006). Not all of these subskills are equally important for all students at all times. This argues in favor of allowing genAI during some writing assessments.

In addition, tools are already being used to increase writing performance. For example, using Word's spelling checker is allowed everywhere, so the learning outcome 'being able to spell well' is already supported within higher education. The question is: for which other learning outcomes within writing will this also become the case? The answer will vary by discipline and by stage of the program. There are a number of considerations against using tools to support performance and skill, see box 4. Nevertheless, it is very likely that every program will, at some point, allow genAI to a greater or lesser extent on a written assessment. The question then becomes: how do you do this validly and reliably?

If you allow genAI use (to any degree) during your assessment, you run into the problem that the quality of the writing will be highly correlated with how AI-literate a student is. The better the prompts, the better the output of genAI. In other words, are you measuring AI literacy or writing skills with your assessment? There is, in short, a validity problem. This is not new, by the way; the same type of issue arises in the assessment of written group work (as described in Meijer et al., 2020).

What skills do you measure with your assessment?

In addition to this validity problem, allowing genAI during an assessment is a difficult issue because there are so many different types of genAI use. Besides the question of which types of use are allowed and which are not (see, for example, the AI assessment scale and the UU guidelines), it appears that different types of use require different skills from students (personal communication with Susha, Viberg & Koren, 2024). It is therefore hard to say exactly which skills you measure with your assessment.

Empirical research is currently being conducted and published in many places, so we expect to be able to say more about this in the near future.

Call: What is your experience with adapting your assessments?

The UU is collecting examples from different contexts in 2024 and 2025. So if you have experience with adapting your assessments, please contact the authors. We would love to hear about your experiences!

Some lecturers have started working on their assessments, allowing students to use genAI for (part of) the writing of an essay or thesis. A framework for designing and grading an assessment can be found in box 5. Two examples of designing your assessment in a course can be found in box 6.

Summary of the recommendations

Our recommendation around assessment in these times of genAI is to start by looking at the end terms and learning outcomes at the meso level (program level). That way, you can identify which end terms are no longer being assessed reliably now that students can use genAI. To get an idea of the vulnerability of an assessment, you can follow the roadmap set up by the Teaching and Learning Center at Southern Cross University. Next, you can consider which assessments you need to adapt. Adaptations can be made in two ways:

  1. you allow genAI to be used (to a greater or lesser extent) during the assessment, or
  2. you try to rule out genAI use during the assessment.

Which option you choose depends on your learning outcomes.

If you allow genAI during the assessment, it is important to think carefully about what you are assessing: AI literacy or actual writing skills? And how do you distinguish these from subject-related knowledge? In other words, pay close attention to the validity of your assessment.

If you want to exclude genAI use as much as possible during the assessment, an adaptation to the assessment process is the best option. This adaptation is also the most robust against possible new forms of genAI that will be developed.

Adaptations to the assessment process should ensure that the teacher maintains visibility into the writing process, for example by seeing and discussing several of a student's intermediate products. In addition, the teacher can discuss a student's written piece with the student orally.

If the assessment process is set up in this way, the teacher can ultimately make an informed summative decision, in which a good estimate can be made of the student's level without the use of genAI.

Support

We realize that in some cases these will be large-scale adjustments. Would you like clarification on any of these steps? Or would you like support in mapping or adjusting your assessment? Please contact Educational Development & Training.

Published: May 2024