Published on: 27/03/2024 · Last updated on: 02/09/2024
Professor James Davenport – Professor of Information Technology
As the University rolls out its new generative artificial intelligence (GenAI) assessment categorisations, academics throughout the university are working at pace to modify existing assessments and create new ones. Read on to learn about the Type A and Type C assessments created by Professor James Davenport from the Computer Science Department.
Type A assessment
Under the new assessment categorisations, a Type A assessment is where students are not permitted to use GenAI. Including Type A assessments on a course – and/or Type B assessments where the use of AI is permitted but restricted to certain tasks – ensures that students’ own thought processes and skills are developed and assessed and that students do not become overly reliant on the use of GenAI. To design such an assessment, Professor Davenport advises reflecting on how a large language model (LLM) processes information compared to a human. The LLM predicts the next most likely word in a sentence rather than looking for connections and meaning across the whole piece of work as a human might. Coupled with this, LLMs have limited context windows, meaning that the prompt data they can process at any one time is restricted. LLMs therefore struggle to produce satisfactory answers when an assignment has inter-related but separate elements. These shortcomings provide an opportunity to create assignments that, for now at least, cannot be satisfactorily completed without significant input from the students. Below are two examples of assignments Professor Davenport sets for his MSc Distance-learning Computer Science students.
Example 1 – card transaction log and commentary
Students are asked to undertake a card transaction, e.g. buying something from an online shop, and track who receives a log of the card details. They must then write a commentary. Professor Davenport has found that GenAI can write a coherent and plausible-sounding commentary, but on closer inspection it does not match the information in the log.
Example 2 – answers with an annotated bibliography
Based on information given to them in two case studies, students must answer a series of questions on cyber security. The target length for the answers is between 200 and 500 words, and they must be cross-referenced with the annotated bibliography that forms the second part of the assignment. This task is context-heavy, a challenge for GenAI with its limited context window. It struggles to relate the questions to the lengthy case studies and then to the bibliography. Its tendency with the latter is to supplement the list with titles that correspond to the topic of the assignment but are fabricated. This allows the marker to pick up cases of academic dishonesty.
Reflections
Professor Davenport warns that marking these two-part assignments can be challenging. It is best undertaken with two screens to check the correspondence of one element with the other. It also requires attention to detail, and verifying references can be laborious. However, on a course taught entirely at a distance and asynchronously, where there is no personal relationship with the student or sense of their abilities or communication style, he finds it important to have a mechanism for enforcing his restrictions on GenAI use. To do so defends the integrity of the course and helps ensure students learn a full range of skills from the course.
Type C assessment
Professor Davenport also uses Type C assignments in his teaching so that students learn the strengths and limitations of GenAI. For an assignment with his Degree Apprenticeship Computer Science students, he decided to retain the model described above of stipulating two corresponding elements. The first element involves giving a prompt to an AI, such as ChatGPT, to analyse some data. The second element is to reflect on the AI's output. Both cohorts who have so far completed this assessment noticed minor errors made by the AI, but failed to notice major errors, such as information attributed to the wrong column of a table. This may indicate that one crucial lesson from Type C assignments is to highlight the risks of automation bias: the assumption that automated systems are infallible, or nearly so. For example, pilots often require extended training to resist the temptation to rely on autopilot. Future iterations of the assignment brief will warn students of this bias in advance.
Recommendations
When considering how to develop GenAI-adapted assignments, consider:
- Having more than one element to the assessment and having them inter-relate. Complex tasks that require a deep understanding of context, critical thinking, and the application of knowledge in a nuanced way are less likely to be successfully completed by AI alone.
- Producing tasks whose context is either unavailable to GenAI (such as that gathered through in-person interactions) or too large for it to process. There is a limit to how much data GenAI can process at any one time. Google's Gemini 1.5 currently processes the most, at around 1 million tokens (words or parts of words). This is still experimental but will later be made available at a cost. Most other models can only handle 200,000 tokens at most, though this is subject to change.
- Using competency-based or authentic assessments that address or simulate real-world problems.
- Asking students to write reflective pieces where they outline their thought processes around learning tasks, e.g. describing how they selected the sources contained in their bibliography.
- Providing a greater range of tasks for students to complete. If students can choose something that interests them, they are more likely to be intrinsically motivated and will therefore be less tempted to misuse GenAI.
- Incorporating GenAI into your teaching as an integral element. For more ideas on this you may be interested in the work of Dr. Ioannis Georgilas.
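To judge whether assignment material is "too large to process" in the sense described above, it helps to estimate its token count. Exact counts depend on the model's tokenizer, but a widely used rule of thumb is roughly four characters of English text per token. The sketch below uses that heuristic together with the 200,000-token figure quoted above; the function names are illustrative, not part of any particular API.

```python
# Rough check of whether a piece of text fits in a model's context window.
# Assumes the common ~4-characters-per-token heuristic for English text;
# real tokenizers (and therefore real counts) vary by model.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the number of tokens the text would occupy."""
    return int(len(text) / chars_per_token)

def fits_context(text: str, limit_tokens: int = 200_000) -> bool:
    """True if the estimated token count is within the model's limit."""
    return estimate_tokens(text) <= limit_tokens

# Example: two long case studies plus an annotated bibliography, combined.
assignment_material = "Case study text... " * 50_000  # illustrative only
if not fits_context(assignment_material):
    print("Material likely exceeds a 200k-token context window.")
```

A tokenizer-accurate count (e.g. via a model vendor's own tokenizer library) would be more precise, but for assessment design this order-of-magnitude estimate is usually enough to tell whether case studies are long enough to strain a model's context window.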