*This resource has been tested for appropriateness in the classroom and scrutinised for safeguarding and cybersecurity issues. However, please do carry out any due diligence processes required by your own institution before using or recommending it to others.

Assessing AI-Generated Responses: ChatGPT in Physics Open-Ended Questions

Case Study | Teaching & Inclusive Practices | Key Stage 3 | Key Stage 4 | Key Stage 5 | Sixth Form
Jess Power

Teacher of Physics & Digital Education Co-ordinator

The investigation evaluated ChatGPT's capacity to produce responses to physics open-ended questions. Response quality varied with the prompt, improving notably when marking guidelines were included. However, the generated responses commonly lacked crucial elements such as mathematical relationships and diagrams. Collaborative discussions between teachers and pupils around these AI-generated responses proved pivotal, empowering pupils to recognise the essential qualities of exemplary responses and highlighting the educational value of analysing AI-generated content.

The aim of the investigation was to explore the feasibility of AI technology, specifically ChatGPT, to generate exemplar responses for open-ended questions (OEQs) in physics. Primary objectives included:

Evaluate whether ChatGPT can consistently produce responses that align with the exam board criteria for excellent OEQ responses in physics.

Utilise the generated responses to lead pupil discussions on the key attributes of both exceptional and subpar OEQ responses, with the goal of enhancing understanding of OEQ structure and content.

ChatGPT, an AI model developed by OpenAI, specialises in interpreting and generating human-like natural language text responses. Trained on an extensive body of internet text, its objectives revolve around enhancing natural language understanding and processing while reflecting improved knowledge depth in its responses.

In Scottish Qualifications Authority physics exams, OEQs challenge pupils to apply physics knowledge to real-world scenarios.

OEQs are often demanding for both pupils and teachers. Pupils routinely struggle with initiating responses, judging the necessary depth of explanation, and keeping their commentary relevant to the question. The marking instructions contain no exemplar answers, only general grading criteria (e.g., 3 marks for strong understanding, 2 for reasonable understanding, and 1 for limited understanding). This can make it difficult for teachers, especially NQTs, to assess responses and apply consistent marking. Physics teachers often spend considerable time crafting model responses to compensate for this absence of exemplar answers to share with their pupils.

Six OEQs from the National 5 curriculum, representing a range of topics, were selected. Each was fed into ChatGPT multiple times, using the following prompts:

Original question.

Question + Learning Outcomes.

Question + Target Audience (i.e. this response should be written with the knowledge of a 15-year-old physics student).

Question + SQA Grading Criteria.
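The four prompt variants above follow a simple pattern: the original question, optionally followed by one piece of extra context. As a hedged illustration only (the exact wording fed to ChatGPT is not given in the study, so the function name, wording, and sample inputs below are assumptions), the variants might be assembled like this:

```python
# Hypothetical sketch of assembling the four prompt variants before pasting
# them into ChatGPT. The prompt wording here is illustrative, not the exact
# wording used in the investigation.

def build_prompts(question, learning_outcomes, grading_criteria,
                  audience="a 15-year-old National 5 physics student"):
    """Return the four prompt variants described above as a dict."""
    return {
        "question_only": question,
        "with_outcomes": f"{question}\n\nLearning outcomes:\n{learning_outcomes}",
        "with_audience": (f"{question}\n\nThis response should be written "
                          f"with the knowledge of {audience}."),
        "with_criteria": f"{question}\n\nSQA grading criteria:\n{grading_criteria}",
    }

# Example with made-up N5-style inputs:
prompts = build_prompts(
    question="Explain why a skydiver eventually falls at a constant speed.",
    learning_outcomes="Balanced forces; Newton's first law; air resistance.",
    grading_criteria=("3 marks: good understanding; 2 marks: reasonable "
                      "understanding; 1 mark: limited understanding."),
)
for name, text in prompts.items():
    print(name, "->", text.splitlines()[0])
```

Each variant was submitted separately, so the quality differences reported below can be attributed to the added context alone.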

The responses were anonymously assessed by physics teachers at Robert Gordon’s College. Responses generated from the question alone typically scored poorly (a mean of 1 out of 3) due to a lack of N5-level concepts and irrelevant or poorly explained physics content.

However, when prompts specified learning outcomes or target audiences, the responses slightly improved (mean of 1.9 and 1.7 respectively), showing more references to course concepts but often containing repetitive information. The most successful responses, averaging a score of 2.6, occurred when marking instructions accompanied the question.

Nevertheless, all responses fell short by omitting mathematical relationships, formulas, and annotated diagrams, a crucial element of pupil explanations.

The teachers then shared these AI-generated examples with their classes. Despite being of average quality, the responses served as effective stimuli for meaningful learning discussions, enabling pupils to identify key characteristics of strong answers, such as essential physics relationships, definitions of core concepts and, where appropriate, annotated diagrams.

This method streamlined educators' work by efficiently generating multiple examples for analysis, freeing time for more focused teaching. While AI, and ChatGPT in particular, shows promise in education, the investigation underlined the importance of precise prompts and the continuous refinement of AI systems to meet specific educational requirements.

The scenario in this case study is genuine and based upon real events and data, however its narration has been crafted by AI to uphold a standardised and clear format for readers.

Key Learning

ChatGPT can craft open-ended question (OEQ) responses quickly and effectively, particularly when guided by a more specific initial prompt. Although the generated responses varied in quality, they still served as valuable stimuli for pupils, who discussed them with teachers and peers to sharpen their grasp of what defines an exceptional response.


Class teachers should thoroughly review the generated responses before presenting them to their pupils. This step is crucial to minimise the risk of misconceptions or inaccuracies, especially concerning physics concepts, which could confuse the class. In cases where responses are generated during lesson time, engaging in a comprehensive discussion about response quality becomes vital. This allows for the identification and discussion of any errors to ensure clarity and accurate understanding.