THE SALLY-ANNE STUDY
Specification: The Sally-Anne Study by Simon Baron-Cohen, Alan M. Leslie, and Uta Frith (1985), which tested Theory of Mind in children. Theory of Mind refers to the ability to understand that other people have beliefs, intentions, and knowledge that may differ from one’s own. The researchers were particularly investigating whether children with Autism Spectrum Disorder show difficulties in understanding false beliefs.
THE SALLY-ANNE STUDY (1985)
THE RESEARCHERS
SIMON BARON-COHEN: A British clinical psychologist who focuses on autism. Back in the 1980s, he was doing his PhD in psychology at the University of London, with Uta Frith as his supervisor. After that, he became a professor of developmental psychopathology at the University of Cambridge and now runs the Autism Research Centre there. His main work examines how autism affects thinking and brain biology, including theories of mind and differences in how men and women think.
ALAN M. LESLIE: A cognitive psychologist famous for his ideas on how kids develop a theory of mind and handle mental representations. He came up with the key theory used in this study, which is about “second-order representations” – basically, thinking about what someone else is thinking. During the study, he was based at the University of London, and later he became a professor at Rutgers University in the US, where he kept studying how babies think and how pretend play works in early development.
UTA FRITH: A German-British developmental psychologist known for her groundbreaking research on autism and dyslexia. She supervised Simon Baron-Cohen’s PhD and has been a leader in understanding how the brain works in these areas. In the 1980s, she worked at the MRC Cognitive Development Unit in London, and she’s now an emeritus professor at University College London. Her studies push for explanations of autism that aren’t just about overall intelligence, but focus on specific brain processes.
BACKGROUND
In the early 1980s, developmental psychologists were deeply engaged in exploring how children acquire the ability to attribute mental states—such as beliefs, desires, intentions, and knowledge—to themselves and others. This capacity, termed “theory of mind” by Premack and Woodruff in 1978, is essential for navigating social interactions, as it allows individuals to predict and interpret behaviour based on inferred psychological states rather than observable actions alone.
Prior research indicated that very young children, typically under the age of four, operate under an egocentric assumption: they believe that others share their own knowledge and perceptions of the world. As cognitive and social development progresses, children begin to recognise that mental states can differ between individuals, even in identical situations. This shift enables more sophisticated social understanding, such as recognising deception, empathy, or differing viewpoints.
Building on earlier work by researchers like Leo Kanner, who first described autism in the 1940s, and Michael Rutter, who refined diagnostic criteria in the 1970s, Baron-Cohen, Leslie, and Frith hypothesised that children with autism spectrum disorder (ASD) experience a specific deficit in developing this theory of mind. They drew from Alan Leslie’s model of metarepresentational development, which posits that theory of mind relies on the ability to form “second-order representations”, that is, mental representations of other people’s mental representations. According to the model, this metarepresentational capacity typically emerges in the second year of life and is linked to the onset of pretend play, where children represent objects or actions symbolically.
The researchers noted that autistic children often exhibit a notable absence of pretend play, alongside profound social impairments, which could not be fully explained by general intellectual disability. Studies by Hermelin and O’Connor in the 1970s had already shown that while many autistic children are intellectually impaired, their social deficits persist even in those with average or above-average IQs. In contrast, non-autistic children with intellectual disabilities, such as those with Down syndrome, demonstrate social competencies appropriate to their mental age.
On this basis, Baron-Cohen, Leslie and Frith proposed that children with Autism Spectrum Disorder may have difficulty representing the mental states of others. They suggested that autism might involve a specific impairment in understanding beliefs, intentions and perspectives, independent of overall intelligence.
To test this idea, they designed a false-belief task. A false belief occurs when a person believes something that is not actually true. Understanding false belief requires recognising that another person’s belief depends on what they have seen or experienced, not on what is objectively correct.
If children understand false belief, they should be able to predict behaviour based on another person’s mistaken belief, rather than on reality.
AIMS
The researchers sought to distinguish between two possibilities: whether struggling with false belief tasks in autism stems from a broad developmental delay (linked to overall cognitive maturation) or from a targeted impairment in theory of mind – the specific ability to attribute mental states like beliefs and intentions to others. By controlling for factors like verbal mental age and including control groups matched on these aspects, the study aimed to isolate whether the deficit was unique to autism or more generally tied to intellectual ability. This approach helped test the hypothesis that autism involves a selective cognitive issue in understanding others’ perspectives, rather than a uniform developmental lag.
METHODOLOGY
The study was a controlled laboratory experiment that used dolls and props to present a short story.
Participants were divided into three groups:
THE CHILDREN WITH AUTISM (20 participants)
THE CHILDREN WITH DOWN SYNDROME (14 participants)
THE TYPICALLY DEVELOPING CHILDREN (27 participants)
The Down syndrome group acted as a comparison group with learning difficulties. If these children passed the task but autistic children did not, this would suggest the difficulty was not simply due to lower intelligence.
Participants were matched as closely as possible on mental age, measured using standardised intelligence tests.
PROCEDURES
Children were individually shown a short sequence using dolls that acted out a simple story. The child watched the sequence and was then asked questions about what would happen next.
First, Sally entered the room. In the room, there was a basket and a box. Sally placed her marble into the basket. She then left the room.
Next, Anne entered the room while Sally was away. Anne removed the marble from the basket and placed it inside the box. Anne then left the room.
Finally, Sally returned to the room.
At this point, the child watching the scene knew that the marble had been moved. However, Sally had not seen this happen.
The researchers then asked the child the key belief question:
“Where will Sally look for her marble?”
This question tests Theory of Mind. To answer correctly, the child must recognise that Sally will act according to what she believes, not according to what the child knows to be true.
A child who has developed Theory of Mind will understand that Sally did not see the marble being moved. Therefore, Sally will believe the marble is still in the basket and will look there.
A child who has not yet developed Theory of Mind may assume that Sally knows what they know. Because the child knows the marble is now in the box, they may answer that Sally will look in the box. In doing so, they assume that Sally shares their knowledge of the situation.
Two control condition questions were then asked to ensure the child understood and remembered the story.
REALITY QUESTION: “Where is the marble really?”
MEMORY QUESTION: “Where did Sally put the marble in the beginning?”
If a child answered these two questions correctly but failed the belief question, this indicated that the child understood the events but had difficulty recognising that Sally held a false belief about the marble’s location.
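The logic of the procedure can be sketched in a few lines of Python. This is only an illustrative model, not the researchers’ materials: the class `Doll` and the functions `act_out_story` and `interpret` are hypothetical names introduced here. The sketch assumes that a character updates her belief about the marble only while she is in the room, which is the idea the belief question relies on.

```python
class Doll:
    """A story character who only learns about events she witnesses."""

    def __init__(self, name):
        self.name = name
        self.in_room = False
        self.belief = None  # where this doll thinks the marble is


def act_out_story():
    """Replay the story and return the expected answer to each question."""
    sally, anne = Doll("Sally"), Doll("Anne")
    state = {"marble": None, "first": None}  # actual and original location

    def put_marble(location):
        state["marble"] = location
        if state["first"] is None:
            state["first"] = location
        # Only dolls present in the room see where the marble goes.
        for doll in (sally, anne):
            if doll.in_room:
                doll.belief = location

    sally.in_room = True
    put_marble("basket")      # Sally places her marble in the basket
    sally.in_room = False     # Sally leaves
    anne.in_room = True
    put_marble("box")         # Anne moves the marble to the box
    anne.in_room = False      # Anne leaves
    sally.in_room = True      # Sally returns without having seen the move
    return {
        "belief": sally.belief,      # "Where will Sally look for her marble?"
        "reality": state["marble"],  # "Where is the marble really?"
        "memory": state["first"],    # "Where did Sally put the marble in the beginning?"
    }


def interpret(child_answers, key):
    """Classify one child's answers against the expected answers."""
    controls_ok = (child_answers["reality"] == key["reality"]
                   and child_answers["memory"] == key["memory"])
    if not controls_ok:
        return "control question failed: belief answer cannot be interpreted"
    if child_answers["belief"] == key["belief"]:
        return "passed the belief question"
    return "false-belief error"


key = act_out_story()
print(key)
print(interpret({"belief": "box", "reality": "box", "memory": "basket"}, key))
```

A child who answers “box” to all but the memory question is classified as making a false-belief error, because the control answers show the story was understood and remembered.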
RESULTS
The groups showed clear differences.
THE TYPICALLY DEVELOPING CHILDREN: Around 85 per cent answered the belief question correctly. They recognised that Sally would look in the basket, where she believed the marble to be.
CHILDREN WITH DOWN SYNDROME: Approximately 86 per cent also answered correctly. This showed that the task did not simply require high intelligence.
CHILDREN WITH AUTISM: Only around 20 per cent answered correctly. Most autistic children said Sally would look in the box, where the marble actually was. Importantly, most autistic participants answered the REALITY QUESTION and MEMORY QUESTION correctly, showing that they understood the story and remembered the events. Their difficulty appeared specifically related to representing another person’s belief.
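As a rough check on these figures, the percentages can be turned back into approximate head counts using the group sizes given in the methodology. The sketch below is illustrative only. The counts are reconstructed from rounded percentages, so they are approximations. The one-sided Fisher exact test is chosen here for illustration and is not claimed to be the analysis the researchers used. The function names are hypothetical.

```python
from math import comb

# Group sizes and approximate pass rates as stated in these notes.
groups = {
    "typically developing": (27, 0.85),
    "Down syndrome": (14, 0.86),
    "autism": (20, 0.20),
}


def approximate_passers(n, rate):
    """Nearest whole number of children implied by a rounded pass rate."""
    return round(n * rate)


def fisher_lower_tail(k, n_group, n_pass_total, n_total):
    """One-sided Fisher exact p-value: chance of k or fewer passers in the
    group if passing were unrelated to group membership."""
    n_fail_total = n_total - n_pass_total
    lowest = max(0, n_group - n_fail_total)
    favourable = sum(
        comb(n_pass_total, i) * comb(n_fail_total, n_group - i)
        for i in range(lowest, k + 1)
    )
    return favourable / comb(n_total, n_group)


passers = {name: approximate_passers(n, rate) for name, (n, rate) in groups.items()}
print(passers)

# Compare the autism group with the Down syndrome group (assumed counts).
n_aut, n_ds = groups["autism"][0], groups["Down syndrome"][0]
p = fisher_lower_tail(
    passers["autism"], n_aut,
    passers["autism"] + passers["Down syndrome"], n_aut + n_ds,
)
print(f"one-sided p = {p:.5f}")
```

Under these assumed counts, so few autistic passers would be very unlikely if group membership made no difference, which is consistent with the claim that the groups showed clear differences.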
CONCLUSION
The researchers concluded that many children with autism experience difficulty understanding that other people can hold beliefs that differ from reality. Baron-Cohen, Leslie and Frith suggested that autism may involve a deficit in THEORY OF MIND, sometimes referred to as MINDBLINDNESS, meaning difficulty understanding other people’s thoughts and beliefs.
This impairment could help explain some of the social communication difficulties often associated with autism.
REPLICATION
The Sally-Anne task has become one of the most influential methods for studying Theory of Mind and has been widely replicated.
Subsequent research has shown that most typically developing children pass false-belief tasks by around 4 years of age, whereas younger children often fail because they assume others share their knowledge.
Many children with autism continue to find false-belief tasks difficult, even when matched for mental age.
Later studies introduced variations such as unexpected content tasks and second-order false belief tasks, which further explored how children understand other people’s beliefs and perspectives.
UNEXPECTED CONTENT TASK - SMARTIES TEST (1987) Josef Perner, Susan Leekam and Heinz Wimmer
The unexpected content task was designed to test false-belief understanding, a core component of the theory of mind. It was developed as an alternative method to the classic Sally-Anne False Belief Test because researchers wanted a task that did not rely on tracking the movement of objects between characters.
In this task, a child is shown a familiar container that normally contains a particular item. For example, a Smarties tube or a sweets box. When the container is opened, it unexpectedly contains something else, such as pencils.
Researchers then ask two critical questions:
What did you think was inside before we opened it?
What will another person think is inside if they have not seen the contents?
The task was designed to examine whether children understand that beliefs are based on information available to a person, not on reality. Young children often answer that the other person will say “pencils,” because they project their own knowledge onto others. Older children recognise that the other person will believe the expected content (for example, sweets), because that person lacks the new information.
The task, therefore, measures whether children can distinguish between their own knowledge and another person’s belief. It provided further evidence that the theory of mind typically develops around age 4, consistent with findings from the Sally-Anne task.
SECOND-ORDER FALSE BELIEF TASK (1985) Josef Perner and Heinz Wimmer
Second-order false-belief tasks were designed to examine a more advanced level of theory of mind. While first-order tasks ask whether a person understands that someone can hold a false belief, second-order tasks ask whether a child understands what one person believes about another person’s belief.
In other words, the child must reason about beliefs about beliefs.
A typical example involves two characters. One character hides an object and believes it remains there. Another character secretly moves the object while the first character is absent. A second layer is then introduced: one character may observe the movement without the other knowing.
The key question might be:
“What does John think that Mary believes about where the object is?”
To answer correctly, the child must track two mental states simultaneously. They must understand:
• what Mary believes about the object
• what John believes about Mary’s belief
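The two levels listed above can be sketched as nested records. This is a minimal illustration under stated assumptions, not a published procedure: the locations “drawer” and “cupboard”, the function `track_beliefs`, and the event format are all hypothetical. Each event lists who witnessed it and which witnesses were watching in secret, so that other characters do not know they saw it.

```python
def track_beliefs(events, agents):
    """Return first-order beliefs (what each agent thinks) and second-order
    beliefs (what each agent thinks another agent thinks)."""
    first = {a: None for a in agents}
    second = {a: {b: None for b in agents if b != a} for a in agents}
    for location, witnesses, secret in events:
        for a in witnesses:
            first[a] = location  # a saw where the object went
            for b in witnesses:
                # a updates its model of b only if a knows b was watching.
                if b != a and b not in secret:
                    second[a][b] = location
    return first, second


agents = ["John", "Mary"]
events = [
    # Mary hides the object in the drawer while John watches openly.
    ("drawer", {"John", "Mary"}, set()),
    # John moves it to the cupboard; Mary watches in secret.
    ("cupboard", {"John", "Mary"}, {"Mary"}),
]
first, second = track_beliefs(events, agents)

print("Mary believes:", first["Mary"])
print("John thinks Mary believes:", second["John"]["Mary"])
```

In this sketch Mary’s own belief is correct, but John’s belief about her belief is out of date, so a correct answer to the second-order question differs from both reality and Mary’s actual belief.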
These tasks were developed because many children who passed first-order false-belief tasks still struggled with more complex social reasoning. Second-order tasks, therefore, allowed researchers to examine the developmental progression of theory of mind, showing that understanding nested mental states typically emerges later, around the age of six to seven years.
The tasks are particularly important in research on social cognition and autism because difficulties with second-order belief reasoning reveal limits in understanding complex interpersonal perspectives, deception, irony, and sarcasm.
THE STRANGE STORIES TEST (1994) Francesca Happé
To investigate how autistic individuals interpret such non-literal language, Francesca Happé (1994) developed the Strange Stories Test. In this task, participants read short stories that involve non-literal communication, such as irony or sarcasm, and are asked questions about the meaning of what a character says.
One example involves a story in which a mother spends a long time cooking her daughter Ann’s favourite meal. When the food is served, Ann continues watching television and does not thank her. Ann’s mother responds sarcastically by saying, “Well, that’s very nice, isn’t it. That’s what I call politeness.”
Participants are asked two questions: first, whether the statement is literally true, and second, why the mother says it.
Research using this task has found that autistic participants often recognise that the statement is not literally true. However, many have difficulty explaining the intention behind the comment. Instead of recognising the sarcasm, they may suggest alternative explanations, such as the mother joking.
This suggests that difficulties in theory of mind may make it harder to interpret the intended meaning behind non-literal language. As a result, expressions involving irony, sarcasm, or indirect communication may appear confusing when interpreted purely at face value.
EVALUATION OF THE SALLY-ANNE STUDY
EYE GAZE AND THEORY OF MIND IN AUTISM
Ted Ruffman, Wendy Garnham, and Paul Rideout (2001) further examined the relationship between autism and performance on the Sally-Anne False Belief Test by focusing on eye gaze as a social communicative cue. Their study introduced a modification to the original task by adding a third possible location for the marble, namely the investigator’s pocket.
Both autistic children and children with moderate learning disabilities were tested using this expanded version of the task. The researchers found that both groups performed similarly on the belief question, indicating they could verbally identify where Sally would search for the marble.
However, differences emerged when researchers examined spontaneous eye gaze behaviour. Children with moderate learning disabilities tended to look towards the correct location where Sally would search, demonstrating that their visual attention aligned with their understanding of Sally’s belief. In contrast, autistic children frequently failed to direct their gaze towards the correct location, even when they answered the belief question correctly.
This suggests that although some autistic children can produce the correct verbal answer, they may still show atypical social attention and difficulties using eye gaze as a communicative signal, reflecting broader social cognitive differences associated with autism.
UNCERTAINTY ABOUT THE THEORY OF MIND AS A COMPLETE EXPLANATION
Helen Tager-Flusberg (2007) later argued that, despite influential findings from tasks such as the Sally-Anne test, there remains considerable uncertainty about whether deficits in theory of mind alone fully explain autism. Across many studies, a proportion of autistic children successfully pass false belief tasks, including the Sally-Anne task.
This observation suggests that, while theory-of-mind difficulties are common in autism, they cannot provide a complete explanation for the condition. The presence of autistic individuals who perform successfully on false belief tasks indicates that other factors, such as language ability, executive functioning, or broader social processing differences, may also play significant roles.
FALSE BELIEF UNDERSTANDING IN OTHER HOMINIDS
Research using eye-tracking methods has suggested that some nonhuman great apes may possess rudimentary abilities related to the theory of mind. Studies examining chimpanzees, bonobos, and orangutans have investigated whether these species can anticipate the behaviour of an individual who holds a false belief, a cognitive ability traditionally thought to be uniquely human.
In these experiments, apes watched short filmed scenarios involving a human actor dressed in a King Kong-style costume. The character hid an object in one location, after which another character moved the object while the first was absent. This structure mirrors the logic of the Sally-Anne False Belief Test, where a subject must predict where an individual will search based on their belief rather than the object’s actual location.
Instead of answering questions verbally, the apes’ anticipatory eye movements were measured. Eye tracking allowed researchers to determine where the animals looked just before the actor returned to search for the object. If the apes looked towards the location where the actor falsely believed the object to be, this suggested that they anticipated behaviour based on the actor’s belief rather than reality.
Results indicated that chimpanzees, bonobos, and orangutans tended to look towards the location consistent with the actor’s false belief. This pattern implies that they may anticipate actions based on what another individual believes rather than on what is true.
However, the interpretation remains debated. The apes did not answer belief questions verbally and therefore did not literally “pass” the Sally-Anne task in the same way human children do. Instead, their eye movements suggest a behavioural sensitivity to others’ beliefs, which may represent an evolutionary precursor to the fully developed theory of mind observed in humans.
INTERACTIONAL FACTORS AND THE VALIDITY OF THE SALLY-ANNE TASK
One evaluation of research using the Sally-Anne False Belief Test concerns the validity of the task as a measure of theory of mind. This criticism does not reject the theory-of-mind concept itself, but suggests that the task used to measure it may not always accurately reflect children’s understanding of mental states.
Research using video and conversation analysis has shown that children’s performance on the Sally-Anne task can be influenced by interactional factors between the tester and the child. In this work, children sometimes responded correctly by pointing to the correct location or manipulating objects rather than giving a clear verbal answer. However, testers often treated these responses as inadequate because they did not match the expected response format.
When children produced quiet or nonverbal responses, the tester frequently repeated the question, indicating that the previous response was not acceptable. In some cases, this led children to alter their answer. As a result, the final recorded response may not always reflect the child’s initial understanding.
This suggests that performance on the task may depend partly on how the child interprets the question, how the tester evaluates responses, and the interaction between the two participants. The task may therefore measure communication style and interactional dynamics rather than theory-of-mind ability alone.
The implication is that false belief tasks, such as the Sally-Anne test, may lack internal validity. Children’s answers can be shaped by interactional cues and expectations during the assessment rather than purely by their understanding of another person’s beliefs. Consequently, incorrect answers may not always indicate a genuine deficit in theory of mind.
LIMITATIONS OF FALSE BELIEF TASKS IN ARTIFICIAL INTELLIGENCE
Research on artificial intelligence has shown that advanced language models can successfully answer classic false-belief problems based on the Sally-Anne False Belief Test. For example, studies such as those conducted by Michal Kosinski (2024) report that models like GPT-4 can solve many false-belief tasks at a level comparable to that of six-year-old children. Other research has found that such models can also perform well on tasks involving indirect requests, misdirection, and higher-order reasoning about beliefs.
However, this evidence does not necessarily demonstrate that artificial systems genuinely understand mental states. Critics argue that language models may simply detect patterns in task wording rather than reason about beliefs the way humans do. For example, Tomer Ullman (2023) found that earlier models, such as GPT-3.5, failed when the wording of false-belief tasks was slightly altered, even though humans can easily adapt to such changes.
Further research suggests that language models are much better at identifying what a character knows or believes than predicting behaviour based on those beliefs. This suggests that successful performance on false-belief questions may reflect pattern recognition rather than genuine theory-of-mind reasoning.
As a result, some researchers argue that passing isolated false-belief tasks is not sufficient evidence of genuine social cognition. More complex tests involving intentions, emotions, deception, and non-literal communication are now being used to examine whether artificial systems truly understand mental states or simply reproduce patterns learned from language data.
FALSE BELIEF TASKS MAY NOT BE A VALID MEASURE OF THEORY OF MIND
A key criticism of the theory of mind research was proposed by Paul Bloom and Tim P. German (2000). They argue that false-belief tasks, such as the Sally-Anne False Belief Test, should not be treated as a definitive test of theory of mind.
First, they argue that passing the false belief task requires abilities beyond theory of mind. To answer correctly, children must remember the sequence of events, track two characters, understand the language of the question, and inhibit the obvious answer based on reality. These additional cognitive demands mean that failure may reflect limitations in memory, language comprehension, or executive functioning rather than an absence of theory of mind.
Second, Bloom and German argue that the theory of mind does not necessarily require success on false belief tasks. Evidence suggests that even very young children show signs of understanding other people’s mental states before they can pass these tests. For example, young children can follow eye gaze, imitate intended actions, engage in pretend play, and modify their communication depending on what another person knows or has seen.
This suggests that the false belief task measures only one specific and cognitively demanding aspect of social reasoning. As a result, failure on the task may not indicate a genuine lack of theory of mind, but rather the difficulty of the task itself. Bloom and German, therefore, conclude that false belief tasks should be treated as only one tool among many for investigating social cognition.
CRITIQUE OF THE THEORY OF MIND TEST
Some critics have questioned the interpretation of results from theory-of-mind tasks such as the Sally-Anne False Belief Test. They argue that a child’s answer to the task may not necessarily indicate a failure to understand false belief.
One criticism is that responses may reflect differences in sensory processing or expectations about how the world behaves rather than an absence of a theory of mind. For example, an autistic child might assume that objects are frequently moved or changed because their experience of the sensory environment is more unpredictable or overwhelming. From this perspective, it may be reasonable for the child to think that Sally would search elsewhere than where she originally placed the marble.
This suggests that the task may partly measure how individuals interpret stability and change in the environment rather than purely their ability to understand other people’s beliefs. The “correct” answer in the task may therefore reflect assumptions made by neurotypical researchers about how people normally expect objects to remain where they were left.
As a result, some critics argue that performance on false belief tasks may sometimes reflect differences in perception, communication style, or sensory experience rather than a genuine absence of theory-of-mind ability.
SUPPORTING EVIDENCE FOR THE THEORY OF MIND DEFICITS IN AUTISM
Research using false belief tasks provides support for the theory of mind explanation of autism. For example, the Sally-Anne study by Simon Baron-Cohen, Alan Leslie and Uta Frith (1985) found that around 80 per cent of autistic children failed the belief question, whereas most typically developing children and most children with Down syndrome passed. This suggests that autistic children may have specific difficulties understanding that other people can hold false beliefs.
However, the support is not universal because some autistic individuals pass false-belief tasks, suggesting that theory of mind deficits may not explain all cases of autism.
THEORY OF MIND DEFICIT MAY NOT BE UNIVERSAL IN AUTISM
Some autistic individuals are able to pass false-belief tasks. This suggests that theory of mind deficits are not present in all individuals with autism, particularly those with higher verbal ability or milder autistic traits. Therefore, theory of mind difficulties may explain some aspects of autism, but cannot fully account for the condition.
FALSE BELIEF TASKS MAY MEASURE LANGUAGE AND MEMORY RATHER THAN THEORY OF MIND
Some researchers argue that tasks such as the Sally-Anne test require several additional abilities, including language comprehension, memory, and attention. Children must follow the story, remember where objects were placed, and understand the wording of the question. If a child fails the task, this may reflect difficulties with language or cognitive processing rather than a genuine lack of theory of mind.
EVIDENCE FROM OTHER TASKS SUPPORTS THE THEORY OF MIND DIFFICULTIES
Other studies support the theory-of-mind explanation by showing that autistic individuals struggle to understand non-literal communication. For example, Francesca Happé (1994) developed the Strange Stories Test, which measures understanding of sarcasm, irony, and deception. Autistic participants often recognise that a statement is not literally true but struggle to explain the speaker’s intention. This supports the idea that difficulties understanding other people’s mental states may underlie some autistic behaviours.
THE DOUBLE EMPATHY PROBLEM AND CRITICISMS OF THE THEORY OF MIND DEFICIT
A more recent critique of the theory-of-mind explanation of autism is known as the double empathy problem, proposed by Damian Milton in 2012 and widely discussed in contemporary autism research. This perspective challenges the traditional interpretation that autistic individuals lack a theory of mind.
The theory of mind account suggests that autistic people struggle to understand others' thoughts, beliefs, and intentions, leading to social communication difficulties. However, the double empathy framework argues that misunderstandings between autistic and non-autistic individuals may be mutual rather than one-sided. In other words, communication difficulties may arise because people with different cognitive styles interpret social cues differently.
Research has shown that autistic individuals often communicate effectively with other autistic individuals, suggesting that social understanding may depend partly on shared communication styles and expectations rather than a simple deficit in understanding mental states. From this perspective, difficulties observed in theory-of-mind tasks may reflect a mismatch between autistic and neurotypical ways of interpreting behaviour rather than a complete inability to understand others.
This critique suggests that the traditional interpretation of false belief tasks may be overly narrow. Instead of demonstrating a universal deficit in theory of mind, performance differences may reflect differences in social experience, communication styles, and expectations during interaction.
Consequently, some contemporary researchers argue that theory of mind difficulties may explain only part of the social differences associated with autism, and that social understanding may be shaped by reciprocal interaction between individuals with different cognitive profiles.
