PART THREE: A BEGINNER’S GUIDE TO INFERENTIAL STATISTICS.

SPECIFICATION:

  • Students should demonstrate knowledge and understanding of inferential testing and be familiar with inferential tests.

  • Levels of measurement: nominal, ordinal and interval. Introduction to statistical testing: the sign test. When to use the sign test; calculation of the sign test.

  • Probability and significance: use of statistical tables and critical values in interpreting significance; Type I and Type II errors.

  • Factors affecting the choice of statistical test, including level of measurement and experimental design. When to use the following tests: Spearman’s rho, Pearson’s r, Wilcoxon, Mann-Whitney, related t-test, unrelated t-test and Chi-Squared test.

If you have followed my posts on data analysis & handling PART 1 and PART 2, then you should be familiar with the following:

  • Quantitative and Qualitative Data

  • Continuous and Discrete Data

  • Levels of Measurement: Nominal, Ordinal, Interval & Ratio Data

  • Descriptive Statistics: Measure of Central Tendency & Dispersion

  • Graphs and Tables.

If you are unfamiliar with the above terms, please investigate them before you carry on with this section, or you will find it difficult to fully understand the reasoning behind the decision-making process in inferential statistics.

KEY TERM:

The word INFERENTIAL describes conclusions that are reached on the basis of evidence and reasoning.

INFERENTIAL STATISTICS

After descriptive statistics, researchers must perform an inferential test on their data.

Here are a few examples of inferential statistical tests selected for their relevance to introductory psychology courses. Remember, this list is not exhaustive; many additional tests are available and widely used in the field.

  • The Sign Test (for details about calculating the sign test, click here: SIGN TEST)

  • Mann-Whitney U

  • Unrelated T Test

  • Wilcoxon Matched pairs

  • Related T Test

  • Spearman’s Rho

  • Pearson’s Product-Moment Correlation (Pearson’s r)

  • Chi-Square

WHAT INFERENTIAL TEST SHOULD YOU CHOOSE?

Before selecting an inferential statistical test for your research, it's crucial to understand seven key aspects of your data and research objectives. Without knowing these, a test cannot be selected.

These criteria may seem puzzling now, but there's no cause for concern, as each one will be thoroughly explored individually. By the end of the inferential statistics chapter, you'll understand these seven key points and their importance to choosing tests.

CRITERIA NEEDED TO CHOOSE AN INFERENTIAL TEST:

  1. What is your level of significance, e.g., 0.01, 0.05, or 0.10 (1%, 5%, 10%)?

  2. What is your level of measurement of your data (e.g., nominal, ordinal, interval, or ratio)?

  3. Is the hypothesis directional (1-tailed) or non-directional (2-tailed)?

  4. Do you need a test of difference, association or correlation?

  5. If you need a test of difference, what design are you using: Independent Group, Matched Pairs or Repeated Measures?

  6. What is the size of your sample? For example, does it have fewer than 30 participants?

  7. Is your data parametric or non-parametric?

NOW, LET’S LOOK AT EACH OF THESE CRITERIA IN GREATER DETAIL…

PROBABILITY AND CHANCE

Probability is the branch of maths that calculates how likely the occurrence of an event is. The probability of an event, mathematically, is a number between 0 and 1. In short, 0 indicates the impossibility of the event occurring; for example, I could bet on the zero possibility of turning into a reptile by sunrise.

A probability of 1 indicates that an event is certain. I can’t think of many events that would occur with absolute certainty, not even death: maybe our memories could be transferred into a form of AI sometime in the future. I’ll go with this one: I’m 100% certain the sun will rise tomorrow.

In reality, most events have probabilities between 0 and 1 because most events can’t be predicted with absolute certainty or uncertainty.

TAKEAWAYS

  • Probability is expressed as p.

  • Probability, or p, is expressed as a number between 0 and 1.

  • p = 0 means an event won’t happen, e.g., you can’t pick a joker if there are none in the pack.

  • p = 1 means that an event will happen, e.g., you pick a joker from a pack made up entirely of jokers.

  • p = 0.50 is the chance of getting heads (or tails) when tossing a fair coin.

  • p = 0.25 is the chance of picking a heart from a full deck of cards.

  • Scientists look at conditional probability: this is the probability of an event if something else occurs; for example, there is a chance anyone can develop hair loss, but this will increase if the person is male and above 40 years old (the conditions).

  • The reason probability is between 0 and 1 is the way that it is calculated.

To calculate the probability that a particular outcome will occur, the number of outcomes in which it happens is divided by the total number of possible outcomes:

Probability = (number of outcomes in which X happens) ÷ (total number of possible outcomes)

For example, a coin has two possible outcomes, heads and tails, and only one of them is a head. The probability of getting a head is therefore one divided by two = 0.5.
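The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name probability and the card-deck figures (13 hearts in a standard 52-card deck with no jokers) are assumptions chosen to match the takeaway examples, not part of any official method.

```python
from fractions import Fraction


def probability(favourable_outcomes: int, possible_outcomes: int) -> Fraction:
    """Outcomes in which X happens divided by all possible outcomes."""
    if possible_outcomes <= 0:
        raise ValueError("There must be at least one possible outcome.")
    if not 0 <= favourable_outcomes <= possible_outcomes:
        raise ValueError("Favourable outcomes must lie between 0 and the total.")
    return Fraction(favourable_outcomes, possible_outcomes)


p_heads = probability(1, 2)    # one head out of the two sides of a coin
p_heart = probability(13, 52)  # assumed: 13 hearts in a 52-card deck, no jokers
p_joker = probability(0, 52)   # no jokers in the pack, so it cannot happen

print(float(p_heads))  # 0.5
print(float(p_heart))  # 0.25
print(float(p_joker))  # 0.0
```

Because the number of favourable outcomes can never be smaller than zero or larger than the total, the answer always lands between 0 and 1.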

WHAT’S THE PROBABILITY?

  • Finding Mutual Love: 1 in 562 chance

  • Having Twins: 1 in 67 chance

  • Becoming a Millionaire: 1 in 55 chance

  • Winning the Lottery (Weekly Ticket): 1 in 100,000,000 chance

  • Involved in a Drunk Driving Crash (Lifetime): 2 in 3 chance

  • Going Blind After Laser Eye Surgery: 1 in 5,000,000 chance

  • Being Injured by a Toilet: 1 in 5,000 chance

  • Dying in a Motor Vehicle Crash: 1 in 102 chance

  • Dying in an Airplane Crash: 1 in 205,552 chance

  • Dying from a Hornet, Wasp, or Bee Sting: 1 in 54,093 chance

  • Being Hit by Lightning: 1 in 114,195 chance

  • Killed by Fireworks: 1 in 340,733 chance

  • Dying from a Shark Attack: 1 in 3,748,067 chance

  • Scoring a Hole-in-One in Golf: 1 in 5,000 chance

  • Becoming an Astronaut: 1 in 13,200,000 chance

  • Winning an Olympic Medal: 1 in 662,000 chance

  • Getting Stuck in a Lift (Per Ride): 1 in 100,000 chance (USA data)

HOW DOES PROBABILITY RELATE TO INFERENTIAL STATISTICS?

Consider discovering a bag of fifty-pound notes on the way to work or school. Such an event is likely a rare occurrence—a one-off instance. Similarly, researchers question whether they would obtain consistent results with similar populations.

For example:

SCENARIO 1

A psychologist creates the following hypothesis: “Participants in the jogging condition will rate photographs of the opposite sex higher than participants in the non-jogging condition.” Soon afterwards, the psychologist conducts the experiment and gets the following descriptive data.

TABLE ONE

Descriptive data will summarise your raw data and make it easier to analyse. It will also tell you if your experiment or study has worked.

However, the research process extends beyond descriptive statistics. For instance, the findings presented in Table 1 are not guaranteed to happen again. The results could be by chance. Researchers need to know that other psychologists could obtain the same results when replicating the jogging study.

Here is another analogy to really illustrate the point. Suppose a coin is flipped twenty times, and the outcomes of heads and tails are recorded.

The results are as follows.

Q. What is the probability of getting 19 heads and one tail every time a coin is flipped 20 times?

A. Getting 19 heads and one tail every time a coin is flipped 20 times is extremely unlikely. This outcome would typically be considered a chance or random result.
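To put a rough number on "extremely unlikely", here is a minimal sketch that works out the chance of one particular split of heads and tails in 20 flips. It assumes a fair coin and independent flips; the function name prob_exact_heads is made up for this illustration.

```python
from math import comb


def prob_exact_heads(heads: int, flips: int) -> float:
    """Chance of exactly `heads` heads in `flips` tosses of a fair coin."""
    # comb(flips, heads) counts the different orders in which that many
    # heads can turn up; 2 ** flips counts every possible sequence.
    return comb(flips, heads) / 2 ** flips


p_19_of_20 = prob_exact_heads(19, 20)
p_10_of_20 = prob_exact_heads(10, 20)

print(f"19 heads and 1 tail:    {p_19_of_20:.6f}")
print(f"10 heads and 10 tails:  {p_10_of_20:.4f}")
```

Under these assumptions, 19 heads and one tail turns up in only about 2 of every 100,000 sets of 20 flips, which is why a single result like that is best treated as a fluke.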

Descriptive statistics alone cannot address these questions because they do not allow us to calculate the probability of a result recurring or never occurring again.

Without conducting an inferential test, it is impossible to calculate the probability of results occurring again.

WHY DO WE NEED INFERENTIAL TESTS?

This is where inferential statistics come in as they calculate the probability of a result occurring again or not; in other words, is the finding a chance result, or would it occur again and again? The ultimate goal of inferential statistics is to generalise findings from the sample in the study to similar populations. This is because samples are often biased, so it’s important to find out how likely it is that they accurately reflect what happens in the population.

Scientists call this aspect of the research process statistical significance, so if a phenomenon doesn’t occur by chance, it is deemed statistically significant. If it occurs by chance, it is deemed not statistically significant.

NB: Most scientists call it “significant” or “not significant” - the prefix “statistical” is used with less frequency now.

WHAT IS A SIGNIFICANCE LEVEL?

Most things in life require a certain amount of evidence before people try them, for example, taking the Covid vaccination. If any behaviour were deemed too risky, e.g., if 90% of people exploded when travelling on aeroplanes, then the likelihood of people flying would be zero.

We use this evidence-based system frequently in our daily lives; for example, our criminal justice system rests upon the notion that prosecutors must prove the defendant is guilty beyond a reasonable doubt. Judgements are very stringent and require strong evidence, e.g., at least 90% of the jury must agree before a defendant is found guilty or not guilty. There are protocols for evidence because convicting innocent people is ethically wrong.

For science, the significance level is the jury because it tells the researcher the likelihood of obtaining a chance result. Scientific research will only be deemed credible if statisticians convince the world that their results can be applied to the population and are not random occurrences.

More specifically, the significance level is the percentage of chance a researcher is willing to allow. For example, if I am 99% certain the sun will rise tomorrow, I am leaving a 1% possibility to chance that it won’t. The percentage I am willing to concede, i.e., the 1%, is the significance level.

  • CONFIDENCE LEVEL: The level of certainty about something, e.g., I am 100% certain I am female. The confidence level in scientific research also assesses the probability that the results will be the same if research is repeated.

  • SIGNIFICANCE LEVEL (also known as alpha, α): It is the degree of uncertainty you are willing to concede about the results of your study, e.g., if nothing is 100% certain, what percentage of chance will you allow? More formally, the significance level is the maximum risk of rejecting a true null hypothesis you are willing to take, usually set at 5% but can also be 1% or 10%. It is not quite the same thing as the p-value: the p-value is calculated from your data and is then compared against the significance level.

  • Incidentally, confidence levels and significance levels go hand in hand, e.g., if I have a 95% confidence level that my dog is a thoroughbred Golden Retriever, then I must acknowledge there is a 5% chance (significance level) that he is not, e.g., he has DNA from other breeds.

It is important to know that the confidence level is usually set at 90% or higher, e.g., 95% and 99%. The corresponding significance levels would then be 10%, 5% and 1%.

TYPES OF SIGNIFICANCE LEVEL

Researchers could choose a zero per cent significance level. Still, it isn't likely that they would ever be able to make any prediction with that level of confidence, as it is equivalent to predicting 100% certainty/confidence. Theoretically, all phenomena have unpredictability; predicting any behaviour with 100% confidence would not be possible. One might be able to predict certain behaviours with near 100% conviction, such as whether a person would stick their hand into a bucket of spitting rattlesnakes. Still, there is always the possibility that one of the participants in your sample could be a bit unconventional, so you need to factor in chance.

Whatever significance level a psychologist chooses, the results will never be free of chance. So, if a psychologist cannot predict a 0% chance level, what can they predict? Not many women would take the contraceptive pill if the chance of getting pregnant were 50%. What percentage of uncertainty should the scientist allow, or, in statistical terms, “what should the desired strength of evidence be against the null hypothesis?”

SIGNIFICANCE LEVEL PERCENTAGES

  1. The 0% significance level means that there is a 0% chance that the results occurred randomly and a 100% confidence level that they didn’t occur by chance. No researcher chooses this.

  2. The 1% significance level means that there is a 1% chance that the results occurred randomly and a 99% confidence level that they didn’t occur by chance. 1% is also expressed as 0.01 or 1 chance in 100.

  3. The 5% significance level means that there is a 5% chance that the results occurred randomly and a 95% confidence level that they didn’t occur by chance. 5% is also expressed as 0.05 or 1 chance in 20.

  4. The 10% significance level means that there is a 10% chance that the results occurred randomly and a 90% confidence level that they didn’t occur by chance. 10% is also expressed as 0.10 or 1 chance in 10.
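The three ways of writing each level in the list above (percentage, decimal and "1 chance in ...") are simple conversions of one another. A small sketch, assuming nothing beyond the arithmetic already described; the function name describe_significance_level is invented for the example.

```python
def describe_significance_level(alpha: float) -> str:
    """Express a significance level as a percentage, a confidence level and odds."""
    significance_pct = alpha * 100       # e.g. 0.05 -> 5%
    confidence_pct = (1 - alpha) * 100   # confidence level = 100% - significance level
    one_in = round(1 / alpha)            # e.g. 0.05 -> 1 chance in 20
    return (
        f"alpha = {alpha:.2f}: {significance_pct:.0f}% significance level, "
        f"{confidence_pct:.0f}% confidence level, 1 chance in {one_in}"
    )


for alpha in (0.01, 0.05, 0.10):
    print(describe_significance_level(alpha))
```

The 0% level is left out on purpose: dividing by zero has no answer, which mirrors the point that no researcher can work at that level.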

TYPE ERRORS

Type I and II errors

  • Type I error: you conclude that 10 minutes of meditation reduces stress when it doesn’t.

  • Type II error: you conclude that spending 10 minutes of meditation doesn’t affect stress when it does.

TYPE I ERROR

If a researcher opts for a 10% significance level in a test, they aim to predict outcomes with 90% certainty and allow for a 10% chance of error. However, accepting an experimental hypothesis at this significance level implies only 90% certainty, which raises concerns about credibility. Would you board an aeroplane if 10 out of every 100 passengers wouldn't make it?

Why do some researchers choose a 10% significance level when it lacks integrity? Would the general public find such a high chance margin acceptable for evaluating a new drug? Some psychologists may choose this level because it's easier to prove. For instance, a researcher might explore a new area and conduct preliminary research to assess differences or associations between variables, such as investigating whether vaping causes vivid dreams.

However, when a researcher sets a 10% significance level, they are deliberately not testing at the most stringent level, as this might overlook nuances in the data. Opting for a more lenient significance level makes the test more sensitive to true effects, including very subtle ones, such as vivid dreams in a percentage of vape smokers. The cost is that a lenient level may also produce statistically significant findings that are really due to chance and have little real-world usefulness. Conversely, setting the level too stringently can hide real effects; for example, if the significance level is set at 1%, the link between smoking and cancer might go undetected.

However, the 10% significance level is problematic because there is a 10% chance the researcher will end up rejecting their null hypothesis when they should not have, because the significance level was not stringent enough. In other words, if the significance level is set at 10% and 100 studies are conducted on an effect that does not really exist, around 10 of them will still produce a significant result by chance. This is known as a type I error.

A type I error occurs in situations where a researcher will have concluded that the results are statistically significant when, in fact, they are not - this can also be referred to as a false positive.
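The idea of false positives can be illustrated with a small simulation. This is a sketch under stated assumptions, not one of the tests on the specification: it invents 5,000 imaginary studies in which the null hypothesis is true (both groups come from the same population with a mean of 100 and a known standard deviation of 15) and uses a simple two-tailed z-test only because it can be written with Python's standard library. All names and numbers are made up for the illustration.

```python
import random
from statistics import NormalDist, fmean


def simulate_p_value(rng: random.Random, n: int = 30,
                     mean: float = 100.0, sd: float = 15.0) -> float:
    """One imaginary study where the null hypothesis is TRUE:
    both groups are drawn from exactly the same population."""
    group_a = [rng.gauss(mean, sd) for _ in range(n)]
    group_b = [rng.gauss(mean, sd) for _ in range(n)]
    standard_error = sd * (2 / n) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value


def false_positive_rate(p_values: list, alpha: float) -> float:
    """Share of studies that would be called significant at this alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


rng = random.Random(42)  # fixed seed so the run can be repeated
p_values = [simulate_p_value(rng) for _ in range(5000)]

for alpha in (0.10, 0.05, 0.01):
    rate = false_positive_rate(p_values, alpha)
    print(f"alpha = {alpha:.2f}: {rate:.1%} of studies were 'significant' by chance")
```

Every "significant" result in this simulation is a type I error, because no real difference was ever built in. The share of false positives should sit close to whichever significance level is chosen, so the more lenient the level, the more false positives get through.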

In simpler terms, the same principle applies to everyday situations where the confidence level is below 100%, like birth control pills. For instance, if the confidence level for not getting pregnant while on birth control pills is around 90% (or 99% if used perfectly), it means that out of 100 women, about ten could still get pregnant while using the pill. When contraception fails, it's obvious that a mistake occurred because a woman becomes pregnant. However, researchers won't realize they made a mistake until they redo the study and get different results, or notice that their calculated value sits very close to the cut-off for the standard levels of 10%, 5%, or even 1% (but more on that later).

Here are some other examples of false positives:

  1. Lateral flow tests: A lateral flow test with a 10% false-positive rate means that ten per cent of test takers who do not have the condition will falsely show positive results. This could lead to individuals being mistakenly identified as positive for a certain condition or disease.

  2. Legal Trials: In a legal trial, such as the trial of an accused criminal, a type I error would mean that the person is found guilty and is sent to jail despite actually being innocent. This can have severe consequences for the wrongly accused individual.

  3. Western Blot test for Lyme disease: The Western Blot test detects tick-borne bacterial infections like Lyme disease. It has a false-positive prevalence of 4.8%, meaning that 4.8% of people test positive for Lyme disease when they don’t have it. This can lead to unnecessary treatments or anxiety for individuals who receive false positive results.

TYPE II ERROR

  • If a researcher chooses a 1% significance level, they want to predict with 99% certainty and a 1% chance rate.

  • If an experimental hypothesis is accepted with a 1% significance level, it is a highly credible result as it means 99% certainty.

SCENARIO 2

Imagine you have been stuck on a mountain for three days. Miraculously, three rescuers appear on the evening of day 3.

Q. Which one do you choose?

I hope you opted for Ivan the nihilist despite his intimidating presence, since he would most likely rescue you. Similarly, the 1% significance level carries substantial credibility, so why don't all researchers opt for it? Would the public be content with anything less stringent? Who would prefer the 10% error margin akin to 'Mountain Mick's' rescue operations?

Many psychologists steer away from the 1% significance level because it's not just exceptionally challenging to achieve but also overly stringent, making it difficult to detect nuanced differences in the data.

A good analogy for these points is dart players.

SCENARIO 3

Imagine you are the new team leader of a local darts team. There is an important league game next week, and you want to show off your leadership skills to the existing players. Unfortunately, five members can’t make the league game as they have been stuck on a mountain for three days in the French Alps. In desperation, you advertise for new players. You decide the criterion for membership is players who can get ninety-nine bull’s eyes in a row. One hundred people apply, but sadly, nobody passes this benchmark, and you fail to find new players and have to withdraw from the league. People assume you are a terrible team leader, so you resign with immediate effect and move to the Outer Hebrides.

The reality was that 25 applicants were professional dart players and thus extremely good. But the test was impossibly hard, even for them. You failed to find players because the bar was set way too high. This is known as a type II error. Similarly, if a researcher sets a 1% significance level, it may be too severe to meet. In other words, they accept the null hypothesis because their criteria were impossibly high.

A type II error occurs when a researcher incorrectly accepts the null hypothesis, failing to recognize the validity of the experimental or alternative hypothesis. This mistake leads to a false negative outcome, where significant findings are dismissed as non-significant.

The challenge with type II errors is that they often go unnoticed until the study is replicated or if further analysis reveals that the observed data points are near the accepted critical values. For example, after the challenge, you realise that even though one player called Alex didn't hit the 99th bull's eye, his ability to hit 98 in a row was an extraordinary feat that almost no one else could achieve. This moment of realisation comes when you witness another competition where the standard is set more realistically, and you see Alex outperforming many others who couldn't come close to his previous mark of 98 bull's eyes.

In this context, recognizing the type II error comes from seeing Alex's performance in a different light and understanding that your original criterion was unrealistically high. The error becomes evident when you observe that Alex's skill level significantly surpasses what is typically expected, even though it fell just short of your original, overly stringent requirement.

Similarly, in statistical terms, you might realize a type II error has been made when subsequent analysis or additional studies show that the effect you deemed non-significant (due to not meeting a very low p-value threshold) has practical significance or when you observe that the results were very close to your significance cutoff, suggesting that a slight adjustment in your criteria (e.g., a more lenient significance level) could have led to a different conclusion.

Thus, recognizing a type II error often involves hindsight—seeing the results in a new context or with a broader perspective highlighting the practical significance of the initial dismissal.

This realisation highlights the delicate balance required when setting significance levels: they need to detect true effects accurately without being overly restrictive.

FALSE NEGATIVES

A false negative arises when a test incorrectly indicates the absence of a condition or attribute that is present. For example, if you take a pregnancy test that shows you're not pregnant when you are, that's a false negative. Reasons for a false negative in this context might include conducting the test prematurely, diluted urine, a malfunctioning test kit, or reading the results too quickly. This issue isn't confined to medical or physiological tests, such as those for pregnancy or COVID-19, where a person might test negative despite being infected; it extends into various other domains:

  • In manufacturing quality control, a false negative would occur if a flawed product mistakenly passes inspection and is deemed safe or meets quality standards, potentially due to oversight or inadequate testing processes.

  • Within the justice system, a false negative happens if someone guilty is acquitted and deemed not guilty, perhaps because of insufficient evidence, witness testimony issues, or legal technicalities.

These examples highlight false negatives' broad impact and risk, underscoring the importance of accurate testing and decision-making processes across different fields.

THE FIVE PERCENT SIGNIFICANCE LEVEL

Type I and Type II errors can occur at any significance level, illustrating the inherent risk in statistical testing. For example, Type I errors, which involve falsely rejecting a true null hypothesis, can still occur at a stringent 1% significance level. This is analogous to the scenario with contraceptive pills: even when taken correctly, with a 99% efficacy rate, there's still a 1% chance of pregnancy, highlighting the possibility of error despite high confidence levels.

The choice of significance level significantly influences the probability of encountering Type I or Type II errors. A more lenient significance level, like 0.10, raises the chance of a Type I error, leading researchers to assert significance in their results incorrectly. Conversely, stricter levels like 1% can predispose researchers to Type II errors by overlooking genuine effects.

This dynamic is why many psychologists prefer a 5% significance level, positioning it as a middle ground between being too lenient and strict. The 5% threshold is chosen to reduce the likelihood of both Type I and Type II errors, aiming for a balanced approach that mitigates the risk of making either error, thus enhancing the reliability and validity of research findings.
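The trade-off can be made concrete with a worked sketch. It is loosely modelled on a one-tailed sign test and rests on invented numbers: 20 participants, a 50% chance of each one improving if the null hypothesis is true, and a 70% chance if there is a real effect. The function names are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """Chance of k or more 'improvers' out of n when each improves with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def error_rates(alpha: float, n: int = 20,
                p_null: float = 0.5, p_real: float = 0.7):
    """Return (critical count, type I error rate, type II error rate)."""
    # Smallest number of improvers that is rare enough under the null
    # hypothesis to be called significant at this alpha.
    critical = next(k for k in range(n + 2) if upper_tail(k, n, p_null) <= alpha)
    type_1 = upper_tail(critical, n, p_null)      # false positive rate
    type_2 = 1 - upper_tail(critical, n, p_real)  # real effect missed
    return critical, type_1, type_2


for alpha in (0.10, 0.05, 0.01):
    critical, type_1, type_2 = error_rates(alpha)
    print(f"alpha = {alpha:.2f}: need {critical}+ improvers, "
          f"type I = {type_1:.3f}, type II = {type_2:.3f}")
```

Under these assumptions, tightening the level from 10% to 1% pushes the type I error rate down but pushes the type II error rate up, which is the balancing act that the 5% level is meant to manage.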

CHOOSING A SIGNIFICANCE LEVEL

Consequences of Increasing the Significance Level: Imagine you’re testing the strength of bin bags and choosing the 10 per cent significance level. You’ll use the test results to determine their strength. A false positive here leads you to endorse bin bags that are not stronger. The drawbacks of a type I error here are very low because poor-quality bin bags don’t generally cause harm. Suppose you nevertheless increase the evidence you need by changing the significance level to 1%. Because this change increases the required evidence, it makes your test less sensitive to detecting differences and increases the chance of type II errors. As a result, your bin bags never go on the market because they can't pass the stringency test. To avoid going bankrupt, you opt for a five per cent significance level to balance the risk of making type I and II errors.

Consequences of Decreasing the Significance Level:

Conversely, imagine you’re testing the success of a new antidepressant. A type I error here is risky because people’s mental well-being is on the line! You want to be very confident that the antidepressant from one manufacturer is better than the other. In this case, you should increase the evidence required by changing the significance level to 1%. Because this change increases the required evidence, it makes your test less sensitive to detecting differences and decreases the chance of type I errors. It also raises the chance of type II errors, but to you, that risk is worth it as people could die. It’s all about the trade-off between sensitivity and false positives! The smaller the significance level, the more stringent the test, and the less likely it is that a significant result is a false positive. Unless research is socially sensitive or threatens the safety of an individual, most researchers opt for a 5 per cent significance level because it strikes a balance between being stringent enough to provide reliable results and being practical enough to avoid missing potentially important findings.

FACTORS THAT DETERMINE THE CHOICE OF A SIGNIFICANCE LEVEL

  • Sample Size: A "too small" sample size could be, for example, studying the effect of a new educational method on math performance with only ten students. This is considered small because it may not accurately represent the population's variability, leading to higher uncertainty in the results. Researchers might choose a less stringent significance level (e.g., 10%) to avoid missing a potentially real effect due to this variability.

  • Estimated Size of the Variable Being Tested: This refers to the size of the expected change or effect. For example, if a new drug is expected to lower blood pressure slightly, the effect size is small. Researchers might choose a significance level that balances the need to detect small, meaningful changes without being overly strict, considering the practical significance of the findings.

  • Newness of Research: Emerging fields, like the study of gut health's impact on mental health, may have less existing research to build upon. In such cases, a 10% significance level might encourage exploration and discovery, acknowledging that early findings can guide future, more detailed research.

  • Non-directional (2-tailed) Hypotheses: If researchers are studying a new therapy's effect on depression without predicting whether it will increase or decrease symptoms, they're using a non-directional hypothesis. A 10% significance level might be applied here to remain open to detecting any significant effect, regardless of its direction.

  • Existing Research Support: Well-established findings, such as the decline in memory function with age, might lead researchers to use a more stringent 1% significance level in studies exploring this phenomenon further. This higher standard ensures that new findings truly add to the existing body of evidence.

  • Conflicting Evidence: When exploring theories with mixed evidence, such as the claim that serotonin-enhancing antidepressants are no better than placebos, a stricter 1% level could be employed to rigorously test these claims and provide clearer conclusions against the backdrop of debate.

  • Social Sensitivity: Studies on sensitive topics, like the distinction between biological sex and gender identity, may adopt a 1% significance level to ensure the results are robust and can withstand societal scrutiny or backlash, recognizing the potential for widespread impact.

  • Controversial Nature of Research: Research proposing that criminal behaviour is determined (thus questioning free will) or that ADHD is primarily caused by early environmental factors might also opt for a 1% level to ensure findings are solid enough to challenge established views or provoke thoughtful discussion.

  • Implications for Well-being (Safety): In the case of developing new drugs or vaccines, such as those for COVID-19, a 1% significance level is often chosen to minimize the risk of harm and legal repercussions, reflecting the high stakes of accurately determining efficacy and safety.

  • Minimizing Risk of Errors: The general preference for a 5% significance level in many studies strikes a balance, aiming to reduce the likelihood of mistakenly seeing an effect that isn't there (Type I error) or missing a real effect (Type II error).

  • Directional (1-tailed) Hypotheses: If a study hypothesizes that a specific intervention will improve (not decrease) test scores, reflecting a specific, predicted direction of effect, a 10% significance level might be considered sufficient to explore this targeted hypothesis.

  • Setting the Chance Bar Appropriately: Choosing a 1% significance level for examining the link between cannabis use and psychosis might be too stringent, potentially leading to underreporting of the effect and suggesting a lack of association that simplifies the nuanced reality of risk.

DEDUCTIVE REASONING

In many aspects of life, we often rely on our past experiences to set standards for evidence. For instance, if someone has bitten us before, we may avoid them. This process is known as inductive reasoning and was the primary approach to research before Karl Popper championed deductive reasoning based on falsification.

However, scientists operate differently, using deductive reasoning where evidence must be prospective and established before the actual study occurs. This means researchers need to predict the level of certainty expected for the evidence they seek. Consequently, the significance level is predetermined by the researcher.

But like any bet, making predictions carries the risk of failing or, more specifically, “not being able to reject your null hypothesis”. Remember, the result of a hypothesis test depends on whether the null hypothesis is rejected.

FOR MORE INFORMATION ON DEDUCTIVE REASONING AND FALSIFICATION PLEASE PRESS ON THIS LINK

LASTLY, REMEMBER THAT THE FOLLOWING STATEMENTS ARE ALL EQUIVALENT TO EACH OTHER.

I have only used the 5% level as an example, but any alpha level can be used.

  • The finding is significant at the 0.05 level.

  • The confidence level is 95%

  • The Type I error rate is 0.05.

  • The alpha level is 0.05.

  • α = 0.05.

  • There is a 1 in 20 chance of obtaining this result (or one more extreme).

  • The area of the region of rejection is 0.05.

  • The p‐value is 0.05.

  • p = 0.05.

POSSIBLE EXAM QUESTIONS

  1. What level of significance is accepted as standard in psychological research? (1 mark)

  2. Define a type I error. (2 marks)

    Exam hint: Full marks can be achieved for this question by stating that the null is rejected and the experimental hypothesis accepted when, in fact, results are due to chance and are most likely to happen when the level of significance has been set too leniently.

  3. A psychologist found that their results were significant at p<0.05. What does ‘the results were significant at p<0.05’ mean? (2 marks)

  4. Explain the difference between a type 1 and a type 2 error. (4 marks)

  5. Explain the difference between a calculated value and a critical value. (3 marks)

  6. Explain what is meant by the phrase “not statistically significant at the 10% level.” (2 marks)

  7. How does the set significance level affect the chance of researchers getting a type 1 or type 2 error? (4 marks)

  8. Explain what is meant by “p ≤ 0.05”. (2 marks)

ANSWERS

  1. The standard significance level accepted in psychological research is typically 0.05 or 5%.

  2. A type I error occurs when the null hypothesis is incorrectly rejected and the experimental hypothesis is accepted, even though the results are due to chance. This error is more likely to happen when the significance level is too lenient.

  3. When a psychologist's results are significant at p<0.05, there is less than a 5% probability that the observed results occurred by chance alone.

  4. Type I error (false positive) occurs when the null hypothesis is wrongly rejected. In contrast, Type II error (false negative) occurs when the null hypothesis is incorrectly accepted when it should have been rejected based on the data.

  5. A calculated value is the outcome of a statistical test based on the sample data. In contrast, a critical value is a threshold value obtained from statistical tables or formulas used to determine whether to reject the null hypothesis.

  6. If something is statistically significant, it is unlikely to have occurred by chance, so you can reject your null hypothesis and accept your experimental/alternative hypothesis. If a result is not significant at the 10% level, the probability that it is due to chance is greater than ten per cent, so the null hypothesis cannot be rejected.

  7. If psychologists choose a ten per cent level of significance, they have a greater chance of making a type one error. This is because there is a ten per cent probability that the results could be due to chance. Therefore, psychologists may accept their experimental hypothesis when they should reject it (they won’t know this until they replicate the study). If psychologists go to the other extreme and choose a one per cent significance level, they will have a greater chance of making a type two error. This happens when the significance level is set too strictly, and psychologists reject their experimental hypothesis when they should have accepted it (again, not known until replication). Therefore, psychologists usually choose five per cent, as it strikes a balance between the risk of type 1 and type 2 errors.

  8. p ≤ 0.05 means that the researcher set her significance level at 5%. This means that if the null hypothesis is rejected, the researcher can only be 95% certain her results did not occur by chance.

TESTS OF DIFFERENCE, CORRELATION AND ASSOCIATION?

THE NEXT STEP IN DECIDING WHICH INFERENTIAL TEST TO USE IS WHETHER YOU NEED A TEST OF DIFFERENCE, ASSOCIATION OR CORRELATION.

TESTS OF DIFFERENCE

Tests of difference are for all experiments:

  • Laboratory

  • Field

  • Quasi

  • Natural

Tests of difference are for research that tests a difference between conditions (IVs) or participants.

Tests of difference apply to various types of experiments, including laboratory, field, quasi, and natural experiments. These tests are designed for research to detect disparities between conditions (independent variables or IVs) or participants.

To determine if a test of difference is necessary, consider the following questions:

  1. Are participants engaged in distinct activities or conditions? For instance, are there at least two conditions (IVs) to which participants are randomly allocated, as seen in laboratory and field experiments? Does the research aim to discern disparities in outcomes across these conditions or groups?

  2. Are two groups of unrelated participants engaged in one activity or condition? For example, different sets of participants (IVs) experience one condition.

Examples of Quasi tests of difference include:

  • The difference in conformity scores between religious and non-religious participants.

  • The difference in IQ scores between males and females.

TESTS OF ASSOCIATION

  • Tests of association are needed for non-experimental research, e.g., observations, content analysis, thematic analysis, interviews, questionnaire surveys, and case studies, but only if the data is nominal.

  • Association tests are needed when researchers are looking for an association between variables.

  • Tests of association are never used for correlations or experiments.

  • Tests of association are needed when researchers are simply counting the frequency at which a behaviour occurs, e.g., tallies for a discrete set of categories. For example, if you wanted to know whether boys or girls play on main roads most frequently.

EXAMPLE ONE: Naturalistic observation of children playing outside to determine who plays on main roads most frequently, boys or girls

  • Nominal data: gaps between the behavioural categories are non-mathematical and cannot be ordered.

  • Test of association: Looking for an association between variables

  • Counting the frequencies of certain behaviours

EXAMPLE TWO: CONTENT ANALYSIS REASONS FOR SMOKING INITIATION IN TEENS.

  • Nominal data: gaps between the behavioural categories are non-mathematical and cannot be ordered.

  • Test of association: Looking for an association between variables

  • Counting the frequencies of certain behaviours

OTHER EXAMPLES OF TEST OF ASSOCIATION CONTINGENCY TABLES WITH TALLIES

EXAMPLE THREE: NATURALISTIC OBSERVATION OF A CHILD’S BEHAVIOUR IN CLASSES.

  • Nominal data: gaps between the behavioural categories are non-mathematical and cannot be ordered.

  • Test of association: Looking for an association between variables

  • Counting the frequencies of certain behaviours

TESTS OF CORRELATION

  • When conducting non-experimental research to examine the relationship or link between two variables, such as the link between temperature and ice lolly purchases (the hotter it is, the more ice lollies people buy), correlations are the appropriate statistical tool.

    Correlations exclusively assess the relationship between two variables, with one variable plotted on the X-axis and the other on the Y-axis, resulting in 1x1 designs.

    Correlations require continuous data; they cannot be applied to categorical data.

    In many cases, correlations involve data collected from the same individuals, leading to inferential tests such as Spearman’s rank and Pearson’s product-moment, where a single N represents two sets of scores derived from each individual.

HOW TO TELL IF YOU NEED A CORRELATION

WHICH OF THE HYPOTHESES BELOW IS A CORRELATION?

  • Male participants aged 15-24 will smoke less than male participants aged 25- 34.

  • Older men smoke more than younger men.

WHAT KIND OF RESEARCH METHOD DOES EACH STUDY NEED?

The first hypothesis, "Male participants aged 15-24 will smoke less than male participants aged 25-34", suggests a comparison between different age groups rather than a direct relationship between age and smoking behaviour. This scenario is better suited for a quasi-experimental design rather than a correlation analysis.

The second hypothesis, "Older men smoke more than younger men", implies a correlation analysis where age is treated as a continuous variable.

Identify the research method:

  • Since both hypotheses involve age, it can be unclear which type of study each one needs. When age is treated as a discrete variable, boundaries are defined and the age groups are treated as categories, making a quasi-experimental design more suitable. Conversely, a correlation is more appropriate when age is treated as a continuous variable, since age is viewed as a continuum. This distinction helps you select the appropriate research design based on how age is conceptualised in the study.

Also consider the number of variables:

  • Quasi-experiments typically involve three variables: the two levels of the independent variable (e.g., the age group 15-24 and the age group 25-34) and the dependent variable (e.g., smoking behaviour). Here, the age groups are treated as the independent variable, and smoking behaviour is the dependent variable. Therefore, this aligns with a quasi-experimental design.

    On the other hand, correlation designs typically involve examining the relationship between two continuous variables. When age is defined as a continuous variable (e.g., with no discrete categories such as age 15-24 or 25-34), it fits the criteria for a correlation analysis. As a good rule of thumb, correlations involve two variables, while quasi-experimental designs involve a minimum of three.

Beware of intraclass correlations:

  • An intraclass correlation (ICC) is a statistical method to assess the degree of agreement or similarity among observations made on the same subjects, groups, or clusters. It measures the consistency or reliability of measurements made within the same group.

    For example, consider a study comparing the IQ scores of older and younger siblings within the same families. In this scenario, each family represents a cluster, and the IQ scores of siblings within each family are compared. The ICC would assess the extent to which IQ scores within the same family are similar or correlated.

    The confusion between ICC and quasi-experimental designs arises because both involve comparing groups of individuals. However, the key distinction lies in the nature of the relationship between participants.

    In quasi-experimental designs, participants are typically unrelated individuals assigned to different groups or conditions based on pre-existing characteristics or criteria. For example, comparing the IQ scores of individuals from different age groups or educational backgrounds would constitute a quasi-experimental design.

    On the other hand, in ICC studies, participants are related or clustered in some way, such as family members, students within the same classroom, or patients within the same healthcare facility. The focus is on examining the agreement or similarity of measurements within these related groups.

Age can be a continuous variable or a discrete variable. If age is a continuous variable, then the study will be a correlation, for example, “The older you are, the more you smoke.” In correlations, age must be displayed continuously on either the X or Y axis.

Correlations only measure two variables, one on the X axis and one on the Y axis, so essentially they are 1x1 designs.

But if age is expressed as a discrete variable (in categories, as nominal data), then it will be a quasi-design, for example, “Male participants aged 15-24 will smoke less than male participants aged 25-34.”

Lastly, unless it is an intraclass correlation, both sets of data will come from the same person, so if the data comes from two sets of people, chances are it is a quasi-experiment. AQA, for example, does not usually give intraclass correlations as examples in questions.

If you cannot reduce the variables to two, then it is probably a quasi-experiment. Quasi designs have three variables.

INTRACLASS CORRELATIONS AND CONCORDANCE RATES.

Intraclass correlations can sometimes be mistaken for quasi-experiments. Students often get confused if research compares two groups of people.

EXAMPLES OF INTRA-CLASS-CORRELATIONS

WHAT INFERENTIAL TEST SHOULD YOU CHOOSE?

You should now know how to work out the following 5 things about your data and research to select the right inferential test.

CRITERIA NEEDED TO CHOOSE AN INFERENTIAL TEST:

  1. Do you understand the level of significance, e.g., 0.01, 0.05, or 0.10 (1%, 5%, 10%)?

  2. Do you understand what level of measurement means (e.g. nominal, ordinal, interval or ratio)?

CLICK HERE FOR DETAILS ON HOW TO CHOOSE LEVELS OF MEASUREMENT

  3. Do you understand the differences between directional (1-tailed) and non-directional (2-tailed) hypotheses?

  4. Do you know how to distinguish between difference, association and correlation tests?

  5. If it’s a test of difference, do you know how to choose the appropriate design, e.g., Independent Groups, Matched Pairs or Repeated Measures?

CLICK HERE FOR DETAILS ON HOW TO CHOOSE AN EXPERIMENTAL DESIGN.

CHOOSING AN INFERENTIAL TEST

You can now choose an inferential test and determine if the results are statistically significant. It might be helpful to think of inferential statistics as “chance calculating statistics”.

STEP 1:

  1. What is the level of measurement of your data (e.g. nominal, ordinal, interval or ratio)?

  2. Do you need a test of difference, association or correlation?

  3. If you need a test of difference, which design are you using: Independent Groups, Matched Pairs, or Repeated Measures?

  4. Is your data parametric or non-parametric?

  5. Choose an inferential test.
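
The decision steps above can be sketched as a simple lookup in Python. This is a minimal sketch, not an official decision chart: the names `TEST_TABLE`, `RELATED_DESIGNS` and `choose_test` are invented for this example, ratio data is assumed to be treated like interval data, and the interval rows assume the data also meets the parametric criteria. Combinations that are not in this guide's list of tests raise an error rather than guess.

```python
# Lookup table built from the tests named in this guide.
# Key: (type of test, level of measurement, related design?)
# The third item is None where experimental design does not apply.
TEST_TABLE = {
    ("difference", "nominal", True): "Sign test",
    ("difference", "ordinal", True): "Wilcoxon matched pairs",
    ("difference", "ordinal", False): "Mann-Whitney U",
    ("difference", "interval", True): "Related t-test",
    ("difference", "interval", False): "Unrelated t-test",
    ("correlation", "ordinal", None): "Spearman's rho",
    ("correlation", "interval", None): "Pearson's r",
    ("association", "nominal", None): "Chi-squared",
}

# Designs in which the two sets of scores are related to each other.
RELATED_DESIGNS = {"repeated measures", "matched pairs"}


def choose_test(purpose, level, design=None):
    """Return the name of the inferential test for the given answers."""
    purpose = purpose.lower()
    level = level.lower()
    if level == "ratio":
        level = "interval"  # assumption: ratio data is handled like interval data
    if purpose == "difference":
        if design is None:
            raise ValueError("A test of difference needs an experimental design.")
        design = design.lower()
        if design in RELATED_DESIGNS:
            related = True
        elif design == "independent groups":
            related = False
        else:
            raise ValueError(f"Unknown design: {design}")
    else:
        related = None  # design is ignored for correlation and association
    key = (purpose, level, related)
    if key not in TEST_TABLE:
        raise ValueError(f"Combination not covered in this sketch: {key}")
    return TEST_TABLE[key]


print(choose_test("difference", "ordinal", "repeated measures"))
print(choose_test("correlation", "interval"))
```

A table was chosen over a chain of if-statements because it mirrors the way the choices are usually laid out in a revision grid, and it makes missing combinations obvious.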

OBSERVED CALCULATED VALUES:

STEP 2:

Conduct your inferential test.

Get the results of your inferential test.

It's crucial to recognise that, in statistics, the outcome of an inferential test is not merely called "the result." Instead, statisticians use specific terms for the result: the "observed value" or the "calculated value." The rationale for the term "calculated value" is straightforward: it signifies that the result is obtained through a calculation. The term "observed value", on the other hand, is harder to remember because it has no obvious memory aid. To bridge this gap, I found it helpful to link the two terms with the mnemonic "I need to observe my calculated value", which emphasises that they refer to the same thing, i.e., the actual result of your inferential test.

The observed or calculated value is written with the statistical test's symbol in front of it, so researchers know which test has been used.

For example:

  • Sign Test = S

  • Mann-Whitney U = U

  • Unrelated T Test = t

  • Wilcoxon Matched pairs = T

  • Related T Test = t

  • Spearman’s Rho = rho

  • Pearson’s Product = r

  • Chi-Square = χ²

The statistical test symbol and observed/calculated value would look something like this: U = 80

HOW DO YOU GET AN OBSERVED/CALCULATED VALUE?

Official definitions:

  • Observed Value – The number produced after a statistical test's various steps and calculations have been carried out.

  • Critical Value – A cut-off value that marks the boundary of the region where the test statistic is unlikely to lie if the null hypothesis is true.

To obtain an observed/calculated value in statistical analysis, you perform the specific inferential statistical test relevant to your research question. This involves collecting data, applying the chosen test's formula, and calculating the value based on your data. This calculated value is the "observed" or "calculated" result of the test, indicating the outcome of your statistical analysis.

You won't be tasked with calculating an observed value in many exams—and for good reason. The process of computing inferential statistics is complex and often relegated to software programs like Minitab or SPSS. The essence of psychology isn't rooted in mastering statistical equations; rather, it's about interpreting the meanings behind calculations and results.

Understanding the significance of your observed/calculated value is crucial in exams and research. This involves discerning whether your findings are merely coincidental or statistically significant. To do this, you compare your calculated value to a critical value—a benchmark number found in probability tables, which are readily available and pre-calculated for your convenience. But how do you determine which critical value to use? Before we delve into that, let's clarify what critical values entail.

CRITICAL TABLE VALUES

A Critical Value Table/Probability Table

In hypothesis testing, a probability table acts as a critical tool, offering us a way to gauge whether the outcomes of our statistical tests—the calculated values—happen merely by chance. Embedded within this table are "critical values," pre-determined numbers that function as key markers or thresholds.

Calculated Value: This is the outcome of your statistical analysis, the numeric result derived from applying your test to your collected data.

Critical Value in a Probability Table: To assess the significance of your calculated value, you reference a critical value that aligns with your research hypothesis and statistical test. This critical value acts as a benchmark, determining the boundary between statistically significant results (implying a low likelihood of occurring by chance) and those that are not. It's like a marker in the sand that says, "Cross this line, and your findings are too unusual to be just chance."

The Comparison: When your calculated value crosses the critical value threshold, it signals that the phenomenon under investigation is unlikely to have occurred by chance. Depending on the test, crossing the threshold means being equal to or above the critical value, or equal to or below it; the rule for each test is given with its table. Crossing the threshold allows you to reject the null hypothesis, which assumes no real effect or difference. It implies that the effect observed in your study is large enough to be treated as a real occurrence rather than a result of random chance.

Essentially, the critical value from the probability table is a tool for assessing how exceptional your calculated value is. If your results cross this predefined limit, it's a strong indicator that what you've found holds statistical significance, reinforcing the notion that the effect or difference you're exploring exists in the population under study.

To choose the right probability table and critical table value, you need to complete steps 3 and 4.

STEP 3

Note: All tests need STEP 1 & STEP 2 above.

Critical values are found in tables that differ according to the nature of your study, whether your hypothesis is one-tailed or two-tailed, the significance level you're using, and how many participants are in your study. To pinpoint the precise table and critical value applicable to your research, it's essential first to ascertain these particular elements of your study. Below are the options you need to consider:

  1. Study Focus: Define whether your research is exploratory, confirming an existing theory, or testing a new intervention.

  2. Hypothesis Direction: Decide if your hypothesis is one-tailed (predicting a specific direction of effect) or two-tailed (predicting an effect without specifying the direction).

  3. Significance Level: Which significance level did you choose? Significance levels are commonly set at 0.05. The significance level also determines the type of critical value you choose.

  4. Participant Count: Know the total number of participants, as different tests may be required for different sample sizes. In critical value tables, “N” stands for number of participants.

You can only choose one option from the following three choices:

  • When the table indicates a single "N", the study utilizes a single group of participants across all conditions, as seen in repeated measures designs and correlational studies. Notably, in matched pairs designs, participants are paired so that for analytical purposes, they're considered to represent one collective group. The single "N" therefore signifies that the data is derived from related groups, which are treated as a singular entity for the analysis. This principle also applies to interclass correlational designs, where data is typically collected from one coherent set of participants. Inferential tests designed for this type of data include correlation tests like Spearman’s rho and Pearson’s product-moment correlation for assessing relationships, and tests such as the Wilcoxon matched pairs test, related T-tests, and Sign tests for analyzing repeated measures and matched pairs designs. The selection of a specific test is guided by the data's level of measurement, ensuring the most appropriate analytical approach is applied.

  • When a table displays "N1 & N2," it signifies that the participant groups in the experimental setup are independent of each other, as seen in independent group designs or quasi-experimental designs where the groups themselves are the independent variables (IVs). In this context, "N1" represents one group, while "N2" denotes the other group, highlighting their separation and lack of relation. Inferential tests suitable for analyzing data between these distinct groups include the Mann-Whitney U test for ordinal or non-normally distributed data and the unrelated T-test for normally distributed data. The selection between these tests is determined by the level of measurement used in the research, ensuring the statistical analysis is appropriately matched to the nature of the data.

  • Degrees of freedom (df) are used in statistical tests involving nominal data, typically outside experimental designs. Whereas the critical value tables for experiments and correlational studies are indexed by participant numbers (N, or N1 and N2), the tables for nominal data are indexed by degrees of freedom, which reflect the number of categories under examination, for instance in observational studies or content analyses where categories are pivotal. The Chi-Square test uses df. Degrees of freedom are worked out with the following calculation: (number of rows − 1) × (number of columns − 1).

  • In simpler terms, degrees of freedom (df) is a statistical concept used to understand the flexibility or constraints in our data when we're analysing certain types of information, like categories or groups.

    Imagine you're working with data with different categories, like colours of cars or types of fruits. Degrees of freedom tell us how many categories we can freely choose without being restricted by other factors.

    For example, let's say we're looking at the colours of cars on the street, and we have three colours: red, blue, and green. If we know the total number of cars and the counts for only two of the colours, we can work out the count for the third colour because the total constrains it. That's degrees of freedom at work: it tells us how much wiggle room we have within our data.

    In practical terms, degrees of freedom help statisticians find the right critical value and judge how reliable the results are. They're like guardrails that keep our analysis on track, ensuring that we draw accurate conclusions from the data we have.
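
The rows-and-columns calculation for degrees of freedom can be written as a one-line function. This is a small illustrative sketch; the function name `chi_square_df` is made up for this example.

```python
def chi_square_df(rows, columns):
    """Degrees of freedom for a contingency table: (rows - 1) x (columns - 1)."""
    if rows < 2 or columns < 2:
        raise ValueError("A contingency table needs at least 2 rows and 2 columns.")
    return (rows - 1) * (columns - 1)


print(chi_square_df(2, 2))  # a 2 x 2 table gives df = 1
print(chi_square_df(3, 2))  # a 3 x 2 table gives df = 2
```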

QUESTIONS:

  1. What is meant by the term critical value?

  2. How are degrees of freedom calculated?

  3. For each test, find out whether df, N, or N1 and N2 are needed:

  • Sign Test

  • Mann-Whitney

  • Unrelated T Test

  • Wilcoxon Matched pairs

  • Related T Test

  • Spearman’s Rho

  • Pearson’s Product

  • Chi-Square

STEP 4

Is your data parametric or non-parametric?

In parametric data, the pattern typically follows a specific mathematical distribution, such as a normal (bell-shaped) distribution. This means the data points are evenly spread around a central value, forming a symmetrical curve when plotted on a graph. Parametric data has certain properties, like a mean (average) and standard deviation, which are useful for statistical analysis.

On the other hand, non-parametric data doesn't follow a particular mathematical distribution. It may have irregular patterns, outliers, or skewed distributions that don't fit neatly into standard statistical models. Non-parametric data is often analyzed using methods that don't rely on specific distributional assumptions, making them more flexible for different data types.

To determine if data is parametric, consider the following criteria:

  • Nature of the Data: Parametric data typically consists of continuous variables (e.g., height, weight, temperature) that can be measured on a scale. Categorical or ordinal data (e.g., gender, Likert scale responses) are usually non-parametric.

  • Measurement Scale: Parametric tests are suitable for interval or ratio scale data, where the intervals between values are equal and meaningful. Non-parametric tests are more appropriate for nominal or ordinal scale data, where the values represent categories or rankings without consistent intervals.

  • Distribution: Parametric data often follows a normal distribution, meaning the data points are symmetrically distributed around the mean. To assess whether data is bell-shaped, researchers can inspect it visually using histograms or quantile-quantile plots (QQ plots).

  • Sample Size: Parametric tests are more robust with larger sample sizes. If the sample size is small (usually less than 30), opt for non-parametric tests to avoid parametric assumptions about the population distribution.

  • Homogeneity of Variance: Parametric tests assume that the variance (spread) of the data is consistent across groups or conditions. Statistical tests like Levene's test can be used to check for homogeneity of variance. If the variances are significantly different, the data may need to be treated as non-parametric. "Homogeneity of variance" refers to the assumption that the variability, or spread, of scores within each group or condition being compared is approximately equal across all groups. In simpler terms, it means that the data's variation is similar across different groups or conditions.

    For example, if you compare test scores between two different teaching methods (Group A and Group B), homogeneity of variance suggests that the spread of scores in Group A is roughly the same as in Group B.

    Homogeneity of variance is an important assumption for many parametric statistical tests, such as the t-test and analysis of variance (ANOVA). Violations of this assumption can lead to inaccurate results and affect the validity of the statistical analysis.

    Researchers typically test for homogeneity of variance using statistical tests like Levene's test. Suppose the test indicates that the variances are significantly different between groups. In that case, it suggests that the assumption of homogeneity of variance has been violated, and alternative statistical approaches may be necessary.
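
To get a feel for what "similar spread" means, the variances of two groups can be compared directly. The sketch below is only a rough descriptive check using Python's built-in `statistics` module; it is not Levene's test and gives no p-value. The scores for Group A and Group B are invented purely for illustration, and the function name `variance_ratio` is made up for this example.

```python
import statistics


def variance_ratio(group_a, group_b):
    """Return the larger sample variance divided by the smaller one.

    A ratio close to 1 means the two groups have a similar spread.
    This is a descriptive check only, not a formal test such as Levene's.
    """
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    if min(var_a, var_b) == 0:
        raise ValueError("At least one group has no spread at all.")
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical test scores for two teaching methods.
group_a = [2.0, 4.0, 6.0, 8.0]
group_b = [1.0, 5.0, 5.0, 9.0]

ratio = variance_ratio(group_a, group_b)
print(round(ratio, 2))
```

In real research, the formal decision would be left to a test such as Levene's, run in statistical software.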

FINALLY!!!

Once all the correct information is applied, researchers can compare their observed value with a critical value.

  • In certain tests, such as the Mann-Whitney U and Wilcoxon matched pairs, the observed value must be smaller than the critical value to achieve statistical significance, indicating that the results are not due to chance.

  • Conversely, in other inferential tests like Spearman’s Rho and Chi-Square, the observed value needs to exceed the critical value to be considered statistically significant, demonstrating that the findings are not random.

  • As a general guideline, tests with the letter R in their name (Spearman’s rho, Pearson’s r, Chi-Squared, and the related and unrelated t-tests) require the calculated value to be equal to or greater than the critical value, while the Sign test, Wilcoxon and Mann-Whitney require it to be equal to or less than the critical value. These rules are usually printed alongside critical value tables, so there is no need to memorise them. See below for the rule for each test.

INFERENTIAL TESTS FOR TEST OF DIFFERENCE

  • Sign test: The observed/calculated value is ‘S’. If the observed/calculated value of ‘S’ is equal to or less than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for repeated measures and matched pairs designs with nominal data.

  • Wilcoxon matched pairs signed-rank test: The observed/calculated value is ‘T’. If the observed/calculated value of ‘T’ is equal to or less than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for repeated measures and matched pairs designs with ordinal data.

  • Mann-Whitney U test: The observed/calculated value is ‘U’. If the observed/calculated value of ‘U’ is equal to or less than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for independent groups designs with ordinal data.

  • Related t-test: The observed/calculated value is ‘t’. If the observed/calculated value of ‘t’ is equal to or greater than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for repeated measures and matched pairs designs with interval data.

  • Unrelated t-test: The observed/calculated value is ‘t’. If the observed/calculated value of ‘t’ is equal to or greater than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for independent groups designs with interval data.

INFERENTIAL TESTS FOR CORRELATIONS

  • Spearman’s Rho: The observed/calculated value is ‘rho’. If the observed/calculated value of ‘rho’ is equal to or more than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for links between variables when data is ordinal.

  • Pearson’s product: The observed/calculated value is ‘r’. If the observed/calculated value of ‘r’ is equal to or more than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for links between variables when data is interval.

INFERENTIAL TESTS FOR TESTS OF ASSOCIATION

  • Chi-square: The observed/calculated value is ‘χ²’. If the observed/calculated value of ‘χ²’ is equal to or more than the critical/table value, you can reject your null hypothesis and accept your experimental hypothesis. Your result is significant. Used for frequencies or categories when data is nominal.
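
The comparison rules can be sketched as a short Python function. This is an illustrative sketch under stated assumptions: the names `AT_OR_BELOW`, `AT_OR_ABOVE` and `is_significant` are invented for this example, the critical value is assumed to have already been looked up in the correct table, and the numbers in the usage lines are hypothetical. For t, rho and r, the size of the value is compared (ignoring any minus sign), since the sign only shows the direction of the effect.

```python
# Tests where the observed value must be equal to or LESS than the critical value.
AT_OR_BELOW = {"sign test", "wilcoxon", "mann-whitney"}

# Tests where the observed value must be equal to or GREATER than the critical value.
AT_OR_ABOVE = {
    "related t-test",
    "unrelated t-test",
    "spearman's rho",
    "pearson's r",
    "chi-squared",
}


def is_significant(test, observed, critical):
    """Apply the comparison rule for the named test and return True or False."""
    name = test.lower()
    if name in AT_OR_BELOW:
        return observed <= critical
    if name in AT_OR_ABOVE:
        # abs() ignores the sign, which only indicates direction.
        return abs(observed) >= critical
    raise ValueError(f"Unknown test: {test}")


# Hypothetical values, for illustration only.
print(is_significant("Wilcoxon", observed=53, critical=60))        # True
print(is_significant("Chi-squared", observed=2.5, critical=3.84))  # False
```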

LET’S LOOK AT AN EXAMPLE EXAMINATION QUESTION

“A psychologist was interested in the effects of a restricted diet on memory functioning, and he expected memory to become impaired. The psychologist hypothesised that participants’ scores on a memory test are lower after a restricted diet than before a restricted diet. He gave the volunteers a memory test when they first arrived in the research unit and a similar test at the end of the four weeks. He recorded the memory scores on both tests and analysed them using the Wilcoxon signed ranks test.” The test was out of 100.

The psychologist set the significance level at 5%.

  • The calculated value was T = 53.

  • N= 20

CRITICAL TABLE VALUES FOR THE MEMORY FUNCTION AND DIET STUDY

Q1: State whether the hypothesis for this study is directional or non-directional. (1 mark)

Q2: Using Table 1, state whether the psychologist’s result was significant. (3 marks)

Q3: Explain your answer. (2 marks)

Q4: Name a statistical test appropriate for this investigation and give three reasons why it was appropriate to use this statistical test. (4 marks)

ANSWERS

Q1. Directional/one-tailed: the psychologist specified the direction the results should follow, e.g., participants should get better scores on the memory test before the restricted diet than after it. One mark for the correct answer: directional (one-tailed is acceptable).

Q2. Yes, the psychologist’s result was significant. One mark for correctly stating that the result is significant. (1 mark)

Q3. The critical value of T for N = 20 for a one-tailed test where p ≤ 0.05 is 60. As the observed/calculated value of T (53) is less than the critical/table value, the likelihood of my results occurring by chance is less than 5% (p ≤ 0.05). Therefore, I can reject my null hypothesis and accept my experimental hypothesis. (Two marks)

  • Two further marks for an explanation: the calculated value of T =53 is less than the critical value of 60 where N = 20 and p ≤ 0.05 for a one-tailed test.

  • If the candidate states that the result is insignificant, no marks can be awarded.

Q4. A Wilcoxon matched pairs test was chosen because a test of difference was needed, the experimental design was repeated measures, and the data can be treated as at least ordinal, since memory test scores can be ranked from lowest to highest.

To score points on the questions above, it's essential to cover four to five key elements as specified:

  • The name of the test

  • The type of test, such as difference, association, or correlation

  • The experimental design, if applicable

  • The level of measurement

  • The rationale for the chosen level of measurement
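The elements listed above can be read as a simple lookup. The sketch below assumes the decision table commonly taught on introductory courses (level of measurement crossed with an unrelated design, a related design, or a correlation/association). The names `TEST_TABLE`, `DESIGN_GROUP` and `choose_test` are illustrative only and do not come from any library.

```python
# Hypothetical lookup: the decision table commonly taught on introductory
# psychology courses, keyed by (design group, level of measurement).
TEST_TABLE = {
    ("unrelated", "nominal"): "Chi-squared",
    ("unrelated", "ordinal"): "Mann-Whitney U",
    ("unrelated", "interval"): "Unrelated t-test",
    ("related", "nominal"): "Sign test",
    ("related", "ordinal"): "Wilcoxon",
    ("related", "interval"): "Related t-test",
    ("correlation", "nominal"): "Chi-squared",
    ("correlation", "ordinal"): "Spearman's rho",
    ("correlation", "interval"): "Pearson's r",
}

# Independent groups is an unrelated design; repeated measures and matched
# pairs are both related designs.
DESIGN_GROUP = {
    "independent groups": "unrelated",
    "repeated measures": "related",
    "matched pairs": "related",
    "correlation": "correlation",
    "association": "correlation",
}


def choose_test(design, level):
    """Return the test name for a design and a level of measurement."""
    group = DESIGN_GROUP[design.lower()]
    return TEST_TABLE[(group, level.lower())]


print(choose_test("repeated measures", "ordinal"))    # Wilcoxon
print(choose_test("independent groups", "interval"))  # Unrelated t-test
```

The lookup only names the test; in an exam answer you still need to justify each element (type of test, design, level of measurement) in words.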

CRITICAL TABLE VALUES

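FOR DATA TO BE STATISTICALLY SIGNIFICANT, THE CALCULATED VALUE MUST BE EQUAL TO OR GREATER THAN THE CRITICAL VALUE FOR CHI-SQUARED, SPEARMAN’S RHO, PEARSON’S R AND THE T-TESTS, AND EQUAL TO OR LESS THAN THE CRITICAL VALUE FOR THE SIGN TEST, WILCOXON AND MANN-WHITNEY.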
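The direction of the comparison depends on the test, as the worked answers in this guide show: for the sign test, Wilcoxon and Mann-Whitney the observed value must be equal to or less than the critical value, while for chi-squared, the t-tests, Spearman's rho and Pearson's r it must be equal to or greater. Here is a minimal sketch of that rule. The function name `is_significant` and the test labels are hypothetical, and the critical value is assumed to have been looked up in a table already.

```python
# Tests where a SMALLER observed value indicates significance.
LOWER_IS_SIGNIFICANT = {"sign", "wilcoxon", "mann-whitney"}
# Tests where a LARGER observed value indicates significance.
HIGHER_IS_SIGNIFICANT = {"chi-squared", "related t", "unrelated t",
                         "spearman", "pearson"}


def is_significant(test, observed, critical):
    """Compare an observed value with a critical value taken from a table."""
    name = test.lower()
    if name in LOWER_IS_SIGNIFICANT:
        return observed <= critical
    if name in HIGHER_IS_SIGNIFICANT:
        # The sign of a t value or correlation only shows direction, so its
        # size is compared with the critical value. For a one-tailed test
        # this assumes the result is in the predicted direction.
        return abs(observed) >= critical
    raise ValueError(f"Unknown test: {test}")


print(is_significant("wilcoxon", 53, 60))            # True
print(is_significant("chi-squared", 18.17, 23.68))   # False
```

The first call uses the memory and diet study (T = 53, critical value 60); the second uses a chi-squared value that falls short of its critical value.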

QUESTIONS ON THE OBSERVED AND CRITICAL VALUE

Using your set of critical value tables, decide whether the alternative/experimental hypothesis should be accepted or rejected and why. For the first one, complete the gaps to understand how to write up your answers.

You can use the critical value tables above.

  1. What is meant by the term critical value?

  2. State two things that are always needed in order to find the critical value.

  3. In addition to the two factors above, state the third factor that is needed to find the critical value and explain when each is used:
    (a) Degrees of freedom (df)
    (b) N
    (c) N1 and N2

  4. How are degrees of freedom (df) calculated for a chi-square test?

  5. For each of the following statistical tests, state whether df, N, or N1 and N2 are required:
    • Sign test
    • Mann–Whitney U test
    • Related t-test
    • Unrelated t-test
    • Spearman’s rho
    • Wilcoxon signed-rank test
    • Chi-square test

  6. Explain how the observed value and critical value are used together to decide whether results are statistically significant.

  7. Using critical value tables, decide whether the following observed values are statistically significant at p = 0.05:

a) Rho = 0.410 for a one-tailed test where N = 20
b) Rho = 0.50 for a two-tailed test where N = 10
c) χ² = 3.24 for a two-tailed test with a 2 × 2 contingency table
d) χ² = 5.00 for a one-tailed test with a 3 × 2 contingency table
e) U = 16 for a one-tailed test with 9 participants in one group and 8 in the other
f) U = 76 for a two-tailed test with 30 participants split equally between two conditions
g) T = 54 for a one-tailed test with 25 participants
h) T = 105 for a two-tailed test with 20 participants

MORE PRACTICE QUESTIONS (QUESTIONS ONLY)

Using your set of critical/table values, decide whether the alternative/experimental hypothesis should be accepted or rejected and explain why.

Use the following structure:

At the ______ level of significance, the critical/table value for a ______ tailed test, when ______ = ______ is ______.
Since the observed value of ______ is ______ which is ______ than the critical value, the ______ hypothesis can be ______ and the ______ hypothesis can be ______.
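If it helps to see the template filled in mechanically, the sketch below builds the two sentences from the values you would read off a question and a table. The function name `write_up` and its parameters are hypothetical. The flag `lower_is_significant` must be set by you: True for the sign test, Wilcoxon and Mann-Whitney, False for the other tests in this guide.

```python
def write_up(level, tails, label, value, critical, stat, observed,
             lower_is_significant):
    """Fill the gaps in the two-sentence write-up template."""
    if lower_is_significant:
        significant = observed <= critical
    else:
        significant = observed >= critical

    # Describe how the observed value compares with the critical value.
    if observed == critical:
        comparison = "equal to"
    elif observed < critical:
        comparison = "less than"
    else:
        comparison = "greater than"

    if significant:
        outcome = ("the null hypothesis can be rejected and the "
                   "alternative hypothesis can be accepted")
    else:
        outcome = ("the alternative hypothesis can be rejected and the "
                   "null hypothesis can be accepted")

    return (
        f"At the {level} level of significance, the critical value for a "
        f"{tails}-tailed test, when {label} = {value} is {critical}. "
        f"Since the observed value of {stat} is {observed} which is "
        f"{comparison} the critical value, {outcome}."
    )


# Wilcoxon example from this guide: N = 12, T = 27, critical value 17.
print(write_up("5%", "one", "N", 12, 17, "T", 27, lower_is_significant=True))
```

For a Mann-Whitney question you could pass a label such as "N1 and N2" and a value such as "17 and 15", since both are only inserted into the sentence as text.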

8. Wilcoxon Matched Pairs test, N = 12, directional hypothesis at p ≤ 0.05, T = 27

9. Mann–Whitney U test, non-directional hypothesis at p ≤ 0.10, N1 = 17 and N2 = 15, U = 54

10. Chi-square test, non-directional hypothesis at p ≤ 0.05, df = 10, χ² = 22.42

11. Spearman’s rho, non-directional hypothesis at p ≤ 0.10, N = 25, r = 0.511

12. Mann–Whitney U test, directional hypothesis at p ≤ 0.05, N1 = 16 and N2 = 19, U = 97

13. Chi-square test, non-directional hypothesis at p ≤ 0.10, df = 36, χ² = 50.00

14. Spearman’s rho, directional hypothesis at p ≤ 0.05, N = 11, r = 0.421

15. Mann–Whitney U test, directional hypothesis at p ≤ 0.05, N1 = 20 and N2 = 20, U = 136

16. Chi-square test, non-directional hypothesis at p ≤ 0.10, df = 27, χ² = 45.78

17. Spearman’s rho, directional hypothesis at p ≤ 0.05, N = 19, r = 0.39

18. Mann–Whitney U test, non-directional hypothesis at p ≤ 0.10, N1 = 12 and N2 = 13, U = 39

19. Chi-square test, non-directional hypothesis at p ≤ 0.05, df = 14, χ² = 18.17

20. Spearman’s rho, directional hypothesis at p ≤ 0.05, N = 11, r = 0.421

21. Unrelated t-test, non-directional hypothesis at p ≤ 0.05, df = 18, t = 1.92

22. Unrelated t-test, directional hypothesis at p ≤ 0.05, df = 22, t = 1.65

23. Unrelated t-test, non-directional hypothesis at p ≤ 0.10, df = 28, t = 1.55

24. Related t-test, directional hypothesis at p ≤ 0.05, df = 11, t = 2.05

25. Related t-test, non-directional hypothesis at p ≤ 0.05, df = 15, t = 1.80

26. Related t-test, directional hypothesis at p ≤ 0.10, df = 19, t = 1.42

27. Pearson’s product-moment correlation, non-directional hypothesis at p ≤ 0.05, df = 10, r = 0.61

28. Pearson’s product-moment correlation, directional hypothesis at p ≤ 0.05, df = 18, r = 0.42

29. Pearson’s product-moment correlation, non-directional hypothesis at p ≤ 0.10, df = 28, r = 0.31

30. Pearson’s product-moment correlation, directional hypothesis at p ≤ 0.01, df = 12, r = 0.70

RESULTS SECTION M: INFERENTIAL STATISTICS

OBSERVED AND CRITICAL VALUES

  1. What is meant by the term critical value?
    The critical value is the cut-off point that the observed value must reach or exceed (or be smaller than, depending on the test) in order for the null hypothesis to be rejected.

  2. State two things that are always needed to find the critical value.
    • The significance level (e.g., 0.10, 0.05, 0.01)
    • Whether the hypothesis is one-tailed or two-tailed

  3. State the third factor needed to find the critical value and explain when each is used.
    (a) Degrees of freedom (df) – used for chi-square tests, related and unrelated t-tests, and Pearson’s product-moment correlation.
    (b) N – used for the sign test, Spearman’s rho correlations and Wilcoxon signed-rank tests (repeated measures or matched pairs).
    (c) N1 and N2 – used for Mann–Whitney U tests with two independent groups.

  4. How are degrees of freedom calculated for a chi-square test?
    (df) = (number of rows − 1) × (number of columns − 1)

  5. Which value is required for each statistical test?
    • Sign test = N
    • Mann–Whitney U test = N1 and N2
    • Wilcoxon signed-rank test = N
    • Related t-test = df (df = N − 1)
    • Unrelated t-test = df (df = N1 + N2 − 2)
    • Spearman’s rho = N
    • Pearson’s product-moment correlation = df (df = N − 2)
    • Chi-square test = df, where df = (number of rows − 1) × (number of columns − 1)

  6. How are the observed value and the critical value used together?
    The observed value calculated from the data is compared to the critical value from the statistical table. If the observed value meets the criteria for significance (greater than or smaller than the critical value, depending on the test), the null hypothesis is rejected. If not, the null hypothesis is retained.

  7. Decide whether each observed value is statistically significant at p = 0.05.

a) Rho = 0.410, one-tailed, N = 20
Significant

b) Rho = 0.50, two-tailed, N = 10
Not significant

c) χ² = 3.24, two-tailed, 2 × 2 table
Not significant

d) χ² = 5.00, one-tailed, 3 × 2 table
Significant

e) U = 16, one-tailed, N1 = 9, N2 = 8
Significant

f) U = 76, two-tailed, N = 30 split equally
Not significant

g) T = 54, one-tailed, N = 25
Significant

h) T = 105, two-tailed, N = 20
Not significant
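The degrees of freedom formulas given in answers 4 and 5 above are simple arithmetic, sketched here as small helper functions. The function names are hypothetical and the sample sizes in the example calls are made up for illustration.

```python
def chi_square_df(rows, columns):
    """df for a chi-square contingency table."""
    return (rows - 1) * (columns - 1)


def related_t_df(n):
    """df for a related t-test with N participants (or N matched pairs)."""
    return n - 1


def unrelated_t_df(n1, n2):
    """df for an unrelated t-test with two independent groups."""
    return n1 + n2 - 2


def pearson_df(n):
    """df for Pearson's product-moment correlation with N pairs of scores."""
    return n - 2


print(chi_square_df(2, 2))     # 1
print(chi_square_df(3, 2))     # 2
print(related_t_df(12))        # 11
print(unrelated_t_df(10, 10))  # 18
print(pearson_df(12))          # 10
```

The first two calls match the 2 × 2 and 3 × 2 contingency tables in question 7, which is why parts (c) and (d) are looked up with df = 1 and df = 2.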

8. Wilcoxon Matched Pairs, N = 12, directional, p ≤ 0.05, T = 27

At the 5% level of significance, the critical value for a one-tailed test when N = 12 is 17.
Since the observed value of T is 27, which is greater than the critical value, the experimental hypothesis is rejected, and the null hypothesis is accepted.

9. Mann–Whitney U, non-directional, p ≤ 0.10, N1 = 17, N2 = 15, U = 54

At the 10% level of significance, the critical value for a two-tailed test when N1 = 17 and N2 = 15 is 83.
Since the observed value of U is 54, which is less than the critical value, the experimental hypothesis is accepted, and the null hypothesis is rejected.

10. Chi-square, non-directional, p ≤ 0.05, df = 10, χ² = 22.42

At the 5% significance level, the critical value for df = 10 is 18.31.
Since the observed value of χ² is 22.42, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

11. Spearman’s rho, non-directional, p ≤ 0.10, N = 25, r = 0.511

At the 10% level of significance, the critical value for a two-tailed test when N = 25 is 0.337.
Since the observed value of r is 0.511, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

12. Mann–Whitney U, directional, p ≤ 0.05, N1 = 16, N2 = 19, U = 97

At the 5% level of significance, the critical value for a one-tailed test when N1 = 16 and N2 = 19 is 101.
Since the observed value of U is 97, which is less than the critical value, the experimental hypothesis is accepted, and the null hypothesis is rejected.

13. Chi-square, non-directional, p ≤ 0.10, df = 36, χ² = 50.00

At the 10% significance level, the critical value for df = 36 is 47.21.
Since the observed value of χ² is 50.00, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

14. Spearman’s rho, directional, p ≤ 0.05, N = 11, r = 0.421

At the 5% level of significance, the critical value for a one-tailed test when N = 11 is 0.536.
Since the observed value of r is 0.421, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

15. Mann–Whitney U, directional, p ≤ 0.05, N1 = 20, N2 = 20, U = 136

At the 5% level of significance, the critical value for a one-tailed test when N1 = 20 and N2 = 20 is 138.
Since the observed value of U is 136, which is less than the critical value, the experimental hypothesis is accepted, and the null hypothesis is rejected.

16. Chi-square, non-directional, p ≤ 0.10, df = 27, χ² = 45.78

At the 10% significance level, the critical value for df = 27 is 36.74.
Since the observed value of χ² is 45.78, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

17. Spearman’s rho, directional, p ≤ 0.05, N = 19, r = 0.39

At the 5% level of significance, the critical value for a one-tailed test when N = 19 is 0.391.
Since the observed value of r is 0.39, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

18. Mann–Whitney U, non-directional, p ≤ 0.10, N1 = 12, N2 = 13, U = 39

At the 10% level of significance, the critical value for a two-tailed test when N1 = 12 and N2 = 13 is 47.
Since the observed value of U is 39, which is less than the critical value, the experimental hypothesis is accepted, and the null hypothesis is rejected.

19. Chi-square, non-directional, p ≤ 0.05, df = 14, χ² = 18.17

At the 5% significance level, the critical value for df = 14 is 23.68.
Since the observed value of χ² is 18.17, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

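20. Spearman’s rho, directional, p ≤ 0.05, N = 11, r = 0.421

At the 5% level of significance, the critical value for a one-tailed Spearman’s rho test, when N = 11, is 0.536.
Since the observed value of r is 0.421, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

21. Unrelated t-test, non-directional hypothesis at p ≤ 0.05, df = 18, t = 1.92
At the 5% level of significance, the critical value for a two-tailed test when df = 18 is 2.10. Since the observed value of t is 1.92, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.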

22. Unrelated t-test, directional hypothesis at p ≤ 0.05, df = 22, t = 1.65
At the 5% level of significance, the critical value for a one-tailed test when df = 22 is 1.72. Since the observed value of t is 1.65, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

23. Unrelated t-test, non-directional hypothesis at p ≤ 0.10, df = 28, t = 1.55
At the 10% level of significance, the critical value for a two-tailed test when df = 28 is 1.70. Since the observed value of t is 1.55, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

24. Related t-test, directional hypothesis at p ≤ 0.05, df = 11, t = 2.05
At the 5% level of significance, the critical value for a one-tailed test when df = 11 is 1.80. Since the observed value of t is 2.05, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

25. Related t-test, non-directional hypothesis at p ≤ 0.05, df = 15, t = 1.80
At the 5% level of significance, the critical value for a two-tailed test when df = 15 is 2.13. Since the observed value of t is 1.80, which is less than the critical value, the alternative hypothesis is rejected, and the null hypothesis is accepted.

26. Related t-test, directional hypothesis at p ≤ 0.10, df = 19, t = 1.42
At the 10% level of significance, the critical value for a one-tailed test when df = 19 is 1.33. Since the observed value of t is 1.42, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

27. Pearson’s product-moment correlation, non-directional hypothesis at p ≤ 0.05, df = 10, r = 0.61
At the 5% level of significance, the critical value for a two-tailed test when df = 10 is 0.576. Since the observed value of r is 0.61, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

28. Pearson’s product-moment correlation, directional hypothesis at p ≤ 0.05, df = 18, r = 0.42
At the 5% level of significance, the critical value for a one-tailed test when df = 18 is 0.378. Since the observed value of r is 0.42, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

29. Pearson’s product-moment correlation, non-directional hypothesis at p ≤ 0.10, df = 28, r = 0.31
At the 10% level of significance, the critical value for a two-tailed test when df = 28 is 0.306. Since the observed value of r is 0.31, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

30. Pearson’s product-moment correlation, directional hypothesis at p ≤ 0.01, df = 12, r = 0.70
At the 1% level of significance, the critical value for a one-tailed test when df = 12 is 0.612. Since the observed value of r is 0.70, which is greater than the critical value, the alternative hypothesis is accepted, and the null hypothesis is rejected.

EXAMINER-STYLE WRITE-UP EXAMPLE

The critical value of T for N = 20 for a one-tailed test where p ≤ 0.05 is 60. As the observed value of T (29.5) is less than the critical value, the results are statistically significant. Therefore, the null hypothesis is rejected, and the experimental hypothesis is accepted.
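A T value ending in .5, such as 29.5, can only arise when tied differences share an averaged rank. The sketch below shows one common hand-calculation route to T: take the difference for each participant, drop any zero differences, rank the absolute differences (tied values share the mean rank), add up the ranks for positive and negative differences separately, and take the smaller total. The scores are invented for illustration and the function name `wilcoxon_t` is hypothetical.

```python
def wilcoxon_t(before, after):
    """Return (T, N) for paired scores using the hand-calculation method."""
    # Difference for each participant; zero differences are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Positions of the differences, ordered by absolute size (smallest first).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)

    i = 0
    while i < len(order):
        # Extend j to cover the run of tied absolute differences.
        size = abs(diffs[order[i]])
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == size:
            j += 1
        # Tied values share the mean of the rank positions they occupy.
        shared_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1

    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # T is the smaller rank total; N excludes the zero differences.
    return min(positive, negative), len(diffs)


# Invented memory scores for six participants (not data from the study).
before = [70, 65, 80, 55, 60, 75]
after = [64, 66, 71, 55, 58, 70]
print(wilcoxon_t(before, after))  # (1.0, 5)
```

In this made-up example one participant scored the same on both tests, so N drops from 6 to 5 before the critical value is looked up.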

SECTION M: QUESTIONS ON RESEARCH DESIGN

For each study, answer the questions below in order.

1. IS IT AN EXPERIMENT? (YES OR NO)
If NO, go to Question 2.

a) What type of experiment is it: laboratory, field, quasi, or natural
b) What experimental design is used: repeated measures, independent groups, or matched pairs
c) What are the IV and DV
d) What level of measurement are the data
e) What statistical test is required
f) Is the hypothesis directional (one-tailed) or non-directional (two-tailed)
If it is not obvious, suggest a direction
g) Suggest a suitable significance level
h) What value is needed to find the critical value: N, N1 and N2, or df
i) Write an aim
j) Write a hypothesis

2. IS IT A CORRELATION (NON-EXPERIMENT)? (YES OR NO)
If NO, go to Question 3.

a) What are the co-variables
b) Is the hypothesis directional (one-tailed) or non-directional (two-tailed)
If it is not obvious, suggest a direction
If directional, state whether it is positive or negative
c) Suggest a suitable significance level
d) What level of measurement are the data
e) What statistical test is required
f) What value is needed to find the critical value: N, N1 and N2, or df
g) Write an aim
h) Write a hypothesis

3. IS IT A NON-EXPERIMENT? (YES OR NO)
If NO, reconsider your earlier answers.

a) What type of non-experiment is it: natural observation, controlled observation, questionnaire or survey, content analysis, or case study
b) What are the variables
c) What level of measurement are the data
d) Is the hypothesis directional (one-tailed) or non-directional (two-tailed)
If it is not obvious, suggest a direction
e) Suggest a suitable significance level
f) What statistical test is required
g) What value is needed to find the critical value: N, N1 and N2, or df
h) Write an aim
i) Write a hypothesis

RESEARCH SCENARIOS

  1. Researchers want to replicate a study examining differences between male and female estimates of stopping distance.

  2. Ainsworth’s Strange Situation involved observers having behavioural categories to observe and tick each time they observed a behaviour.

  3. Researchers believe that siblings’ aggression levels will be related. Siblings are each given a rating scale to measure aggression, for example: “On a scale of 1 to 10, how aggressive are you?”

  4. Researchers analyse “Lonely Heart” advertisements to investigate sexual selection theory. They hypothesise that male participants will advertise “status” more frequently than female participants and that female participants will advertise “looks” more frequently than male participants.

  5. In a company, disabled and able-bodied participants are asked to indicate on a scale of 1 to 7 how much they feel in control of their working environment.

  6. Identical twins are split into condition A, which sets a puzzle they can solve, or condition B, which sets an unsolvable puzzle. After thirty minutes in either condition, their stress level is measured with a rating scale, for example: “On a scale of 1 to 5, how stressed are you?” It is thought that participants who can solve the puzzle will be less stressed.

  7. One group of participants is given an IQ test, then asked to take multivitamins for a month, and their IQ is measured again. It is thought that participants who take multivitamins will have higher IQ scores.

  8. Researchers want to determine whether there is a difference in the happiness ratings of academic and non-academic pupils.

  9. Researchers are investigating the effect of age on memory. They want to see if those aged 20 to 40 have poorer memories than those aged 40 to 60, so they administer a digit span memory test to both groups.

  10. It is hypothesised that listening to music with aggressive lyrics increases heart rate. Participants listen to either music with aggressive lyrics or non-aggressive lyrics whilst their heart rate is measured.

  11. It is hypothesised that caffeine causes memory problems. Scores on a memory test are taken before and after taking caffeine pills.

  12. It is hypothesised that high levels of testosterone increase risk-taking. As finger length ratios are used as an indirect marker of prenatal testosterone exposure, the ratio between the second and fourth digits (2D:4D) of male participants’ fingers is measured and compared with risk-taking scores on a questionnaire.

  13. How many units of alcohol per week are consumed by males and females?

  14. Pictures of married couples are taken. Female participants rate the attractiveness of the male spouse, and male participants rate the attractiveness of the female spouse. It is thought that couples will have similar levels of attractiveness.

  15. Female and male participants are asked to choose which female body shape they prefer, sizes 6, 8, 10, 12, 14, 16, 18, or 20. The sizes are exact.

  16. Participants from Western and non-Western societies are asked to complete the Social Readjustment Rating Scale (SRRS, Holmes and Rahe) and calculate their scores.

  17. School students are observed choosing snacks during breaks. The snack choices are either apples or crisps. The next morning, during school assembly, the same students are given the nutritional value of apples vs. crisps. They are then observed again to see whether they choose apples or crisps at break time.

  18. A researcher wants to see if older siblings are more intelligent than younger siblings. All siblings complete an IQ test.

  19. Children living in homes without gardens and those living in homes with private gardens are observed to see whether they choose to play outside in the street or stay at home.

  20. Participants are put in either a jogging or a non-jogging condition and then asked to rate pictures of the opposite sex on a scale of 1 to 10. This is new research.

  21. Research suggests that the antioxidants in foods such as blueberries can reduce age-related declines in cognitive functioning. To test this, a researcher selects 25 adults and administers a cognitive function test to each participant. The participants then drink a blueberry supplement daily for four months before they are tested again.

  22. To examine the connection between alcohol consumption and birth weight, a researcher selects a sample of 20 pregnant rats and mixes alcohol with their food for two weeks before the pups are born. Another group of 20 pregnant rats is used as a comparison group.

  23. To examine how texting affects driving skills, a researcher sets up a driving circuit using orange traffic cones in a parking lot. A group of students is then tested on the circuit twice: once while sending and receiving text messages, and once without sending or receiving text messages. The researcher records the number of cones hit on each circuit while driving for each student.

  24. The more people exercise, the lower their blood pressure.

  25. A statistics instructor believes that doing homework improves exam scores. To test this hypothesis, she randomly assigns students to two groups. One group must work on the homework until all problems are correct, while the second group's homework is optional. Exam grades are compared between the two groups at the end of the semester.

  26. Researchers are investigating the effect of age on memory. They want to see if older children have poorer memories than younger children, so they administer a memory test to both groups.

ANSWERS SECTION M: QUESTIONS ON RESEARCH DESIGN

1. Male vs female stopping distance estimates
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: gender (male vs female). DV: stopping distance estimate.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether stopping distance estimates differ between males and females.
Hypothesis: There will be a difference in stopping distance estimates between males and females.

2. Ainsworth Strange Situation behavioural categories
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Controlled observation.
What are the variables? Attachment classification and behaviour category frequency.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Association.
What inferential test is needed? Chi-square.
What value is needed to find the critical value? df.
Aim: To investigate whether observed attachment-related behaviours differ across attachment classifications.
Hypothesis: There will be an association between attachment classification and observed behaviour categories.

3. Sibling aggression (intra-class correlation)
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Intra-class correlation.
What are the co-variables? Sibling A aggression rating and Sibling B aggression rating.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Correlation.
What inferential test is needed? Spearman’s rho.
What value is needed to find the critical value? N.
Aim: To investigate whether siblings’ aggression ratings are related.
Hypothesis: There will be a correlation between Sibling A aggression rating and Sibling B aggression rating.

4. Lonely Hearts adverts: status vs looks by gender
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Content analysis.
What are the variables? Gender and advertised trait (status vs looks).
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Association.
What inferential test is needed? Chi-square.
What value is needed to find the critical value? df.
Aim: To investigate whether advertised traits differ by gender in lonely hearts adverts.
Hypothesis: Males will advertise status more frequently than females, and females will advertise looks more frequently than males.

5. Disabled vs able-bodied control rating (1 to 7)
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: disability status (disabled vs able bodied). DV: control rating.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Mann-Whitney U.
What value is needed to find the critical value? N1 and N2.
Aim: To investigate whether perceived control differs between disabled and able-bodied participants.
Hypothesis: There will be a difference in perceived control ratings between disabled and able-bodied participants.

6. Solvable vs unsolvable puzzle, stress rating 1 to 5
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Matched pairs (identical twins, with one twin in each condition).
What are the IV and DV? IV: puzzle type (solvable vs unsolvable). DV: stress rating.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Wilcoxon.
What value is needed to find the critical value? N.
Aim: To investigate whether puzzle solvability affects stress ratings.
Hypothesis: Participants given an unsolvable puzzle will report higher stress than participants given a solvable puzzle.

7. IQ before and after multivitamins
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Repeated measures.
What are the IV and DV? IV: time (before vs after multivitamins). DV: IQ score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Related t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether multivitamin use changes IQ scores.
Hypothesis: IQ scores will be higher after one month of multivitamin use than before.

8. Academic vs non-academic pupils’ happiness ratings
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: pupil type (academic vs non-academic). DV: happiness rating.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Mann-Whitney U.
What value is needed to find the critical value? N1 and N2.
Aim: To investigate whether happiness ratings differ between academic and non-academic pupils.
Hypothesis: There will be a difference in happiness ratings between academic and non-academic pupils.

9. Age group and digit span memory test (20 to 40 vs 40 to 60)
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: age group (20 to 40 vs 40 to 60). DV: digit span score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether digit span memory differs between younger and older adults.
Hypothesis: Participants aged 40 to 60 will have lower digit span scores than participants aged 20 to 40.

10. Aggressive vs non-aggressive lyrics, heart rate
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: lyric type (aggressive vs non-aggressive). DV: heart rate.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether aggressive lyrics increase heart rate.
Hypothesis: Participants listening to aggressive lyrics will have a higher heart rate than those listening to non-aggressive lyrics.

11. Memory test before and after caffeine pills
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Repeated measures.
What are the IV and DV? IV: caffeine condition (before vs after caffeine). DV: memory test score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Related t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether caffeine affects memory test performance.
Hypothesis: Memory test scores will be lower after consuming caffeine pills than before.

12. Finger ratio (2D:4D) and risk-taking score
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Correlation.
What are the co-variables? 2D:4D ratio and risk-taking score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Correlation.
What inferential test is needed? Pearson’s correlation.
What value is needed to find the critical value? df (df = N − 2).
Aim: To investigate whether the 2D:4D ratio is related to risk-taking.
Hypothesis: There will be a negative correlation between the 2D:4D ratio and risk-taking score (a lower ratio is taken to indicate higher prenatal testosterone exposure).

13. Units of alcohol per week: males vs females
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: gender (male vs female). DV: units of alcohol per week.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether alcohol consumption differs between males and females.
Hypothesis: There will be a difference in units of alcohol consumed per week between males and females.

14. Couples’ attractiveness ratings: male rates female, female rates male
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Correlation.
What are the co-variables? Male partner attractiveness rating and female partner attractiveness rating.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Correlation.
What inferential test is needed? Spearman’s rho.
What value is needed to find the critical value? N.
Aim: To investigate whether attractiveness ratings within couples are related.
Hypothesis: There will be a correlation between male and female partner attractiveness ratings.

15. Body shape preference (sizes 6 to 20) male vs female
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: gender (male vs female). DV: preferred body size.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether preferred female body size differs between males and females.
Hypothesis: There will be a difference in preferred female body size between males and females.

16. Western vs non-Western SRRS scores
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: culture (Western vs non-Western). DV: SRRS score.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether SRRS scores differ between Western and non-Western participants.
Hypothesis: There will be a difference in SRRS scores between Western and non-Western participants.

17. Snack choice before and after nutritional information (apples vs crisps)
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Field experiment.
What is the experimental design? Repeated measures.
What are the IV and DV? IV: time (before vs after nutritional information). DV: snack choice (apple vs crisps).
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Sign test.
What value is needed to find the critical value? N.
Aim: To investigate whether nutritional information changes snack choice.
Hypothesis: More students will choose apples after receiving nutritional information than before.
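
The sign test calculation for a scenario like this can be sketched in a few lines of Python. This is a minimal illustration, not the only way to do it: the function name `sign_test_statistic`, the coding (apple = 1, crisps = 0) and the ten data points are all assumptions made up for the example.

```python
def sign_test_statistic(before, after):
    """Return (S, N) for paired scores.

    S is the count of the less frequent sign; N is the number of
    participants whose score changed (ties are dropped).
    """
    plus = sum(1 for b, a in zip(before, after) if a > b)   # changed in the "+" direction
    minus = sum(1 for b, a in zip(before, after) if a < b)  # changed in the "-" direction
    n = plus + minus        # participants with no change are excluded
    s = min(plus, minus)    # calculated value of S
    return s, n


# Hypothetical data: 1 = chose apple, 0 = chose crisps
before = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
after = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

s, n = sign_test_statistic(before, after)
print(s, n)  # 1 6
```

Here five students switched to apples, one switched to crisps and four did not change, so S = 1 and N = 6. S is then compared with the critical value for N = 6 in the sign test table; the result is significant if S is equal to or less than the critical value.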

18. Older vs younger siblings’ IQ
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Correlation.
What are the co-variables? Older sibling IQ and younger sibling IQ.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Correlation.
What inferential test is needed? Pearson’s correlation.
What value is needed to find the critical value? N.
Aim: To investigate whether older and younger siblings’ IQ scores are related.
Hypothesis: There will be a correlation between older sibling IQ and younger sibling IQ.

19. Garden vs no garden, play outside vs stay at home
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Naturalistic observation.
What are the variables? Garden access (yes vs no) and play location (outside vs at home).
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Association.
What inferential test is needed? Chi-squared.
What value is needed to find the critical value? df.
Aim: To investigate whether garden access is associated with play location.
Hypothesis: There will be an association between garden access and children’s play location.
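
As a worked note on the "df" answer above: for Chi-squared, the degrees of freedom depend on the size of the contingency table, not on the number of participants. Assuming the data are laid out as a 2 × 2 table (garden access: yes/no, by play location: outside/at home), the arithmetic is:

```latex
\[
df = (\text{rows} - 1) \times (\text{columns} - 1) = (2 - 1) \times (2 - 1) = 1
\]
```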

20. Jogging vs no jogging, attractiveness ratings 1 to 10
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: condition (jogging vs no jogging). DV: attractiveness rating.
What type of hypothesis is it: directional or non-directional? Non-directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Mann-Whitney U.
What value is needed to find the critical value? N1 and N2.
Aim: To investigate whether jogging affects attractiveness ratings.
Hypothesis: There will be a difference in attractiveness ratings between participants in the jogging and non-jogging conditions.

21. Blueberry supplement and cognition test (before vs after)
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Repeated measures.
What are the IV and DV? IV: time (before vs after supplement). DV: cognition test score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Related t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether a blueberry supplement improves cognitive test scores.
Hypothesis: Cognitive test scores will be higher after four months of blueberry supplementation than before.

22. Pregnant rats, alcohol vs control, pup birth weight
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: alcohol exposure (alcohol vs control). DV: pup birth weight.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether maternal alcohol consumption affects pup birth weight.
Hypothesis: Pups born to alcohol-exposed mothers will have a lower birth weight than pups born to control mothers.

23. Texting vs no texting while driving, cones hit
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Repeated measures.
What are the IV and DV? IV: condition (texting vs no texting). DV: cones hit.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Related t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether texting increases driving errors measured by cones hit.
Hypothesis: Participants will hit more cones while texting than when not texting.

24. Exercise frequency and blood pressure
Is the research an experiment or a non-experiment? Non-experiment.
What is the research method? Correlation.
What are the co-variables? Exercise frequency and blood pressure.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Correlation.
What inferential test is needed? Pearson’s correlation.
What value is needed to find the critical value? N.
Aim: To investigate whether exercise frequency is related to blood pressure.
Hypothesis: There will be a negative correlation between exercise frequency and blood pressure.

25. Compulsory homework vs optional homework, exam scores
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Laboratory experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: homework condition (compulsory vs optional). DV: exam score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether compulsory homework improves exam scores.
Hypothesis: Students in the compulsory homework condition will achieve higher exam scores than students in the optional homework condition.

26. Age and memory performance (older vs younger children)
Is the research an experiment or a non-experiment? Experiment.
What is the research method? Quasi-experiment.
What is the experimental design? Independent groups.
What are the IV and DV? IV: age group (older vs younger). DV: memory test score.
What type of hypothesis is it: directional or non-directional? Directional.
Test of difference, association or correlation? Difference.
What inferential test is needed? Unrelated t-test.
What value is needed to find the critical value? df.
Aim: To investigate whether memory test performance differs between older and younger children.
Hypothesis: Older children will score lower on the memory test than younger children.
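
The decision process used in the answers above can be summarised as a small lookup. This is a sketch of the standard test-choice table, not an official tool: the function name `choose_test` and its argument labels are invented for illustration, and the level of measurement given for each scenario in the usage lines is my reading of the answers rather than something stated in them.

```python
# Test-choice table keyed by (purpose, level of measurement, design).
# "related" covers repeated measures and matched pairs;
# "unrelated" covers independent groups.
TEST_TABLE = {
    ("difference", "nominal", "unrelated"): "Chi-squared",
    ("difference", "nominal", "related"): "Sign test",
    ("difference", "ordinal", "unrelated"): "Mann-Whitney U",
    ("difference", "ordinal", "related"): "Wilcoxon",
    ("difference", "interval", "unrelated"): "Unrelated t-test",
    ("difference", "interval", "related"): "Related t-test",
    ("correlation", "ordinal", None): "Spearman's rho",
    ("correlation", "interval", None): "Pearson's r",
    ("association", "nominal", None): "Chi-squared",
}


def choose_test(purpose, level, design=None):
    """Return the name of the inferential test for a study.

    Design only matters for tests of difference, so it is ignored
    for correlations and associations.
    """
    key = (purpose, level, design if purpose == "difference" else None)
    if key not in TEST_TABLE:
        raise ValueError(f"No test in the table for {key}")
    return TEST_TABLE[key]


# Usage, following some of the scenarios above
print(choose_test("difference", "interval", "unrelated"))  # 13: Unrelated t-test
print(choose_test("correlation", "ordinal"))               # 14: Spearman's rho
print(choose_test("difference", "nominal", "related"))     # 17: Sign test
print(choose_test("association", "nominal"))               # 19: Chi-squared
print(choose_test("difference", "ordinal", "unrelated"))   # 20: Mann-Whitney U
```

Answering the three questions in order (difference, correlation or association; level of measurement; related or unrelated design) is enough to reach every test on the specification.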

Rebecca Sylvia

I am a Londoner with over 30 years of experience teaching psychology at A-Level, IB, and undergraduate levels. Throughout my career, I’ve taught in more than 40 establishments across the UK and internationally, including Spain, Lithuania, and Cyprus. My teaching has been consistently recognised for its high success rates, and I’ve also worked as a consultant in education, supporting institutions in delivering exceptional psychology programmes.

I’ve written various psychology materials and articles, focusing on making complex concepts accessible to students and educators. In addition to teaching, I’ve published peer-reviewed research in the field of eating disorders.

My career began after earning a degree in Psychology and a master’s in Cognitive Neuroscience. Over the years, I’ve combined my academic foundation with hands-on teaching and leadership roles, including serving as Head of Social Sciences.

Outside of my professional life, I have two children and enjoy a variety of interests, including skiing, hiking, playing backgammon, and podcasting. These pursuits keep me curious, active, and grounded—qualities I bring into my teaching and consultancy work. My personal and professional goals include inspiring curiosity about human behaviour, supporting educators, and helping students achieve their full potential.

https://psychstory.co.uk