Analysing the Data – Lesson Two

A Health Warning:

As with all the DMD lesson plans – it might look like there’s a lot to do but these are just ideas to help you deliver your lesson. Feel free to scan through and pick out the bits you think will work with your class. If you tried something different – let us know or leave a comment.


Get a feel for what you might be encouraging students to look for in their analysis by watching this clip of Dr Rob looking at last year’s data.

In this lesson, students are given practice identifying patterns in abstract data and suggesting explanations using a familiar example before being introduced to the data analysis tools on the DMD website.

They complete a descriptive time trend analysis, comparing visits to GPs reporting influenza-like illness with the illness absence data from their own school and other schools. They look for, and attempt to explain, patterns, natural variation and anomalies in the data.

Scatter graph analysis: Students then complete a simple scatter graph analysis in which they will look at the correlation between nationwide school illness absence data and the core variables collected in the school profiles for the week that absence data is peaking (e.g. size of school, staff/student ratios, age of buildings).

Prior learning:

This lesson follows from lesson 1 (wherein students looked at and categorised variables associated with the experiment and the relationships that variables can have). Students will use DMD’s own graphing tools so no understanding of spreadsheet software is specifically required unless students wish to download and process the data themselves.

Points to be aware of:

  • “Weeks” is not a continuous variable. Epidemiologists would normally represent the kind of data shown in the first “descriptive time trend analysis” using a bar chart but the sheer number of variables and data points that could be displayed at any given time on the screen make this impractical. Students should be aware that although there is always a “correct” way of drawing a graph, sometimes you have to compromise!
  • The national flu data from the Sentinel GP practices is divided into 4 regions (North, Central, South and Wales). There are 60 practices, which are on the whole distributed according to population density (for example, there are 12 in the London area alone) rather than in an even geographic spread. For a number of reasons, only two practices are in Wales, so the data for that one region may behave erratically (or as Dr Rob would put it, “there are wide confidence intervals around the data points”).


Sheet DMD2-1: Describing and explaining patterns [also available as an editable Word document]

Sheet DMD2-2: Analysing the Data Level Ladder [Word]

Sheet DMD2-3: Analysis Questions [Word]

Sheet DMD2-4: How good is your correlation? [Word]

Sheet DMD2-5: Time Trend Key Words

Key words / language:

Descriptive time trend analysis (describing how a variable changes over time)

Scatter graph analysis (drawing a line of best fit and describing how well the data fits this line)

Learning objectives:

  • to describe and explain patterns in data

Learning outcomes:

  • students are able to describe how their school’s absence figures change over time and how this compares with regional and national flu data.

  • students identify correlations between outbreak variables and absence / flu data and describe the type and quality of correlations they find

Possible APP focuses:

  • AF5 Thread 3: Select the most relevant data to reach a conclusion, explain how the selection or rejection of data can lead to different conclusions, using scientific knowledge and understanding

  • AF5 Thread 4: Explain inconsistencies in the data using scientific knowledge and understanding, comment on how reliable the range of data is, taking into consideration: number of repeats, number of data points, choice of equipment, procedure

See sheet DMD2-2: Analysing the Data Level Ladder for further information.


This optional starter was chosen to get students thinking about patterns in abstract data but within a familiar context. You may have your own data sets you’d prefer to use.

Provide students with copies of DMD2-1 in small groups (or project it). They should be encouraged to describe small and larger trends they see. Can they work out what the graph might be representing?

The graph actually represents the frequency of reported breakups on Facebook. It comes from David McCandless and Lee Byron; their website provides more information about how the data was collected.

There won’t be any hard and fast right or wrong answers about this data (which makes it particularly suitable for this activity). Most observations could have more than one plausible explanation (e.g. are the Monday peaks from people feeling fed up on Mondays or from people updating their statuses after the weekend?) and the data will be close to students’ hearts!


Get students logged in to the DMD website. Your students can register using the link here.

While students log in, it would be worth inviting them to consider what kind of variables we have for the weekly absence data and thus what graph they should expect to see waiting for them (see “Points to be aware of”).

Sheet DMD2-5: Time Trend Key Words may provide a helpful stimulus.

1) Descriptive time trend analysis

Graph: Time trend analysis

First use the line graph plotting national school absence data against the national flu data. Students can use sheet DMD2-3: Analysis Questions to help prompt them. There is space for them to add their own questions. This could either be given to students or projected on the board. You might want students to Ask the Scientists these questions.

We put together a short clip showing how Dr Rob might look at the data: watch here. You are the intended audience but you could also choose to show the first part of the clip to students to get them thinking about some of the issues surrounding the data.

Students should approach this data with the same mindset as they did the Facebook data from the starter activity. They can switch multiple data sets on or off. They need to describe the patterns they see in the data and try to explain them. There are countless possible ways they could do this. See “Points to be aware of” for extra information.

Each student / group should be able to make a unique observation – most should be able to come up with a potential explanation. You and your students can record observations and comments in a LabLog on the site. When completing the LabLog, students could be given sheet DMD2-2: Analysing the Data Level Ladder to help them frame a high quality response.
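For students who download the data, one way of turning a time trend observation into something checkable is sketched below. It finds the peak week in two weekly series and the week-on-week changes. The figures and variable names are invented purely for illustration; they are not real DMD or GP data, and the DMD site's own graphing tools may work quite differently.

```python
# Invented weekly figures for illustration only (not real DMD or GP data).
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
school_absence_pct = [2.1, 2.4, 3.0, 4.2, 5.1, 4.0, 2.9, 2.5]  # % of students absent through illness
gp_ili_rate = [10, 12, 18, 30, 27, 20, 14, 11]  # GP visits reporting influenza-like illness


def peak_week(week_numbers, values):
    """Return the week number with the highest value (the first one if tied)."""
    best_index = max(range(len(values)), key=lambda i: values[i])
    return week_numbers[best_index]


def week_on_week_change(values):
    """Return the change from each week to the next, rounded to 2 decimal places."""
    return [round(later - earlier, 2) for earlier, later in zip(values, values[1:])]


school_peak = peak_week(weeks, school_absence_pct)
gp_peak = peak_week(weeks, gp_ili_rate)
lag_in_weeks = school_peak - gp_peak  # positive means the school peak came later

print("School absence peaked in week", school_peak)
print("GP flu visits peaked in week", gp_peak)
print("Weeks between the two peaks:", lag_in_weeks)
print("Week-on-week change in absence:", week_on_week_change(school_absence_pct))
```

A gap between the two peaks is only an observation. Students still need to suggest an explanation for it, and a single noisy week can move a peak, so it is worth asking how confident they are in it.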

2) Scatter graph Analysis

Graph: Scatter graph Analysis

The XY scatter graphs show aggregated Years 7-11 schools absence data from across the country and any of the outbreak variables, including the number of students, hygiene, illness, household, vaccination and transport points.

The graph will default to two weeks before the current date (as this is likely to be current but with a good response of uploaded data). Your students can make comparisons for any week but we recommend looking at the current peak absence week (for the national schools data) so you have the most illness data to play with (be aware that until the outbreak occurs, the “peak” week may not have anything to do with flu but it will still be of interest to Rob and his team).

As above, observations, explanations and comments on the quality of data can be saved to the students’ LabLogs in advance of the next lesson (with reference to sheet DMD2-2).

You could provide your students with a copy of DMD2-4: How good is your correlation? to help them see what different strengths of correlations look like. If you are doing this lesson with a GCSE class, it may be worth getting in contact with their Maths teacher to see if they are doing Pearson tests (PCCs) to rate the quality of correlations as their Maths teacher may wish to use the data to support their own teaching.
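If students download the data, the line of best fit and the Pearson correlation coefficient can be calculated by hand or with a few lines of code. The sketch below uses the standard least-squares and Pearson formulas. The school sizes and absence figures are invented for illustration, and the function names are our own rather than anything provided by the DMD site.

```python
from math import sqrt

# Invented figures for illustration only (not real DMD data).
school_size = [400, 650, 800, 1000, 1200, 1500]  # number of students
absence_pct = [3.1, 3.8, 3.5, 4.6, 4.4, 5.2]     # % absent in the peak week


def _sums_of_squares(xs, ys):
    """Return mean of x, mean of y, and the sums Sxy, Sxx, Syy used by both formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    _, _, sxy, sxx, syy = _sums_of_squares(xs, ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares line of best fit, returned as (slope, intercept)."""
    mean_x, mean_y, sxy, sxx, _ = _sums_of_squares(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(school_size, absence_pct)
slope, intercept = best_fit_line(school_size, absence_pct)
print(f"Pearson r = {r:.2f}")
print(f"Line of best fit: absence = {slope:.4f} x size + {intercept:.2f}")
```

A value of r close to +1 or -1 means the points lie close to the line; a value near 0 means little or no linear relationship. Even a strong correlation does not show that one variable causes the other.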


Students could be given a writing frame “In the data I saw that…” “I think this might be because …” printed on A4 paper to hold up.


With a GCSE class, your students may be looking at distribution in data. When Dr Rob looks at his data, he needs to consider how it is distributed as there are a number of tests he could do on it that rely on the data following a normal distribution. This would be a good opportunity for some cross curricular work with the Maths dept.

You may wish to invite your students to download and describe the spread of their chosen Outbreak data. For each continuous variable, the students can examine the following characteristics of the data:

  • mean

  • median

  • mode

  • minimum

  • maximum

The students should use the characteristics calculated above to describe the data. For example, is the mean greater than the median and if so why?
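As a rough illustration, the five characteristics can be calculated with Python's built-in statistics module. The absence counts below are invented so that one unusually high value pulls the mean above the median; they are not real Outbreak data.

```python
import statistics

# Invented data for illustration: number of students absent in each of ten schools.
absences = [3, 4, 4, 5, 5, 5, 6, 7, 9, 22]


def describe(values):
    """Return the five summary characteristics listed above as a dictionary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
    }


summary = describe(absences)
for name, value in summary.items():
    print(name, "=", value)

if summary["mean"] > summary["median"]:
    print("The mean is greater than the median: a few high values are pulling it up.")
```

Here the single school with 22 absences raises the mean without changing the median, which is the kind of explanation students could offer when they compare the two.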

To give your students practice at plotting histograms, you can use this tool on the National Council of Teachers of Mathematics website. Enter your data into the box on the NCTM site (with one piece of data per line) and then adjust the slider bar for the interval size to show how this affects the graph.

The website also has some helpful examples such as “spending per student in some schools” and “NBA team payrolls” which illustrate the distribution of data. The NBA team payrolls example shows data that relates to how much basketball players are paid and is a good example of ‘right skewed’ data. This means that a small number of people are very highly paid compared to everyone else. This makes the mean greater than the median, and the graph has a long tail to the right, i.e. the data is not normally distributed.

When analysing data like this, epidemiologists have to take account of this right skew as many of their statistical tests won’t work correctly on this type of data.
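The effect of the interval size on a histogram can also be explored without the online tool. The sketch below groups whole-number data into intervals of a chosen width and prints a simple text histogram. The data is invented, the function names are our own, and the code assumes whole-number values and a whole-number interval; it is not a description of how the NCTM tool works.

```python
from collections import Counter

# Invented data for illustration: number of students absent in each of ten schools.
absences = [3, 4, 4, 5, 5, 5, 6, 7, 9, 22]


def bin_counts(values, interval):
    """Count the values in each interval [start, start + interval), including empty ones."""
    counts = Counter((value // interval) * interval for value in values)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0) for start in range(first, last + interval, interval)}


def print_histogram(values, interval):
    """Print one row of stars per interval so the shape of the data can be seen."""
    for start, count in bin_counts(values, interval).items():
        label = f"{start}-{start + interval - 1}"
        print(f"{label:>7} | {'*' * count}")


print("Interval size 5:")
print_histogram(absences, 5)
print("Interval size 10:")
print_histogram(absences, 10)
```

With the narrower interval the long tail to the right is easy to see; with the wider interval most of the data collapses into one bar. Students can discuss which interval size gives the most honest picture of the data.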
