Possible errors in the data?

We’ve started collecting our data and just wanted to double check an issue with you. We’re entering the number of half day absences due to illness, but doesn’t this introduce inaccuracies if we’re not clear as to whether the number equates to, say, 30 children taken ill in the middle of the day (30 sessions) or 15 children absent for a whole day (also 30 sessions)? The first situation suggests a much worse ‘outbreak’ of something than the second.

Of course, this also assumes that a significant amount of the illness is in fact caused by flu and not sickness, unapproved holidays, tests etc.



One Response to Possible errors in the data?

  1. Dr Rob says:

    You’ve asked some great questions. You’re correct that, using the data we’re collecting, it won’t be possible to distinguish between 30 children taken ill in the middle of the day (30 sessions) and 15 children absent for a whole day (also 30 sessions). However, decipher my data is only really trying to detect ‘if’ and ‘when’ there is an increase in influenza, not how big or severe any outbreak might be. Because of this, the limitation you have spotted shouldn’t have much of an impact on our scientific question. For our main analysis we will be examining data aggregated across a whole week and across all schools in the country that are submitting data, so half-day fluctuations such as the ones you describe will have a much smaller effect.
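To make the arithmetic concrete, here is a minimal Python sketch of how the two scenarios collapse into the same recorded figure, and how a weekly rate pooled across schools might be computed. The function names, the pupil numbers and the assumption of two sessions per school day are illustrative only; they are not taken from the project’s actual analysis.

```python
def sessions_missed(half_day_children, full_day_children):
    """Half-day sessions missed: one per half-day absence, two per full-day absence."""
    return half_day_children + 2 * full_day_children


# The two scenarios from the question produce the same recorded number.
scenario_a = sessions_missed(half_day_children=30, full_day_children=0)
scenario_b = sessions_missed(half_day_children=0, full_day_children=15)


def weekly_absence_rate(schools):
    """Pool a week of data across schools (hypothetical layout).

    `schools` is a list of (daily_sessions_missed, pupils) pairs, where
    daily_sessions_missed holds one count per school day. The rate is
    sessions missed divided by sessions possible, assuming two sessions
    per pupil per day.
    """
    missed = sum(sum(daily) for daily, _ in schools)
    possible = sum(2 * len(daily) * pupils for daily, pupils in schools)
    return missed / possible


# Made-up example: one school records 30 sessions on Monday, another records none.
example_week = [([30, 0, 0, 0, 0], 300), ([0, 0, 0, 0, 0], 200)]
rate = weekly_absence_rate(example_week)
```

Once a single day is pooled with the rest of the week and with other schools, it makes no difference which of the two scenarios produced the 30 sessions.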

    You also rightly point out that we aren’t collecting any real medical information about what is causing these illness absences, and this is another limitation of the project. For example, lots of children at your school could have something like winter vomiting (norovirus) rather than influenza, and we won’t be able to distinguish between the two using our data. This is one of the main reasons why we don’t know whether this project will work, and we didn’t get to test it out last year. If we do see a spike in the data this year, we plan to look at when it occurred and triangulate it with other sources of flu data, like the national GP data on levels of people going to see their doctor with a ‘flu like illness’ (the best national data source for measuring influenza in the community at present). That will show whether a peak in the school data occurred at the right time for it to be due to influenza in schools. If our hypothesis is correct then not only will the two peak at the same time, but our schools data will start to rise earlier than the GP data. We’ll also look to see if any peak in the school data fits with other diseases, such as winter vomiting, that might also explain the results. I hope this answers your questions, but I’d be happy to answer any others.
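One simple way to check the timing between two weekly series is to slide one against the other and see which shift gives the strongest correlation. The sketch below does that in plain Python. It illustrates the general idea only and is not the project’s analysis method; the function names and both series are invented for the example.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lead(school, gp, max_lag=4):
    """Return (lag, r) for the shift that best aligns the two weekly series.

    A positive lag means the school series runs ahead of the GP series
    by that many weeks; a negative lag means it runs behind.
    """
    best = (None, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = school[:len(school) - lag], gp[lag:]
        else:
            xs, ys = school[-lag:], gp[:len(gp) + lag]
        n = min(len(xs), len(ys))
        if n < 3:
            continue  # too few overlapping weeks to compare
        r = pearson(xs[:n], ys[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Invented weekly figures: the GP series is the school series delayed by one week.
school_weekly = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
gp_weekly = [1, 1, 2, 5, 9, 5, 2, 1, 1, 1]
lag, r = best_lead(school_weekly, gp_weekly)
```

With these invented figures the best alignment is a lag of one week, meaning the school series rises a week before the GP series. Real data would be noisier, and a similar comparison against norovirus reports would be needed before attributing a peak to flu.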
