Sleep walking into clinical data science
If I’ve learned one thing over the last five months in the Computational Oncology lab, it is that real-world data is a whole different ball game.
As a fresh-out-of-undergrad master’s student with a background in cognitive neuroscience and biological sciences, I didn’t entirely have a data science project in my toolkit. My programming skills amounted to a working knowledge of Java and online courses in Python and Machine Learning picked up over the summer. When it came to MRes project choices, I found myself with a unique opportunity to gain experience in a lab, working with highly experienced researchers, clinicians and students. I figured a project in the Computational Oncology lab would be a challenge – but what better way to learn how to analyse large datasets or apply machine learning methods than to immerse myself in a project?
For the past five months, I have been working under Dr Seema Dadhania on a project from the ongoing BrainWear clinical study. Specifically, I was tasked with analysing sleep in patients with High Grade Gliomas, a malignant primary brain tumour. Sleep disturbance is one of the most commonly experienced symptoms for patients with High Grade Gliomas. In fact, a 2018 study by Garg et al. found that disrupted sleeping behaviours were three times more prevalent in patients with primary brain tumours than in healthy controls, and were linked to decreased quality of life. Despite how pervasive and burdensome sleep problems can be in brain tumour patients, research remains scarce, and the studies that have been done use self-reported measures such as the EORTC QLQ-C30 and MDASI-BT, questionnaires which assess quality of life or severity of symptoms. These measures run the risk of subjectivity – patients may quantify difficulty with sleep in different ways, which limits how well findings translate to real life. With the collection of longitudinal accelerometer data, BrainWear gives the opportunity to objectively understand sleep patterns and the changes that occur with treatment.
I was tasked with processing wearable accelerometer data for 36 patients with High Grade Gliomas, designing a methodology for analysis and analysing the data using Python and R. At first glance, this may seem straightforward – 36 patients wearing accelerometers for extended intervals will provide substantial amounts of data for interpretation – but a real dataset isn’t comparable to the ones you encounter in courses, which come in neatly formatted tables with a clear direction for analysis.
In fact, data analysis is only about 20% of the job – the real work comes before. First off, there isn’t a set method to process accelerometer data. AX3 accelerometers measure acceleration along three axes, so how do you convert this into metrics of sleep duration and quality? Thankfully, work is being done by talented research teams across the globe. This analysis was possible using GGIR, an R package built by Vincent van Hees, which converts raw accelerometer data into metrics summarised per day.
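To give a flavour of what "converting three axes into a metric" can mean, here is a minimal sketch of one widely used summary, the Euclidean norm minus one (ENMO). This is an illustration only: it is not the GGIR code and not necessarily the metric behind the sleep estimates in this project, and the function names and the toy samples are my own assumptions.

```python
import math


def enmo(x, y, z):
    """Euclidean norm minus one, in g, with negative values clipped to zero."""
    magnitude = math.sqrt(x * x + y * y + z * z)
    return max(magnitude - 1.0, 0.0)


def epoch_means(samples, epoch_len):
    """Average ENMO over consecutive, non-overlapping epochs of epoch_len samples.

    A trailing partial epoch is dropped (an assumption made for simplicity).
    """
    values = [enmo(x, y, z) for x, y, z in samples]
    return [
        sum(values[i:i + epoch_len]) / epoch_len
        for i in range(0, len(values) - epoch_len + 1, epoch_len)
    ]


# Toy data: a device lying still reads about 1 g on one axis (gravity only),
# so its ENMO is zero; the second epoch has extra acceleration on the z axis.
still = [(0.0, 0.0, 1.0)] * 4
moving = [(0.0, 0.0, 2.0)] * 4
print(epoch_means(still + moving, epoch_len=4))
```

Subtracting one removes the constant pull of gravity, so what remains is a rough measure of movement. Real pipelines also calibrate the sensor and detect non-wear before any metric like this is trusted.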
Once processing was done, I began the lengthy process of making sense of the data. Real-world data are messy to collate: each participant has variable data scattered across multiple files. Moving these data into an actionable format required several rounds of subsetting files by date, collating and cleaning. Once the data were more manageable, I could get to the real analysis. That was the plan!
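The subsetting and collating step can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the field names (`date`, `sleep_hours`), the patient identifier and the rule of keeping the first copy of a duplicated day are all assumptions, and the "files" are represented as in-memory lists of rows to keep the example self-contained.

```python
from datetime import date


def collate(files_by_patient, start, end):
    """Merge each patient's per-file rows into one table, keeping dates in [start, end].

    If the same day appears in more than one file, only the first copy is kept.
    """
    combined = []
    for patient_id, files in files_by_patient.items():
        seen_days = set()
        for rows in files:
            for row in rows:
                day = date.fromisoformat(row["date"])
                if not (start <= day <= end) or day in seen_days:
                    continue
                seen_days.add(day)
                combined.append({
                    "patient_id": patient_id,
                    "date": day,
                    "sleep_hours": float(row["sleep_hours"]),
                })
    combined.sort(key=lambda r: (r["patient_id"], r["date"]))
    return combined


# Hypothetical input: one patient whose days are split across two files,
# with one overlapping day and one day outside the window of interest.
files = {
    "P01": [
        [{"date": "2020-01-01", "sleep_hours": "7.5"},
         {"date": "2020-01-02", "sleep_hours": "6.0"}],
        [{"date": "2020-01-02", "sleep_hours": "6.0"},
         {"date": "2020-02-15", "sleep_hours": "8.0"}],
    ],
}
table = collate(files, start=date(2020, 1, 1), end=date(2020, 1, 31))
for record in table:
    print(record["patient_id"], record["date"], record["sleep_hours"])
```

In practice the window would differ per patient, for example relative to each person's surgery date, but the shape of the work is the same: read, filter by date, de-duplicate, and stack into one table.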
The reality of accelerometer data is that there is a significant amount of data loss, due to gaps when patients do not wear (or only intermittently wear) their accelerometers. Processing the data reduced my sample from 36 to 34, because two patients lacked adequate data to run their files through the sleep detection algorithms. The sample was whittled down further when we excluded days with less than 16 hours of good wear time, and further still when we excluded days where the algorithm was not able to identify a sleep period.
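The two day-level exclusion rules described above can be written down in a few lines. This is a minimal sketch under assumed field names (`wear_hours`, `sleep_detected`) and made-up example days; only the 16-hour threshold comes from the text.

```python
MIN_WEAR_HOURS = 16  # threshold for "good wear time" stated in the text


def usable_days(days):
    """Keep days with at least MIN_WEAR_HOURS of wear and a detected sleep period."""
    return [
        d for d in days
        if d["wear_hours"] >= MIN_WEAR_HOURS and d["sleep_detected"]
    ]


def patients_remaining(days):
    """Patients who still have at least one usable day after filtering."""
    return sorted({d["patient_id"] for d in usable_days(days)})


# Hypothetical days: P02 is lost entirely because its only day has too little wear.
days = [
    {"patient_id": "P01", "wear_hours": 23.5, "sleep_detected": True},
    {"patient_id": "P01", "wear_hours": 20.0, "sleep_detected": False},
    {"patient_id": "P02", "wear_hours": 9.0, "sleep_detected": True},
    {"patient_id": "P03", "wear_hours": 16.0, "sleep_detected": True},
]
print(len(usable_days(days)), patients_remaining(days))
```

Each rule looks harmless on its own, but applied in sequence they remove whole patients as well as individual days, which is how a sample shrinks so quickly.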
My final dataset comprised 12 patients with data around the time of their surgery, 15 patients with data during chemoradiation and only 8 patients with data during chemotherapy. In an ideal world, when assessing changes with treatment, patients would have data across their whole treatment span – but often, data are intermittent. For example, patients may wear accelerometers post-surgery up until chemoradiotherapy and no further. In my analysis, I quickly realised that what mattered was not how long patients wore the accelerometer but whether it was worn at the “right” time – in the days immediately before or after surgery, or during specific weeks of chemoradiotherapy or chemotherapy. Even this is difficult to interpret if there are no baseline data from before treatment begins. So, you find yourself trying to piece together the puzzle of sleep using subsets from various patients at various times, which makes statistically sound conclusions difficult.
As a feasibility study, this analysis informs further study design and makes us ask real questions. A wear-as-long-as-possible approach to patient management does not appear to be the right option. Well-defined intervals of intermittent wear may give better results: for instance, giving patients targeted goals, such as wearing accelerometers for one week pre-chemoradiotherapy and then for specific weeks during chemoradiotherapy. This strategy might help clinicians better evaluate progression or changes in sleep, and it would mitigate the non-wear data problem. Rather than averaging over extended periods (such as the whole of chemoradiation) to get a general pattern of sleep from patients who provide data at different intervals, shorter wear times at set points during treatment could provide more accurate and comparable patterns.
Coming to the end of my time at the Computational Oncology lab, I am grateful for the learning experience I’ve had: the challenges I’ve faced and the lessons I’ve learned about data processing, statistical analysis, and working with real-life digital, clinical data. A powerful realisation is that turning unstructured data into recognisable patterns is a key challenge in digital health – a critical step before advocating interventions in patient management and digital health delivery. Accelerometer data is an asset, but to uncover patterns, learning to structure, match, parse, and analyse this data is the first step towards informing clinical practice.
I could never have predicted what I would walk into – the oncology lab project using accelerometer data to measure sleep now has my eyes wide open.