Sonia Mathur never considered data science “real research,” but she thinks otherwise after exploring correlations between rainwater contaminants and environmental factors using just her computer.
As the world began to shut down mid-March, Sonia Mathur was concerned the summer experience she had been eagerly awaiting would be canceled. For the soon-to-be University High School senior, the Keeping Youth Engaged in Science (KEYS) program was a “needle in the haystack” opportunity that she couldn’t bear to miss.
To uphold the mission of the BIO5 Institute and inspire the next generation of STEM professionals amid the COVID-19 pandemic, KEYS program coordinators Brooke Moreno and Kelle Hyland quickly transitioned the in-person, seven-week research program to an entirely remote experience. On June 8, Sonia joined 48 other outstanding Arizona high school students in this year’s one-of-a-kind virtual computer science research program.
Bringing just her limited Microsoft Excel skills from Advanced Placement chemistry to the table, Sonia expanded her computer programming skills over the last seven weeks as a research intern for Dr. Leif Abrell. Through her KEYS internship, Sonia not only discovered trends between various environmental factors and harvested rainwater contaminants but also completely transformed her view of the importance of data science research.
Examining harvested rainwater contaminants
Due to the shortage of water in the desert, many community members harvest rainwater, but the safety of this resource, especially in underserved communities, is currently unknown. To address this gap in knowledge, Sonia performed various analyses on Project Harvest data to assess the influence of sampling season, year and community on the concentrations of harvested rainwater contaminants.
Sonia’s light blue poster, entitled “What’s in Your Rainwater? Inorganic and Organic Contaminants Measured in Roof-Harvested Rainwater,” featured three visualizations and one statistical summary for each of three water contaminants: lead, zinc and 4-nonylphenol. Sonia first used Excel to reorganize the citizen- and laboratory-generated data. With RStudio, she then analyzed data on the three chosen impurities and created visualizations to depict the relationships between each contaminant and sampling season, year and community.
Through her statistical analysis, Sonia found that lead, zinc and 4-nonylphenol were highest in concentration in the rainwater collected after the first monsoon in each season. Sonia explained that this trend agrees with the “First-Flush Phenomenon,” whereby an accumulation of corrosion and matter during the dry season is washed away by the first rains. She also found that the concentrations of all three contaminants were highest in Tucson and lowest in Dewey-Humboldt.
In the future, Sonia would like to perform further statistical analyses to better understand the safety of harvested rainwater for domestic use. She ultimately aims to generate guidelines for safely collecting and using harvested rainwater. Sonia is also curious about the sources of contamination, though she suspects the lead may originate from pipes and 4-nonylphenol might be derived from the environment.
A new perspective on data science
Sonia previously considered only traditional wet-bench laboratory work, in which researchers in white coats examine specimens, to be “real research.” She admits she never understood how coding and data science could be applied to bioscience research, and she thought only engineers used computer programming to conduct their studies.
“Before KEYS, I never had an interest in data science. I thought learning coding meant you were going into engineering, and I didn’t want to be an engineer. I wanted to be a scientist,” she said.
Because Sonia aspired to be a scientist, she saw little use in expanding her limited computer programming skills. Having completed a KEYS research project with a heavy emphasis on data science, Sonia now appreciates the utility of computer programming in any discipline.
“Getting to use coding as an essential part of my project was eye-opening. I gained more confidence in my skills, and I now have more respect for coding,” Sonia said. “Data science is a huge part of research, especially as our world continues to grow.”