EcoIntelligence: AI-Powered Algae Predictive Models in Lake Monitoring

We use AI models to predict the amount of chlorophyll A in lakes, using datasets gathered from research.
Grade 11

Problem

Introduction: Cyanobacterial blooms are among the most dangerous threats to lake ecosystems. These blooms consume large amounts of nutrients, most notably nitrogen and phosphorus, and their respiration draws down the dissolved oxygen in the water. The depletion of dissolved oxygen can have dire consequences for the aquatic life of that ecosystem. In addition, when cyanobacteria die, detritivores consume even more dissolved oxygen as they decompose the dead organic matter. This depletion can lead to hypoxia, a state in which there is so little oxygen that the ecosystem can no longer function and much of its aquatic life dies. These impacts are well documented, and many techniques have been developed to treat cyanobacterial blooms, such as aeration, mechanical mixing, and coagulation. However, none of these methods can be implemented until a bloom has been confirmed, and confirmation is often the hardest part. Traditionally, determining whether a cyanobacterial bloom was occurring required lab tests on water collected from the site, which is time-consuming and resource-intensive. Newer techniques that combine satellite imagery with AI algorithms have been developed to detect these blooms and have shown promising results. Various studies have used satellite data and AI algorithms to estimate and map the chlorophyll within a lake (Liu et al., 2021; Akbar et al., 2010). However, despite the extensive research done around the world, to our knowledge none of these techniques have been applied to lakes within Alberta.



 

Problem: Cyanobacterial blooms have many harmful effects on aquatic ecosystems and on humans. When left untreated, these blooms can render ecosystems uninhabitable and water undrinkable. Many methods have been developed to treat cyanobacterial blooms and mitigate their effects, but before treatment can begin, the extent of the bloom must be determined in a timely manner. Traditionally, this required lab tests on water collected from the site, which is time-consuming, resource-intensive, and impractical for remote locations. New techniques involving satellite imagery and AI algorithms can help detect cyanobacterial blooms more quickly.

 

Objective: We propose combining machine learning (ML) techniques with satellite imagery and Alberta water monitoring data to estimate chlorophyll A levels within a lake, so that harmful algal blooms can be predicted in a timely manner.

 

Background Information: Cyanobacterial blooms consume large amounts of nutrients, most notably nitrogen and phosphorus, and their respiration depletes the dissolved oxygen in the water. The depletion of dissolved oxygen within lake ecosystems can have dire consequences for aquatic life. In addition, when cyanobacteria die, detritivores consume more dissolved oxygen as they decompose the dead organic matter. Dying cyanobacteria can also release harmful toxins that render the water undrinkable. This makes cyanobacteria directly harmful to humans, a concern that is especially important in the midst of a global water shortage. The depletion of dissolved oxygen within an ecosystem can lead to hypoxia, a state in which there is so little oxygen that the ecosystem cannot function, creating dead zones: areas where life cannot be sustained.

 

Thylakoids are membrane structures found within algae and cyanobacteria, and they are the site of photosynthesis, the process that converts the energy in photons (light) into chemical energy, such as ATP, that the rest of the cell can use. Photosynthesis is split into two stages: the light-dependent reactions and the light-independent reactions. The light-dependent reactions are the most important stage for this project, because that is where chlorophyll absorbs and reflects light, the signal that satellite imagery can detect. Within this stage, photosystems embedded in the thylakoid membrane absorb light using a variety of pigments, including chlorophyll. Around 75% of the pigment within a thylakoid is chlorophyll A. Chlorophyll A absorbs blue and red light strongly but absorbs green light poorly and thus reflects it. That is why chlorophyll, and leaves, appear green.

Satellite imagery is already used in Europe to detect algal blooms. The European Space Agency runs the Sentinel missions, a group of Earth-observation satellites that monitor various aspects of the Earth and its atmosphere. In particular, the Sentinel-2 satellites provide high-quality multispectral images that can be used for our project. The Landsat 8 satellite, launched by NASA and the USGS, has also been used more recently in the US to detect cyanobacterial blooms. By analyzing how strongly the water reflects different bands of light, especially the green light that chlorophyll reflects and the red light that it absorbs, the location and concentration of chlorophyll, and thus of cyanobacteria, within a lake can be estimated.
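As an illustration of how band reflectances can be turned into a chlorophyll signal, the sketch below computes a normalized-difference index from two reflectance bands. The report does not specify which index, if any, was used. The Normalized Difference Chlorophyll Index (NDCI), which contrasts Sentinel-2's red-edge band (B5, about 705 nm) with its red band (B4, about 665 nm), is used here only as an assumed example, and the reflectance values are hypothetical.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) element-wise; NaN where a + b is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    result = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero, leaving NaN elsewhere.
    np.divide(a - b, total, out=result, where=total != 0)
    return result


def ndci(red_edge_b5, red_b4):
    """Normalized Difference Chlorophyll Index (assumed example index).

    Chlorophyll A absorbs red light (B4) strongly, while red-edge
    reflectance (B5) rises as chlorophyll concentration increases, so
    higher values suggest more chlorophyll A.
    """
    return normalized_difference(red_edge_b5, red_b4)


# Hypothetical surface-reflectance values for three lake pixels.
b5 = [0.030, 0.045, 0.000]
b4 = [0.030, 0.015, 0.000]
print(ndci(b5, b4))  # NaN for the last pixel, where both bands are zero
```

A ratio-style index is a common starting point because it partly cancels out overall brightness differences between images; a model can then use the index, or the raw bands, as input features.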

 

Method

Method: Our solution is to combine both known approaches: in situ water sampling and satellite imagery. The in situ data will be used to train a machine learning model that predicts the amount of chlorophyll in the lake, and thus the amount of algae. Gaps in the in situ data will be filled with estimates of the same values derived from satellite imagery.
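The two steps of this method can be sketched as follows, assuming the in situ measurements are arranged as a feature matrix with NaN marking missing samples, and that a satellite-derived estimate is available for every cell. The function names and data are hypothetical. Ordinary least-squares linear regression is used because the project's first model was a linear regression; the actual features, preprocessing, and final model are not described here.

```python
import numpy as np


def fill_gaps(in_situ, satellite_estimates):
    """Replace missing (NaN) in situ values with satellite-derived estimates."""
    in_situ = np.asarray(in_situ, dtype=float)
    satellite_estimates = np.asarray(satellite_estimates, dtype=float)
    return np.where(np.isnan(in_situ), satellite_estimates, in_situ)


def fit_linear_regression(features, target):
    """Fit target ~ intercept + features @ coefs by ordinary least squares."""
    X = np.column_stack([np.ones(len(features)), features])
    params, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return params[0], params[1:]


def predict(intercept, coefs, features):
    """Apply a fitted linear model to a feature matrix."""
    return intercept + np.asarray(features, dtype=float) @ coefs


# Hypothetical samples: each row is a lake sample, each column a
# water-quality variable; NaN marks an in situ value that was never measured.
in_situ = np.array([
    [0.02, 0.5],
    [np.nan, 0.7],
    [0.05, np.nan],
    [0.08, 1.1],
])
satellite = np.array([
    [0.02, 0.5],
    [0.03, 0.7],
    [0.05, 0.9],
    [0.08, 1.1],
])
chlorophyll_a = np.array([4.0, 6.0, 9.0, 13.0])  # hypothetical, in µg/L

X = fill_gaps(in_situ, satellite)
intercept, coefs = fit_linear_regression(X, chlorophyll_a)
print(predict(intercept, coefs, X))
```

Filling gaps before fitting keeps every sample usable, instead of dropping any row with a missing value, which is one way the extra satellite information can reach the model.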

Analysis

The results of the second phase of the project are promising: we achieved an accuracy of around 77%, a significant improvement over the model from the first phase. By filling the missing in situ values with values obtained from satellite imagery, the model had access to more information with which to estimate chlorophyll concentration.
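The report gives an accuracy of around 77% without naming the metric. For a regression model that predicts a continuous chlorophyll A concentration, one common choice is the coefficient of determination (R²) computed on samples held out from training. The sketch below assumes that metric and uses hypothetical numbers, so it shows how such a score could be computed, not how the 77% figure was obtained.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return order[n_test:], order[:n_test]


# Hypothetical held-out chlorophyll A observations and predictions (µg/L).
observed = [4.0, 6.0, 9.0, 13.0]
predicted = [5.0, 6.0, 8.0, 13.0]
print(round(r_squared(observed, predicted), 3))  # 0.957
```

Scoring on held-out samples matters because a model evaluated on its own training data usually looks more accurate than it will be on new lake measurements.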

While building the model, we also ran other algorithms to identify which variables matter most for detecting a cyanobacterial bloom. These analyses identified silica, red-band reflectance from the satellite, and nitrogen as the most important predictors. Most interestingly, even though silica is not an important nutrient for the growth of cyanobacteria, its value is still strongly correlated with the amount of cyanobacteria. This may be because diatoms, a competing type of algae, need silica to build their cell walls; a lack of silica limits diatom growth and thus leaves cyanobacteria with less competition.
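The report does not name the algorithms used to rank the variables. One common, model-agnostic option is permutation importance: shuffle one input column at a time and measure how much the model's error increases. The sketch below assumes the fitted model is available as a plain prediction function and uses synthetic data; it illustrates the technique and is not necessarily the method used in this project.

```python
import numpy as np


def permutation_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Mean increase in mean squared error when each column of X is shuffled.

    predict_fn: callable mapping an (n_samples, n_features) array to
    predictions. A larger increase means the model relies more on that feature.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((predict_fn(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Break the link between feature j and the target.
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            error = np.mean((predict_fn(X_shuffled) - y) ** 2)
            increases.append(error - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic example: the target depends strongly on column 0, weakly on
# column 1, and not at all on column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]


def model(features):
    # Stand-in for a fitted model; here it is the exact generating function.
    return 3.0 * features[:, 0] + 0.5 * features[:, 1]


print(permutation_importance(model, X, y))
```

Note that importance of this kind reflects what the model relies on, not causation, which is consistent with the silica result above: a variable can rank highly because it is correlated with the outcome, even if it is not a direct nutrient for cyanobacteria.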
 

Conclusion

This project successfully used in situ data together with satellite data to predict chlorophyll A concentration, and thus the amount of cyanobacteria, within a lake, allowing us to detect whether an algal bloom may occur. Although the first phase of the project was not as successful as we had hoped, shifting the focus to include satellite data improved the accuracy of the model by over 30%. We are considering several next steps, including incorporating more satellite data and using a more advanced machine learning model.

Citations

References

Kaiser, D. E. (n.d.). Understanding phosphorus fertilizers. UMN Extension. https://extension.umn.edu/phosphorus-and-potassium/understanding-phosphorus-fertilizers 

Croft, M. T., Warren, M. J., & Smith, A. G. (2006, August). Algae need their vitamins. Eukaryotic Cell. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1539151/

Lakes, rivers, and streams. Environmental Resilience Institute. (n.d.). https://eri.iu.edu/erit/implications/lakes-rivers-streams.html#:~:text=Warmer%20water%20temperatures%20in%20deep,mortality%20and%20toxic%20algal%20blooms. 

Control and treatment. U.S. National Office for Harmful Algal Blooms. (n.d.). https://hab.whoi.edu/response/control-and-treatment/ 

Staley, Z. R., Harwood, V. J., & Rohl, J. R. (n.d.). A Synthesis of the Effects of Pesticides on Microbial Persistence in Aquatic Ecosystems. National Library of Medicine. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7918059/#:~:text=Nitrogen%20and%20phosphorus%20are%20essential,supplied%20in%20an%20acceptable%20form. 

Ministry of Environment and Climate Change Strategy. (2022, June 13). What causes an algae bloom? Province of British Columbia. https://www2.gov.bc.ca/gov/content/environment/air-land-water/water/water-quality/algae-watch/what-are-algae/causes-of-an-algae-bloom

Bath, S. (n.d.). What are algal blooms and why do they matter? International Institute for Sustainable Development. https://www.iisd.org/articles/insight/what-are-algal-blooms-and-why-do-they-matter#:~:text=Should%20we%20be%20concerned%20about,the%20water%2C%20occasionally%20killing%20fish.

How plants use nutrients. Extension. (2021, August 1). https://extension.wvu.edu/lawn-gardening-pests/news/2021/08/01/how-plants-use-nutrients#:~:text=Nitrogen%20is%20needed%20for%20plant,and%20complete%20the%20reproduction%20cycle. 

Lin, S., Pierson, D. C., & Mesman, J. P. (2023, January 3). Prediction of algal blooms via data-driven machine learning models: An evaluation using data from a well-monitored mesotrophic lake. Geoscientific Model Development. https://gmd.copernicus.org/articles/16/35/2023/#section2

Silicon, diatoms in aquaculture. (n.d.). https://aquafishcrsp.oregonstate.edu/sites/aquafishcrsp.oregonstate.edu/files/boyd2014silicondiatoms_gaa.pdf 

Effects of pH on algal abundance. Deep Blue. (n.d.). https://deepblue.lib.umich.edu/bitstream/handle/2027.42/57443/Bergstrom_McKeel_Patel_2007.pdf?se

Surface Water Quality Data. Alberta.ca. (n.d.-a). https://www.alberta.ca/surface-water-quality-data

What is the Landsat satellite program and why is it important? U.S. Geological Survey. (n.d.). https://www.usgs.gov/faqs/what-landsat-satellite-program-and-why-it-important#:~:text=The%20Landsat%20Program%20is%20a,was%20later%20renamed%20Landsat%201.

Water indicators – lake trophic status. Alberta.ca. (n.d.-b). https://www.alberta.ca/water-indicators-lake-trophic-status#:~:text=Understanding%20trophic%20status%20classification,and%20very%20high%20(hypereutrophic). 

Liu, M., Ling, H., Wu, D., Su, X., & Cao, Z. (2021, November 8). Sentinel-2 and landsat-8 observations for harmful algae blooms in a small eutrophic lake. MDPI. https://www.mdpi.com/2072-4292/13/21/4479 

A remote sensing based framework for predicting water quality of different source waters. CiteSeerX. (2010). https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5c1f384f2478efb17a83c2f5ade6da059c7dbbea

Environmental Protection Agency. (n.d.). EPA. https://www.epa.gov/habs/control-measures-cyanobacterial-habs-surface-water#:~:text=Algaecides%20are%20chemical%20compounds%20applied,Potassium%20permanganate 

U.S. Department of Commerce, National Oceanic and Atmospheric Administration. (2019, March 14). Low or depleted oxygen in a water body often leads to 'dead zones': regions where life cannot be sustained. NOAA's National Ocean Service. https://oceanservice.noaa.gov/hazards/hypoxia/

Ritter, B., Fraser, D., & Burley, K. L. (2007). Nelson Biology: Alberta 20-30. Thomson Nelson. 

Sentinel-3. Sentinel Online. (2022, January 25). https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-3 

Landsat 101. U.S. Geological Survey. (n.d.). https://www.usgs.gov/landsat-legacy/landsat-101

Yaakob, M. A., et al. (2021, February 14). Influence of nitrogen and phosphorus on microalgal growth, biomass, lipid, and fatty acid production: An overview. Cells, 10(2), 393. https://doi.org/10.3390/cells10020393

 

Acknowledgement

There are a number of people whom we would like to thank for their contributions to our project. First, our amazing mentors, Tim Gubski, a Princeton University student, and Irada Shamilova, the head of Juniotech, worked constantly to help us get this project ready. We could not have accomplished as much as we did without the countless hours they spent helping us. Tim and Irada were our primary instructors in artificial intelligence and machine learning, and the initial linear regression model was a result of everything they taught us. We would also like to thank Zainab Akhtar, a geospatial analyst with the Qatar Computing Research Institute, who met with us multiple times. Zainab recommended that we use Google Earth Engine to collect our satellite data, and without her advice on how to use satellite data, we would not have been able to complete the second phase of the project. Finally, we thank Gijs van den Dool, a senior geospatial data scientist and independent researcher. The expertise he provided when we consulted with him was invaluable and gave us guidance during a crucial stage of the second phase.