There has been much discussion and debate in the scientific community regarding the efficacy and suitability of machine learning techniques for improving our understanding of local and global environments. Machine learning enables predictive and probability-based calculations – useful tools for evaluating the benefits and costs of our present actions. It is useful for those active in climate science to understand the strengths and limitations of current machine learning techniques, as this enables better-informed scrutiny of published findings and conclusions.
What is Machine Learning?
Machine learning falls under the broader term Artificial Intelligence (AI), defined in a 2004 paper by John McCarthy as “the science and engineering of making intelligent machines, in particular intelligent computer programs”. The true nature of ‘intelligence’ is hotly debated, but here the intelligence is artificial in the sense that computer models, rather than humans, draw conclusions from complex datasets. Models are usually designed for research that would be impractical or excessively laborious to carry out with conventional analysis.
Popular machine learning terms are related as nested subsets: deep learning is a subset of machine learning, which is itself a subset of AI.
It is also important to understand the following five terms:
An algorithm is a set of instructions (in this context, supplied to a computer) that transforms input information into output information. For example, calculating the carbon footprint of an organisation by assessing variables such as fuel or energy consumption, manufacturing processes, and any offset efforts.
A model is the algorithmic representation of a system (such as climate or an economy). Usually, a model comprises multiple algorithms that solve a complex problem.
Structured Data is data that is labelled, where its nature has already been determined, for example, temperature values. Classical machine learning mainly uses structured data.
Unstructured Data is data presented in raw forms, such as images. Deep learning models can operate on both structured and unstructured data to create natural language processing and visual recognition systems. However, these require higher levels of computing power than classical machine learning methods.
Neural networks are one of the most important computational techniques for machine learning. A neural network is a software model consisting of several connected nodes, where both the nodes and the connections between them matter.
Each network has inputs from either data or previous nodes, one or more hidden layers (algorithms that can modify the input), and an output. If a node’s algorithm produces a result that exceeds a set threshold value, then the output is activated. Each connection can also be assigned a weight to indicate how useful it is in predicting an overall result. Connections that are more useful in predicting a result receive a higher weight. Less useful connections are assigned a lower weight or may even be dropped.
Therefore, with the repeated presentation of data and comparison of the predicted outputs against known answers, the neural network learns to represent the modelled system more accurately. Once there is confidence in the model, it can be applied to new datasets where the answers are not yet known, or to hypothetical datasets describing possible futures.
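The node mechanics described above – weighted sums, a threshold activation, and weighted connections – can be sketched as a single forward pass through a tiny network. The weights and threshold below are hypothetical, hand-picked for illustration rather than learned from data:

```python
def forward(inputs, hidden_weights, output_weights, threshold=0.5):
    """One forward pass through a network with a single hidden layer.

    Each hidden node computes a weighted sum of the inputs and applies a
    step activation: it 'fires' (outputs 1.0) only if the sum exceeds the
    threshold. The output node then weights each hidden activation by how
    useful that connection is assumed to be.
    """
    hidden = []
    for node_weights in hidden_weights:
        total = sum(w * x for w, x in zip(node_weights, inputs))
        hidden.append(1.0 if total > threshold else 0.0)
    return sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical weights; in a real model these would be learned by training.
hidden_w = [[0.8, 0.2], [0.1, 0.9]]
output_w = [0.7, 0.3]
print(forward([1.0, 0.0], hidden_w, output_w))  # first node fires: 0.7
```

In training, the weights would be adjusted after each comparison of predicted and observed outputs, which is exactly the "repeated presentation of data" described above.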
Machine learning methodologies can also be categorised into three types of learning: Supervised, Unsupervised, and Reinforcement learning. These are summarised below and are selected based on the type of data being used and the desired output:
Supervised learning is appropriate when the data are considered well understood (usually structured datasets) but the relationship between them is complex – for example, economic modelling.
Unsupervised learning is used to uncover patterns and structure in unlabelled data, such as imagery or sound, without predefined answers.
Reinforcement learning is used to optimise algorithms based on trial and error, again through the repeated presentation of data. The algorithms that most successfully predict the correct answer are preferred.
What are the Data Requirements for Machine Learning Techniques?
The data requirements of machine learning techniques vary from model to model. Usually, the dataset needs to be large enough to be split into training and testing subsets. The training dataset is used to train the model, whilst the testing dataset is used to assess its accuracy. Some research also requires an additional validation dataset. Typically, data are split in a 70/30 ratio between the training and testing sets, and this can be done in a variety of ways: spatially (for example, by region), temporally (over different time periods), or by category (for example, by land cover).
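As a minimal sketch of the 70/30 split described above, the following function shuffles a dataset and divides it; the records, fraction, and seed are all illustrative assumptions, and real studies may split by region, period, or category instead of at random:

```python
import random

def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle records and split them into training and testing subsets."""
    shuffled = records[:]                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of 100 records split 70/30.
train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 70 30
```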
3 Ways in Which Machine Learning Techniques Help Address Climate Change
1. Improved Data Analysis
Machine learning can help tackle climate change by analysing data to spot patterns and trends that are invisible to the human eye or impractical for humans to monitor. For example, machine learning models enable automatic and continuous monitoring of global imagery to identify wildfires, landslides, and other visible phenomena using pattern and image recognition. Reinforcement learning allows the models to become increasingly accurate in identifying changes and hazards. Detected events can then be evaluated by an expert and forwarded to the relevant authority for mitigation.
Other applications combine disparate datasets to draw new conclusions or enable the identification of useful insights. For example, deforestation or coral bleaching data could be combined with meteorological data to understand how each impacts the other.
A more abstract application is sentiment and preparedness analysis. This seeks to understand human thoughts and feelings towards climate change and attitudes to mitigation efforts. The data are usually collected through social media or crowdsourcing strategies.
By assessing the collective feelings and attitudes of communities towards tackling climate change, organisations and authorities can help improve services, for example, hazard preparedness schemes or local initiatives to help improve quality of life. By comparing the attitude of different demographics, it is possible to identify targets for information, education, and strategies to combat disinformation.
2. Optimising Systems and Solutions
Machine learning can tackle climate change by enhancing or adjusting technical systems to make the best use of resources, based on contextual information supplied to the model. For example, automated electricity grids optimise energy production by monitoring and predicting energy supply and demand; machine learning could use traffic information to predict demand from electric cars charging overnight. This also applies to local initiatives, such as reducing the urban heat island effect by using machine learning to optimise urban planning, considering variables such as infrastructure and vegetation cover.
Another example is carbon sequestration modelling. This technique assesses how much carbon is being stored in different forms across the globe. Machine learning models can be used to simulate carbon sequestration and its impact over time – which can then be used to design smarter carbon capture systems.
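As an illustration only – not a real sequestration model, which would be learned from observed data – a toy stock-flow simulation shows the kind of over-time projection described above. All parameters are hypothetical:

```python
def simulate_carbon_stock(initial_stock, uptake_per_year, loss_rate, years):
    """Project a carbon stock forward in time: each year the stock gains a
    fixed uptake (e.g. forest growth) and loses a fraction (e.g. decay or
    disturbance). Returns the year-by-year trajectory."""
    stock = initial_stock
    history = [stock]
    for _ in range(years):
        stock = stock + uptake_per_year - loss_rate * stock
        history.append(stock)
    return history

# Hypothetical parameters in tonnes of carbon per hectare.
trajectory = simulate_carbon_stock(initial_stock=50.0, uptake_per_year=2.0,
                                   loss_rate=0.01, years=10)
print(trajectory[0], trajectory[-1])  # stock grows while uptake exceeds losses
```

A machine learning model would play the role of estimating the uptake and loss terms from data, rather than having them fixed by hand as here.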
3. Scenario Modelling and Planning
A third use of machine learning to tackle climate change is the prediction and modelling of future scenarios under anthropogenically induced climate change. One of the most urgent applications of this is modelling the frequency and severity of extreme weather events. This can include droughts, wildfires, extreme precipitation, flooding, and landslides. This task can be accomplished by associating variables (for example, temperature and precipitation) with the occurrence of a specific hazard, to then predict how the frequency or severity of that hazard may change under various future scenarios. Predictive modelling can also be used to monitor the impacts of different scenarios on ecosystems, both in terms of species population modelling and also to address how long-term processes such as the rate of coral bleaching may vary under different environmental conditions.
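The association step described above can be sketched in its simplest form: bin observations by temperature and rainfall, then estimate how often a hazard occurred in each bin and look up the bin matching a future scenario. The thresholds and records here are invented for illustration; real studies use far richer models and data:

```python
from collections import defaultdict

def hazard_rate_by_conditions(observations, temp_cut, rain_cut):
    """Estimate how often a hazard occurred under each combination of
    conditions, by binning temperature and rainfall into high/low."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [hazard events, total]
    for temp, rain, hazard in observations:
        key = (temp > temp_cut, rain > rain_cut)
        counts[key][1] += 1
        counts[key][0] += int(hazard)
    return {k: events / total for k, (events, total) in counts.items()}

# Hypothetical records: (temperature C, rainfall mm, wildfire occurred?)
obs = [(35, 10, True), (36, 5, True), (20, 80, False),
       (34, 12, True), (19, 60, False), (33, 90, False)]
rates = hazard_rate_by_conditions(obs, temp_cut=30, rain_cut=50)

# Under a hotter, drier future scenario, look up the (hot, dry) bin:
print(rates[(True, False)])  # 1.0 – wildfire occurred in every such record
```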
What are the Benefits of Using Machine Learning to Tackle Climate Change?
The key benefit of machine learning is that it allows us to simplify, categorise, and make predictions from highly complex datasets. Data can be analysed across larger spatial and temporal scales to make observations on intricate processes, allowing for global monitoring and mobilisation. In terms of future development, machine learning is becoming an increasingly viable technique for data analysis as the cost of processing power and data storage falls, driven by the efficiencies of cloud computing. Furthermore, a huge increase in data availability, fuelled by sources such as the Internet of Things and crowdsourcing, allows machine learning techniques to be applied ever more widely to tackling climate change.
What are the Limitations and Risks of Machine Learning?
Four limitations of machine learning must be recognised to ensure the integrity of model outputs:
1. Lack of Data
Machine learning works best when models are trained on a wide range of scenarios in which the full impact of each variable can be evaluated, including extreme and edge cases. Given that high-quality satellite data has been available for less than sixty years, environmental machine learning models are largely limited to recent decades. There are no datasets from major ice ages or interglacial periods from which to learn how the environment might change under more extreme conditions. There is, therefore, a risk that machine learning models will fail to identify relationships and feedback loops in situations outside those on which they have been trained.
This makes many machine learning techniques, when trained on current datasets, unsuitable for long-term prediction. When looking at long-term environmental change, trends must be considered over centuries – even if the media focuses on the immediate concerns of the next 50-75 years. In other contexts the datasets are simply unavailable: for example, as of 2022, less than 25% of the ocean floor has been mapped globally. Once complete, such mapping could improve the management of fisheries and conservation efforts.
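The extrapolation risk can be demonstrated with a toy example: a straight-line model fitted to a narrow window of a quadratic system predicts reasonably inside that window but fails badly outside it, much as a model trained only on recent decades may fail under more extreme conditions. The quadratic "true system" here is purely illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# The 'true' system is quadratic, but we only observe a narrow range
# (analogous to training only on a few recent decades of data).
xs = list(range(10))
ys = [x * x for x in xs]
a, b = fit_line(xs, ys)

inside = a * 9 + b     # within the observed range: close to the true 81
outside = a * 100 + b  # far outside it: nowhere near the true 10000
print(inside, outside)
```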
2. Errors, Bias and Incomplete Data
Whilst data collection methods are becoming increasingly accurate and reproducible thanks to greater automation and higher-quality measuring instruments, many sources of error remain. If these are not addressed, they can invalidate the conclusions drawn by machine learning models. Whilst this can be mitigated by making comparisons with previously collected datasets or long-term averages, the risk of error is never completely avoidable.
Another issue is the risk of inherent bias in the data collection method, whether through the choice of data used to train the model or the context in which the data were collected. Put simply, machines can only learn from the data supplied; any factors outside those datasets will not feature in the learned model. Whilst this is not always a problem, it should be acknowledged in any commentary on the data – for example, noting the impact of the Covid-19 pandemic on the drop in global carbon dioxide emissions during 2020.
3. ‘Black Box’ Models
Machine learning can produce highly accurate models which adapt to different data inputs and scenarios, but may not yield formulae or relationships that can be independently verified. A concern often raised by scientists is the risk of relying on ‘black box’ machine learning models that humans cannot understand.
Neural networks are a self-adapting, data-driven abstraction of reality; the inputs and outputs can be observed, but there may be no inherent logic within the network that can be critically reviewed – it is effectively accumulated experience acquired through repeated observation. Considering this, can the model be trusted to be accurate in the future simply because it has been accurate in the past? Of course, this criticism can be levelled at many scientific theories that are based on analysis rather than derived from fundamental principles, but such theories generally contain a comprehensible logic to which some level of certainty or risk can be assigned.
As a result, it is becoming increasingly common to require that machine learning models be interpretable. Interpretability typically constrains a model to fewer variables and simpler relationships, but makes its reasoning transparent and accountable. Understanding both the benefits and limitations of black box models allows us to compare and critique different machine learning algorithms in a proactive way.
4. Energy Consumption
A final consideration of using machine learning techniques to tackle climate change is whether the value of the findings outweighs the carbon emissions produced by storing and analysing such large datasets. As data storage and computing power become increasingly optimised and renewable power availability increases, this should become less of a concern.
Will Machine Learning Help Tackle Climate Change?
The consensus amongst climate and Earth scientists is that machine learning models are powerful tools. Used wisely, machine learning has the potential to make climate science more widely available and applicable through the industrialised analysis of data. Furthermore, since machines bring no preconceptions of their own (although they can inherit biases present in their training data), deep learning in particular may produce insights that elude other forms of research, either because the data were not suitable for traditional analysis or because the inference is unexpected.
However, most black-box models cannot provide reliable projections outside the range of the data used to train them. Models can only learn to generate outputs within the range of their training inputs, and they often have no knowledge of errors or extraneous factors that might also be relevant. It must therefore be accepted that researchers often cannot articulate exactly how neural networks reach their conclusions, which makes it risky to rely on them alone for critical decisions.
The resulting push for greater transparency and availability has led to increased publication of machine learning model code alongside the datasets used. As technological capabilities develop, it is essential that climate change models remain grounded in the scientific processes that underlie the Earth’s natural systems and cycles. Such large-scale models could incorporate machine learning algorithms, though most likely as part of a larger solution.
In a commercial context, one up-and-coming platform for tackling climate change using machine learning is Microsoft’s AI for Earth Programme. Launched in 2017, it aims to distribute two hundred research grants (totalling $50 million) to projects using artificial intelligence to address environmental damage. Using Microsoft’s platform and interface, researchers and scientists can share data, methods, and conclusions directly, allowing for increased transparency and critical analysis. The goal is to create a collaborative space to mitigate climate change impacts by connecting experts. Other initiatives include Climate Change AI and the Climate Science for Service Partnership China, both of which are collaborative science initiatives between research institutions.