Part 1 - Calculate global mean temperature
Prepared by Mathias Hauser.
In the first part we will work with Near-Surface Air Temperature data (tas) from climate models. We will load global, gridded temperature fields and compute the global mean temperature.
We look at climate simulations from 1850 to 2100. The simulations are split into two parts: the first covers the historical period from 1850 to 2014 and the second the “future” from 2015 to 2100. The historical simulation uses historical data of greenhouse gases (GHGs), aerosols, etc. The projections for the future climate use scenarios with potential pathways of the forcing agents, named shared socioeconomic pathways (SSPs). Here, we will use a scenario with high radiative forcing by the end of the century and thus strong warming, named SSP5-8.5 (or ssp585). We will start by looking at two climate models.
Learning goals
programming goals
open netCDF files using xarray and manipulate them
write a python function that can be reused later
scientific and data analysis goals
using exploratory data analysis to explore the data structure and plot a subset of the data
understand the differences between the unweighted and weighted global means
see what temperature changes are projected for the next century for a high emission scenario and how climate models can differ considerably
Preparation
Create a new notebook in the code folder and make sure you select ipp_analysis as kernel. Rename the notebook from Untitled.ipynb to p1_global_mean_name.ipynb, where you replace name with your ETH username.
Convert the first cell to Markdown and add a title, e.g. # Global mean temperature, and add your name on a new line.
Add a new cell and import the required packages. We will need numpy, matplotlib.pyplot, and xarray (use the standard abbreviations).
Note
For students of the ip python course: please submit the finished notebook on the first hand-in date.
Read temperature data (ACCESS-CM2)
The first step in a scientific investigation is usually to load the data we want to analyse.
Here we read the data from the "ACCESS-CM2" model. The data is located at "../data/cmip6/tas". This is a file path relative to the location of the notebook, where ".." means ‘one folder upwards’.
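As a small illustration of how such a relative path resolves, the sketch below joins it to a notebook folder and normalises the result. The folder /home/user/project/code is a hypothetical location chosen for this example only; your own path will differ.

```python
import posixpath

# Hypothetical location of the notebook (an assumption for illustration).
notebook_dir = "/home/user/project/code"
relative_path = "../data/cmip6/tas"

# ".." steps from "code" up to "project", then down into "data/cmip6/tas".
resolved = posixpath.normpath(posixpath.join(notebook_dir, relative_path))
print(resolved)
```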
Create a new cell and change it to a Markdown cell. Add ## Read temperature data (ACCESS-CM2) as title. I strongly suggest you do this for all sections.
Create a new code cell. Define the filename and open the historical data using xarray.
filename = "../data/cmip6/tas/tas_ann_ACCESS-CM2_historical_r1i1p1f1_g025.nc"
hist = xr.open_dataset(filename)
Create a new code cell. Look at the representation of hist, i.e. write hist in the new code cell and execute it. What variables (data_vars) are in the file? What dimensions? What time period does the dataset cover?
Select the first year using hist.isel(...) (remember: python is 0-based), select the variable tas and create a plot using .plot().
Create a new code cell (I won’t repeat this from now on). Now open the projected future temperature data (ssp585) of ACCESS-CM2.
filename = ...
proj = ...
Again, take a look at proj and compare the year coordinates.
Next we want to combine the hist and proj datasets. One way to do this is to use xr.concat to concatenate them along the year dimension (use xr.concat? to get the docstring, or check the online documentation: xr.concat).
ds1 = xr.concat(...)
Check the year coordinates of ds1 and make sure they run from 1850 to 2100.
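The selection and concatenation steps above can be tried out on synthetic data first. The following is only a sketch: the helper make_fake_tas and the constant 287 K field are made up for illustration and stand in for the real files, which are not read here.

```python
import numpy as np
import xarray as xr


def make_fake_tas(years):
    """Build a small synthetic dataset with dims (year, lat, lon).

    Hypothetical stand-in for the real model files.
    """
    lat = np.array([-60.0, 0.0, 60.0])
    lon = np.array([0.0, 120.0, 240.0])
    data = np.full((len(years), lat.size, lon.size), 287.0)
    return xr.Dataset(
        {"tas": (("year", "lat", "lon"), data)},
        coords={"year": years, "lat": lat, "lon": lon},
    )


hist = make_fake_tas(np.arange(1850, 2015))  # 1850 ... 2014
proj = make_fake_tas(np.arange(2015, 2101))  # 2015 ... 2100

# isel selects by position: index 0 is the first year (1850).
first_year = hist.isel(year=0).tas

# Concatenate the two periods along the existing "year" dimension.
combined = xr.concat([hist, proj], dim="year")
print(combined.year.values[[0, -1]])
```

Checking the first and last value of the year coordinate, as in the last line, is a quick way to confirm that the combined dataset covers the full period.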
In this section we loaded data and had a first look at it: how are the variables named, what are the dimensions, how does it look on a plot, and are there missing values? This allows us to understand what the data looks like. Such exploratory analysis is often the first step of any project working with data.
Calculate global mean (ACCESS-CM2)
We now want to find out how surface air temperature changes with time. For this we compute the global mean.
Again, create a markdown cell and add a meaningful section title.
Use an unweighted mean (ds1.mean) to calculate the average over the "lat" and "lon" dimensions and assign the result to ds1_mean.
Calculate the area-weighted mean (see the introduction for how to do this) and assign it to ds1_weighted.
Plot the time series of ds1_mean.tas and ds1_weighted.tas.
Pass a label (e.g. "weighted") and add a plt.legend() to the plot.
Which of the two has a higher temperature? Why?
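To see how weighting can change an average, the sketch below compares both means for a made-up temperature profile that depends only on latitude. It uses plain numpy and cosine-of-latitude weights; the latitudes and the profile are assumptions for illustration, not model output.

```python
import numpy as np

# Cell-centre latitudes of a coarse regular grid (illustrative values).
lat = np.array([-75.0, -45.0, -15.0, 15.0, 45.0, 75.0])

# Made-up zonal temperature profile in K: warm tropics, cold poles.
tas_zonal = 300.0 - 40.0 * np.sin(np.deg2rad(lat)) ** 2

# On a regular lat/lon grid the cell area is proportional to cos(lat).
weights = np.cos(np.deg2rad(lat))

unweighted = tas_zonal.mean()
weighted = np.average(tas_zonal, weights=weights)
print(unweighted, weighted)
```

In this toy profile the high-latitude rows count as much as the tropical rows in the unweighted mean, although they represent much less area.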
The temperatures are in Kelvin and the weighted global mean temperature increases from around 287 K to 293 K. Does this mean we can expect a warming of 6°C at the end of the 21st century compared to the pre-industrial period? Not necessarily: we have only looked at one model so far, and the models don’t agree on the warming. (Obviously it also depends on the emissions.)
Read temperature data (GFDL-ESM4)
To compare we will now read the temperature data of another climate model: "GFDL-ESM4".
Above we opened the netCDF files containing the historical and future data
separately. Here we will test an alternative possibility, opening them both at once.
Define both filenames for the GFDL-ESM4 model in a list.
files = [
    "file1.nc",
    "file2.nc",
]
Use ds2 = xr.open_mfdataset(files) to open the two files.
This might create an error. Do you understand what the error message is telling you?
The dask package is missing and we need to install it.
In Jupyter Lab, open a terminal from the menu or the launcher. Alternatively open a Terminal or the Miniforge prompt.
Use conda to activate the ipp_analysis environment.
Use conda to install dask.
Try to open the file again. If this does not work you have to restart the notebook and run all code cells again.
Look at the representation of ds2: does tas look different than for ds1?
The data of ds2 was loaded lazily as a dask array. We will not concern ourselves with this here. Call ds2 = ds2.compute() to convert tas into a numpy array.
Convert time axis
Look at the representation (abbreviated as “repr”) of ds2 and make sure the time axis goes from 1850 to 2100.
Compare ds1.year and ds2.time.
The new dataset has a different time axis than ds1. The differences are shown in the table below:
| dataset | name | type | resolution |
| --- | --- | --- | --- |
| ds1 | “year” | integer | annual |
| ds2 | “time” | cftime.DatetimeNoLeap | monthly |
For most of the exercises here we use the schema of ds1, i.e. the time is given in years as integers, e.g. 1850, 1851, …, 2100. Using integers makes it easier to work with the data. In our project we will only use annual mean (or maximum) values, so integer years are enough. However, (most) real-life datasets will have datetime data as in ds2; for example, it would be awkward to express monthly data as floats. In addition, climate models often have special calendars, such as no-leap (i.e. all years have 365 days). We will not look at this in depth here but point to the xarray docs on time series data and non-standard calendars.
We will now convert the monthly data into annual means and at the same time go to integer time coordinates.
Calculate annual means using ds2.groupby and assign the result to ds2_annual.
What kind of time axis does ds2_annual have?
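A grouping by year can be sketched on a tiny synthetic monthly series. Note the assumptions: the example uses standard pandas timestamps rather than the cftime no-leap calendar of ds2, and the data values are simply 0 to 23. The mean treats every month equally, regardless of its length.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of made-up monthly values on a standard (not no-leap) calendar.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
monthly = xr.Dataset(
    {"tas": ("time", np.arange(24.0))},
    coords={"time": time},
)

# Group by the year of each time stamp and average within each group.
annual = monthly.groupby("time.year").mean("time")
print(annual)
```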
Calculate global mean (GFDL-ESM4)
Calculate the area-weighted mean from ds2_annual and assign it to ds2_weighted.
Plot tas of ds1_weighted and ds2_weighted in the same plot.
Pass a label (the model name) and add a legend to the plot.
How do the two climate models compare? Add a cell and note your findings. If you want you can explore more:
convert the temperatures from K to °C
compare the difference between the models as a function of time. What does the difference for the historical period imply?
calculate anomalies relative to a time period (e.g. 1850-1900)
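Two of these optional steps, the unit conversion and the anomalies, can be sketched as follows. The linear series below is synthetic and only stands in for a global mean time series; the variable names are chosen for this example.

```python
import numpy as np
import xarray as xr

# Synthetic global mean series in K (made-up numbers, not model output).
years = np.arange(1850, 2101)
values = 287.0 + 0.02 * (years - 1850)
tas = xr.DataArray(values, coords={"year": years}, dims="year", name="tas")

# Kelvin to degrees Celsius.
tas_celsius = tas - 273.15

# Anomalies relative to 1850-1900; label-based slices include both end points.
reference = tas.sel(year=slice(1850, 1900)).mean("year")
anomaly = tas - reference
```

By construction, the anomaly averages to zero over the reference period, which makes series from different models easier to compare.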
Helper function
We have now calculated the weighted global mean twice. It’s only two lines of code but as an exercise we will write a small function that takes an xr.Dataset or xr.DataArray as input, calculates cosine weights, and returns the weighted mean. As a reminder, python functions are constructed as follows:
def add_numbers(a, b):
    result = a + b
    return result
Copy the function add_numbers to a code cell and execute the cell.
Call the function with some numbers.
What happens if you remove the return result line and try to call it again? (Forgetting to add the return statement is a common mistake.)
Create a new function to calculate the global weighted mean:
The function should be named global_mean.
It should take a parameter named ds as input.
Using ds["lat"] it should calculate the cosine weights.
Then calculate the weighted global mean and return it.
Thus, you can start with the following:
def global_mean(ds):
    ...
Calculate the weighted global mean of ds1 and ds2_annual again, using your global_mean function.
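One possible shape of such a function is sketched below and checked against numpy on synthetic data. It assumes that the dimensions are called "lat" and "lon" and that latitudes are given in degrees; the approach shown in the course introduction may differ in detail.

```python
import numpy as np
import xarray as xr


def global_mean(ds):
    """Return the cosine-latitude weighted mean over "lat" and "lon"."""
    weights = np.cos(np.deg2rad(ds["lat"]))
    return ds.weighted(weights).mean(("lat", "lon"))


# Synthetic test data (made up): a profile that depends only on latitude.
lat = np.array([-75.0, -45.0, -15.0, 15.0, 45.0, 75.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
years = np.array([1850, 1851])
zonal = 300.0 - 40.0 * np.sin(np.deg2rad(lat)) ** 2
data = np.broadcast_to(zonal[None, :, None], (years.size, lat.size, lon.size)).copy()

ds = xr.Dataset(
    {"tas": (("year", "lat", "lon"), data)},
    coords={"year": years, "lat": lat, "lon": lon},
)

result = global_mean(ds)            # works for a Dataset ...
result_da = global_mean(ds["tas"])  # ... and for a DataArray
```

Testing a helper on a small field with a known answer is a cheap way to catch mistakes, such as forgetting the conversion from degrees to radians.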
Move the helper function to a module
We defined the global_mean function so we can reuse it and don’t have to write the same code over and over again. However, we can currently only use it in this notebook. If we want to use it again in the next notebook we need to find a different solution. We will copy the function to a python module - in the simplest case a python module is just a text file with the ending py.
Open the file computation.py. It should be in the same folder as the notebook.
Copy the global_mean function and paste it in the computation.py file.
You have to add the required imports at the top of computation.py (i.e. import numpy as np).
Import the computation.py module in the notebook you are currently working on. To do so you need to leave out the py ending, i.e., import computation.
Calculate the global mean of ds1 and ds2_annual for a third time, now using computation.global_mean.
If this does not work you have to restart the notebook and run all code cells again.
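The mechanics of a module can be sketched without touching computation.py. The example below writes a tiny, hypothetical module named demo_helpers.py into a temporary folder, makes that folder importable, and calls a function from it. In the exercise the module already sits next to the notebook, so the sys.path step is not needed there.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module to a temporary folder (hypothetical name and content).
module_dir = Path(tempfile.mkdtemp())
source = "def add_numbers(a, b):\n    return a + b\n"
(module_dir / "demo_helpers.py").write_text(source)

# Make the folder importable; a module next to the notebook is found without this.
sys.path.insert(0, str(module_dir))

# Equivalent to "import demo_helpers as helpers": the ending py is left out.
helpers = importlib.import_module("demo_helpers")
total = helpers.add_numbers(1, 2)
print(total)
```

Python imports a module only once per session, which is why edits to the file made after the first import do not show up until the kernel is restarted.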
What have you learned so far?
Reflect again on the typical steps taken for the data analysis. Note them at the end of the notebook.
First we ... Then ...
This concludes Part 1.