Part 1 - Calculate global mean temperature

Prepared by Mathias Hauser.

In the first part we will work with Near-Surface Air Temperature data (tas) from climate models. We will load global, gridded temperature fields and compute the global mean temperature.

We look at climate simulations from 1850 to 2100. The simulations are split into two parts: the first covers the historical period from 1850 to 2014 and the second the “future” from 2015 to 2100. The historical simulation uses historical data of greenhouse gases (GHGs), aerosols, etc. The projections for the future climate use scenarios with potential pathways of the forcing agents, called Shared Socioeconomic Pathways (SSPs). Here, we will use a scenario with high radiative forcing by the end of the century and thus strong warming, named SSP5-8.5 (or ssp585). We will start by looking at two climate models.

Learning goals

  • programming goals

    • open netCDF files using xarray and manipulate them

    • write a python function that can be reused later

  • scientific and data analysis goals

    • using exploratory data analysis to explore the data structure and plot a subset of the data

    • understand the differences between the unweighted and weighted global means

    • see what temperature changes are projected for the next century for a high emission scenario and how climate models can differ considerably

Preparation

  1. Create a new notebook in the code folder and make sure you select ipp_analysis as kernel. Rename the notebook from Untitled.ipynb to p1_global_mean_name.ipynb where you replace name with your ETH username.

  2. Convert the first cell to Markdown and add a title, e.g. # Global mean temperature and add your name on a new line.

  3. Add a new cell and import the required packages. We will need numpy, matplotlib.pyplot, and xarray (use the standard abbreviations).

Note

For students of the ip python course: please submit the finished notebook on the first hand-in date.

Read temperature data (ACCESS-CM2)

The first step in a scientific investigation is usually to load the data we want to analyse. Here we read the data from the "ACCESS-CM2" model. The data is located at "../data/cmip6/tas". This is a file path relative to the location of the notebook, where ".." means ‘one folder upwards’.
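To see how such a relative path resolves, the following sketch joins it to a made-up notebook folder. The folder /home/user/project/code is an assumption for illustration only, not the actual course layout.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical location of the notebook folder (an assumption for illustration).
notebook_dir = PurePosixPath("/home/user/project/code")

# The relative path used in this exercise.
relative = "../data/cmip6/tas"

# ".." steps one folder up from "code", then the path descends into "data/cmip6/tas".
resolved = posixpath.normpath(notebook_dir / relative)
print(resolved)
```

The printed path no longer contains "..": the data folder sits next to the notebook folder, not inside it.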

  1. Create a new cell and change it to a Markdown cell. Add

    ## Read temperature data (ACCESS-CM2)
    

    as title. I strongly suggest you do this for all sections.

  2. Create a new code cell. Define the filename and open the historical data using xarray.

    filename = "../data/cmip6/tas/tas_ann_ACCESS-CM2_historical_r1i1p1f1_g025.nc"
    hist = xr.open_dataset(filename)
    
  3. Create a new code cell. Look at the representation of hist - i.e. write hist in the new code cell and execute it. What variables - data_vars - are in the file? What dimensions? What time period does the dataset cover?

  4. Select the first year using hist.isel(...) (remember: python is 0-based), select the variable tas and create a plot using .plot().

  5. Create a new code cell (I won’t repeat this from now on). Now open the projected future temperature data (ssp585) of ACCESS-CM2.

    filename = ...
    proj = ...
    
  6. Again, take a look at proj - compare the year coordinates.

  7. Next we want to combine the hist and proj datasets. One way to do this is to use xr.concat to concatenate them along the year dimension (use xr.concat? to get the docstring, or check the online documentation: xr.concat).

    ds1 = xr.concat(...)
    
  8. Check the year coordinates of ds1 and make sure it runs from 1850 to 2100.
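The selection and concatenation steps can be tried without the course files. The sketch below builds two small synthetic datasets (the helper make_dataset and the names hist_demo and proj_demo are made up for illustration) and joins them along "year". It is a minimal example under these assumptions, not the expected solution for the real data.

```python
import numpy as np
import xarray as xr


def make_dataset(first_year, last_year):
    """Build a small synthetic dataset with a constant 288 K field (illustration only)."""
    years = np.arange(first_year, last_year + 1)
    lat = np.array([-45.0, 0.0, 45.0])
    lon = np.array([0.0, 120.0, 240.0])
    data = np.full((years.size, lat.size, lon.size), 288.0)
    return xr.Dataset(
        {"tas": (("year", "lat", "lon"), data)},
        coords={"year": years, "lat": lat, "lon": lon},
    )


hist_demo = make_dataset(1850, 2014)
proj_demo = make_dataset(2015, 2100)

# Position-based selection: index 0 is the first year.
first_year = hist_demo.isel(year=0)

# Join the two datasets along the existing "year" dimension.
combined = xr.concat([hist_demo, proj_demo], dim="year")

print(int(first_year.year), int(combined.year[0]), int(combined.year[-1]))
```

Checking the first and last entry of the year coordinate, as done in the print call, is a quick way to confirm that the pieces were joined in the intended order.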

In this section we loaded data and had a first look at it: how are the variables named, what are the dimensions, and how does it look on a plot - are there missing values? This allows us to understand what the data looks like. Such exploratory analysis is often the first step of any project working with data.

Calculate global mean (ACCESS-CM2)

We now want to find out how surface air temperature changes with time. For this we compute the global mean.

  1. Again, create a markdown cell and add a meaningful section title.

  2. Use an unweighted mean (ds1.mean) to calculate the average over the "lat" and "lon" dimensions and assign the result to ds1_mean.

  3. Calculate the area-weighted mean (see the introduction for how to do this) and assign it to ds1_weighted.

  4. Plot the time series of ds1_mean.tas and ds1_weighted.tas.

  5. Pass a label (e.g. "weighted") and add a plt.legend() to the plot.

  6. Which of the two has a higher temperature? Why?
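To build intuition for the difference between the two averages, here is a NumPy-only sketch with an idealised temperature profile. The 2.5° grid spacing and the temperature formula are assumptions for illustration; the course introduction may compute the weights differently.

```python
import numpy as np

# Latitudes of a regular 2.5° grid (cell centres); an assumption for illustration.
lat = np.arange(-88.75, 90, 2.5)

# Idealised zonal-mean temperature in K: warm at the equator, cold at the poles.
temperature = 250.0 + 50.0 * np.cos(np.deg2rad(lat))

# Grid cells shrink towards the poles; cos(lat) is proportional to their area.
weights = np.cos(np.deg2rad(lat))

# Plain average: every latitude band counts equally.
unweighted = temperature.mean()

# Weighted average: every latitude band counts according to its area.
weighted = np.sum(weights * temperature) / np.sum(weights)

print(f"unweighted: {unweighted:.1f} K, weighted: {weighted:.1f} K")
```

Running it and comparing the two numbers may help when answering the last question above.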

The temperatures are in Kelvin and the weighted global mean temperature increases from around 287 K to 293 K. Does this mean we can expect a warming of 6°C at the end of the 21st century compared to the pre-industrial period? Not necessarily - we have only looked at one model so far, and climate models don’t agree on the amount of warming. (Obviously it also depends on the emissions.)

Read temperature data (GFDL-ESM4)

To compare, we will now read the temperature data of another climate model: "GFDL-ESM4". Above we opened the netCDF files containing the historical and future data separately. Here we will test an alternative: opening both at once.

  1. Define both filenames for the GFDL-ESM4 model in a list.

    files = [
        "file1.nc",
        "file2.nc",
    ]
    
  2. Use

    ds2 = xr.open_mfdataset(files)
    

    to open the two files.

  3. This might create an error. Do you understand what the error message is telling you?

  4. The dask package is missing and we need to install it.

    • In Jupyter Lab go to File ‣ New ‣ Terminal (or to New Launcher ‣ Terminal).

    • Alternatively, open a terminal or the Miniforge Prompt.

    • Use conda to activate the ipp_analysis environment.

    • Use conda to install dask.

  5. Try to open the file again. If this does not work you have to restart the notebook (Kernel ‣ Restart Kernel) and run all code cells again.

  6. Look at the representation of ds2 - does tas look different from ds1?

  7. The data of ds2 was loaded lazily as a dask array. We will not concern ourselves with this here. Call

    ds2 = ds2.compute()
    

    to convert tas into a numpy array.

Convert time axis

  1. Look at the representation (abbreviated as “repr”) of ds2 and make sure the time axis goes from 1850 to 2100.

  2. Compare ds1.year and ds2.time.

    The new dataset has a different time axis than ds1. The differences are shown in the table below:

    | dataset | name   | type                  | resolution |
    |---------|--------|-----------------------|------------|
    | ds1     | “year” | integer               | annual     |
    | ds2     | “time” | cftime.DatetimeNoLeap | monthly    |

    For most of the exercises here we use the schema of ds1, i.e. the time is given in years as integers, e.g. 1850, 1851, …, 2100. Using integers makes it easier to work with the data. In our project we will only use annual mean (or maximum) values, so integer years are sufficient. However, (most) real-life datasets have datetime data as in ds2; for example, it would be awkward to express monthly data as floats. In addition, climate models often have special calendars, such as no-leap (i.e. all years have 365 days). We will not look at this in depth here but point to the xarray docs on time series data and non-standard calendars.

    We will now convert the monthly data into annual means and at the same time go to integer time coordinates.

  3. Calculate annual means using ds2.groupby, assign it to ds2_annual.

  4. What kind of time axis does ds2_annual have?
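The conversion from monthly datetime data to annual integer years can be sketched on synthetic data. The example below uses a standard calendar built with pandas rather than the no-leap cftime calendar of ds2, and the values are made up; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic monthly data on a standard calendar (illustration only).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
values = np.concatenate([np.full(12, 287.0), np.full(12, 288.0)])
monthly = xr.Dataset({"tas": ("time", values)}, coords={"time": time})

# Group by calendar year and average all months within each year.
# Every month gets the same weight here, regardless of its length.
annual = monthly.groupby("time.year").mean("time")

print(annual.tas.dims, annual.year.values, annual.tas.values)
```

Note that the grouped result no longer has a "time" dimension: it is replaced by a "year" dimension holding integers, which matches the schema of ds1.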

Calculate global mean (GFDL-ESM4)

  1. Calculate area-weighted mean from ds2_annual, assign it to ds2_weighted.

  2. Plot tas of ds1_weighted and ds2_weighted in the same plot.

  3. Pass a label (the model name) and add a legend to the plot.

  4. How do the two climate models compare? Add a cell and note your findings. If you want you can explore more:

    • convert the temperatures from K to °C

    • compare the difference between the models as a function of time. What does the difference for the historical period imply?

    • calculate anomalies relative to a time period (e.g. 1850-1900)
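For the optional explorations, the unit conversion and the anomaly calculation could look roughly like this. The time series is synthetic (a made-up linear trend), so only the mechanics carry over to the real model data.

```python
import numpy as np
import xarray as xr

# Synthetic global mean series in K with a linear trend (illustration only).
year = np.arange(1850, 2101)
tas = xr.DataArray(
    287.0 + 0.02 * (year - 1850),
    dims=["year"],
    coords={"year": year},
    name="tas",
)

# Kelvin to degrees Celsius.
tas_celsius = tas - 273.15

# Anomaly relative to the 1850-1900 mean; label-based slices include both ends.
reference = tas.sel(year=slice(1850, 1900)).mean("year")
anomaly = tas - reference

print(float(tas_celsius.sel(year=1850)), float(anomaly.sel(year=2100)))
```

Working with anomalies removes each model's absolute offset, which makes it easier to compare the warming of models that start from different baseline temperatures.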

Helper function

We have now calculated the weighted global mean twice. It’s only two lines of code, but as an exercise we will write a small function that takes an xr.Dataset or xr.DataArray as input, calculates cosine weights, and returns the weighted mean. As a reminder, python functions are constructed as follows:

def add_numbers(a, b):
    result = a + b

    return result
  1. Copy the function add_numbers to a code cell and execute the cell.

  2. Call the function with some numbers.

  3. What happens if you remove the return result line and try to call it again? (Forgetting to add the return statement is a common mistake.)

  4. Create a new function to calculate the global weighted mean:

    • The function should be named global_mean

    • It should take a parameter named ds as input.

    • Using ds["lat"] it should calculate the cosine weights.

    • Then calculate the weighted global mean and return it.

    • Thus, you can start with the following:

      def global_mean(ds):
          ...
      
  5. Calculate the weighted global mean of ds1 and ds2_annual again, using your global_mean function.
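One possible shape for such a function is sketched below, assuming the dimensions are named "lat" and "lon" and that latitude is given in degrees. It uses xarray's weighted reductions; the course introduction may phrase the same calculation differently. The small demo dataset is synthetic.

```python
import numpy as np
import xarray as xr


def global_mean(ds):
    """Return the cosine-latitude weighted mean over "lat" and "lon"."""
    # cos(lat) is proportional to the grid-cell area on a regular lat/lon grid.
    weights = np.cos(np.deg2rad(ds["lat"]))
    return ds.weighted(weights).mean(("lat", "lon"))


# Small synthetic field: 300 K at the equator, 240 K at 60° N/S (illustration only).
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 180.0])
data = np.array([[240.0, 240.0], [300.0, 300.0], [240.0, 240.0]])
demo = xr.Dataset({"tas": (("lat", "lon"), data)}, coords={"lat": lat, "lon": lon})

result = global_mean(demo)
print(float(result.tas))
```

Testing a new function on a tiny input where the answer can be worked out by hand (here the weights are 0.5, 1 and 0.5) is a cheap way to catch mistakes such as a forgotten return statement.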

Move the helper function to a module

We defined the global_mean function so we can reuse it and don’t have to write the same code over and over again. However, we can currently only use it in this notebook. If we want to use it again in the next notebook we need to find a different solution. We will copy the function to a python module - in the simplest case a python module is just a text file with the ending .py.

  1. Open the file computation.py. It should be in the same folder as the notebook.

  2. Copy the global_mean function and paste it in the computation.py file.

  3. You have to add the required imports at the top of computation.py (i.e. import numpy as np).

  4. Import the computation.py module in the notebook you are currently working on. To do so, omit the .py ending, i.e. import computation.

  5. Calculate the global mean of ds1 and ds2_annual for a third time, now using computation.global_mean.

    • If this does not work you have to restart the notebook (Kernel ‣ Restart Kernel) and run all code cells again.
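The restart is needed because Python imports a module only once per session, so later edits to the file are not picked up automatically. As an alternative to restarting, the standard library offers importlib.reload. The sketch below demonstrates it on a throwaway module written to a temporary folder; the module name demo_module and its function are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny stand-in module to a temporary folder (hypothetical name: demo_module).
tmpdir = tempfile.mkdtemp()
module_path = Path(tmpdir) / "demo_module.py"
module_path.write_text('def version():\n    return "old"\n')

# Make the temporary folder importable and import the module once.
sys.path.insert(0, tmpdir)
import demo_module

first = demo_module.version()

# Edit the file, then reload so the running session picks up the change.
module_path.write_text('def version():\n    return "updated"\n')
importlib.reload(demo_module)
second = demo_module.version()

print(first, second)
```

In the notebook the equivalent call would be importlib.reload(computation) after saving changes to computation.py. Restarting the kernel remains the more robust option when in doubt.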

What have you learned so far?

Reflect again on the typical steps taken for the data analysis. Note them at the end of the notebook.

First we ... Then ...

This concludes Part 1.