Part 1 - Calculate global mean temperature
==========================================

Prepared by *Mathias Hauser*.

In the first part we will work with *Near-Surface Air Temperature* data (*tas*) from
climate models. We will load global, gridded temperature fields and compute the global
mean temperature.

We look at climate simulations from 1850 to 2100. The simulations are split into two parts:
the first covers the historical period from 1850 to 2014 and the second the
"future" from 2015 to 2100. The historical simulation uses historical
data of greenhouse gases (GHGs), aerosols, etc. The projections for the
future climate use scenarios with potential pathways of the forcing
agents, named *shared socioeconomic pathways* (SSP). Here, we will use a
scenario with high radiative forcing by the end of the century and thus a
strong warming, named SSP5-8.5 (or ssp585). We will start by looking at two climate
models.

.. admonition:: Learning goals
   :class: important

   - **programming goals**

     - open netCDF files using xarray and manipulate them
     - write a python function that can be reused later

   - **scientific and data analysis goals**

     - use exploratory data analysis to examine the data structure and plot a subset of the data
     - understand the differences between weighted and unweighted global means
     - see what temperature changes are projected for the next century for a high emission scenario and how climate models can differ considerably

Preparation
-----------

#. Create a new notebook in the `code` folder and make sure you select
   `ipp_analysis` as kernel. Rename the notebook from `Untitled.ipynb` to
   `p1_global_mean_name.ipynb` where you replace `name` with
   your ETH username.
#. Convert the first cell to Markdown and add a title,
   e.g. ``# Global mean temperature`` and add your name on a new line.
#. Add a new cell and import the required packages.

   .. code:: python

      import matplotlib.pyplot as plt
      import numpy as np
      import xarray as xr

.. note::

   For students of the ip python course: please submit the finished notebook on the
   first hand-in date.


Read temperature data (ACCESS-CM2)
----------------------------------

The first step in a scientific investigation is usually to load the data we want to analyse.
Here we read the data from the `"ACCESS-CM2"` model. The data is located at
`"../data/cmip6/tas"`. This is a file path relative to the location of the notebook; `".."`
means 'one folder upwards'.

#. Create a new cell and change it to a ``Markdown`` cell. Add

   ::

      ## Read temperature data (ACCESS-CM2)

   as title. I strongly suggest you do this for all sections.

#. Create a new code cell. Define the filename and open the historical
   data using xarray.

   .. code:: python

      filename = "../data/cmip6/tas/tas_ann_ACCESS-CM2_historical_r1i1p1f1_g025.nc"
      hist = xr.open_dataset(filename)

#. Create a new code cell (I won't repeat this from now on). Look at the representation
   of ``hist`` - i.e. write ``hist`` in the new code cell and execute it.

   - What variables (``data_vars``) are in the file?
   - What dimensions?
   - What time period does the dataset cover?

#. Select the first year using ``hist.isel(...)``, select
   the variable ``tas`` and create a plot using ``.plot()``.

#. Open the projected future temperature data (ssp585) of ACCESS-CM2.

   .. code:: python

      filename = ...
      proj = ...

#. Again, take a look at ``proj`` - compare the ``year`` coordinates.

#. Next we want to combine the ``hist`` and ``proj`` datasets. One way to do this is to
   use ``xr.concat`` to concatenate them along the year dimension (use ``xr.concat?``
   to get the docstring, or check the online documentation: `xr.concat
   <https://docs.xarray.dev/en/stable/generated/xarray.concat.html>`_).

   .. code:: python

      ds1 = xr.concat(...)

#. Check the ``year`` coordinates of ``ds1`` and make sure it runs from 1850 to 2100.

In this section we loaded data and had a first look at it: how are the variables
named, what are the dimensions, and what does it look like on a plot - are there missing values?
This allows us to understand what the data looks like. Such exploratory analysis is
often the first step of any project working with data.
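
The selection and concatenation steps above can be tried out without the model files.
The following is a minimal sketch on synthetic data: the helper ``make_dataset``, the
names ``hist_demo`` and ``proj_demo``, and the constant temperature values are all
made up for illustration and are not part of the course data.

.. code:: python

   import numpy as np
   import xarray as xr


   def make_dataset(years):
       """Create a small synthetic dataset with a ``tas`` variable (made-up values)."""
       lat = np.array([-45.0, 0.0, 45.0])
       lon = np.array([0.0, 120.0, 240.0])
       data = np.full((len(years), lat.size, lon.size), 288.0)
       return xr.Dataset(
           {"tas": (("year", "lat", "lon"), data)},
           coords={"year": years, "lat": lat, "lon": lon},
       )


   hist_demo = make_dataset(np.arange(1850, 2015))
   proj_demo = make_dataset(np.arange(2015, 2101))

   # select the first year by position; this leaves a 2D (lat, lon) field
   first_year = hist_demo.isel(year=0).tas

   # concatenate the two periods along the existing "year" dimension
   combined = xr.concat([hist_demo, proj_demo], dim="year")

   print(first_year.dims)
   print(int(combined.year[0]), int(combined.year[-1]))

With the real files the same two calls should apply, provided both datasets share the
``year`` dimension.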

Calculate global mean (ACCESS-CM2)
----------------------------------

We now want to find out how surface air temperature changes with time. For this we
compute the global mean.

#. Again, create a markdown cell and add a meaningful section title.
#. Use an unweighted mean (``ds1.mean``) to calculate the average over
   the ``"lat"`` and ``"lon"`` dimensions and assign the result to
   ``ds1_mean``.
#. Calculate the area weighted mean (see also the introduction):

   - use :py:func:`np.cos` and :py:func:`np.deg2rad` to calculate the weights
   - calculate the weighted mean using :py:meth:`ds1.weighted` and assign it to ``ds1_weighted``.

#. Plot the time series of ``ds1_mean.tas`` and ``ds1_weighted.tas``.
#. Pass a ``label`` (e.g. ``"weighted"``) and add a ``plt.legend()`` to the plot.
#. Which of the two has a higher temperature? Why?

The temperatures are in Kelvin and the weighted global mean temperature increases from around
287 K to 293 K. Does this mean we can expect a warming of 6°C at the end of the 21\ :sup:`st` century
compared to the pre-industrial period? Not necessarily - we only looked at one model so far
and climate models don't agree on the amount of warming. (Obviously it also depends on the emissions.)
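
To see how the cosine weights enter the calculation, it can help to try both means on
a tiny field first. The sketch below uses a made-up 3 x 2 grid with invented
temperatures (none of these values come from ACCESS-CM2); only the calls ``np.cos``,
``np.deg2rad``, ``mean`` and ``weighted`` mirror the steps above.

.. code:: python

   import numpy as np
   import xarray as xr

   lat = np.array([-60.0, 0.0, 60.0])
   lon = np.array([0.0, 180.0])

   # made-up temperatures in K: colder at high latitudes, warmer at the equator
   values = np.array([[260.0, 260.0], [300.0, 300.0], [260.0, 260.0]])
   tas = xr.DataArray(
       values, coords={"lat": lat, "lon": lon}, dims=("lat", "lon"), name="tas"
   )

   # the weights are proportional to the cosine of the latitude
   weights = np.cos(np.deg2rad(tas.lat))

   # every grid cell counts the same
   unweighted = tas.mean(("lat", "lon"))

   # grid cells count according to their weight
   weighted = tas.weighted(weights).mean(("lat", "lon"))

   print(float(unweighted), float(weighted))

Comparing the two printed numbers on this toy grid may help when answering the
question above for the real data.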

Read temperature data (GFDL-ESM4)
---------------------------------

To compare we will now read the temperature data of another climate model: `"GFDL-ESM4"`.
Above we opened the netCDF files containing the historical and future data
separately. Here we will test an alternative: opening them both at once.

#. Define both filenames for the GFDL-ESM4 model in a list.

   .. code:: python

      files = [
          "file1.nc",
          "file2.nc",
      ]

#. Use

   .. code:: python

      ds2 = xr.open_mfdataset(files)

   to open the two files.

#. The data of ``ds2`` was loaded lazily as a dask array. We will not concern ourselves
   with this here. Call

   .. code:: python

      ds2 = ds2.compute()

   to convert ``tas`` into a numpy array.


Convert time axis
-----------------

.. |ico1| image:: ../_static/icon-database.svg
   :width: 0.9em

#. Look at the representation (abbreviated as "repr") of ``ds2`` and make sure the time
   axis goes from 1850 to 2100.
#. Compare ``ds1.year`` and ``ds2.time``. You may have to click on the data symbol (|ico1|)
   to see the details.

   The new dataset has a different time axis than ``ds1``. The differences
   are shown in the table below:

   ======= ====== ===================== ==========
   dataset name   type                  resolution
   ======= ====== ===================== ==========
   ``ds1`` "year" integer               annual
   ``ds2`` "time" cftime.DatetimeNoLeap monthly
   ======= ====== ===================== ==========

   For most of the exercises here we use the scheme of ``ds1``, i.e. the time
   is given in years as integers, e.g. 1850, 1851, …, 2100. Using integers
   makes it easier to work with the data. In our project we will only use
   annual means (or maxima), so integer years are sufficient.
   However, most real-life datasets have datetime coordinates as in
   ``ds2``; for example, it would be awkward to express monthly data as floats.
   In addition, climate models often have special calendars, such as
   no-leap (i.e. all years have 365 days). We will not look at this in
   depth here but point to the xarray docs on `time series
   data <https://docs.xarray.dev/en/stable/user-guide/time-series.html>`__
   and `non-standard calendars
   <https://docs.xarray.dev/en/stable/user-guide/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range>`__.

   We will now convert the monthly data into annual means and at the same
   time go to integer time coordinates.

#. Calculate annual means using ``ds2.groupby`` and assign the result to ``ds2_annual``.
#. What kind of time axis does ``ds2_annual`` have now?
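
The grouping step can be sketched on synthetic monthly data. The example below is only
an illustration under stated assumptions: it uses a standard calendar created with
``pd.date_range`` instead of the no-leap calendar of ``ds2``, and the dataset name
``monthly`` and its values are made up.

.. code:: python

   import numpy as np
   import pandas as pd
   import xarray as xr

   # two years of made-up monthly values on a standard calendar
   time = pd.date_range("2000-01-01", periods=24, freq="MS")
   values = np.concatenate([np.full(12, 1.0), np.full(12, 3.0)])
   monthly = xr.Dataset({"tas": ("time", values)}, coords={"time": time})

   # group by the year of each time stamp and average within each group
   annual = monthly.groupby("time.year").mean()

   print(annual.tas.dims)
   print(annual.year.values)

Note that a plain mean over the months gives every month the same weight, regardless
of its length; this sketch does not correct for that.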

Calculate global mean (GFDL-ESM4)
---------------------------------

#. Calculate the area-weighted mean of ``ds2_annual`` and assign it to ``ds2_weighted``.
#. Plot ``tas`` of ``ds1_weighted`` and ``ds2_weighted`` in the same
   plot.
#. Pass a ``label`` (the model name) and add a ``legend`` to the plot.
#. How do the two climate models compare? Add a cell and note your findings. If you want
   you can explore more:

   - convert the temperatures from K to °C
   - compare the difference between the models as a function of time. What does the difference for the historical period imply?
   - calculate anomalies relative to a time period (e.g. 1850--1900)
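
For the optional explorations, the unit conversion and the anomaly calculation could
look roughly as follows. This is a sketch on a made-up time series (a linear trend
starting at 287 K); the names ``tas_k``, ``tas_c``, ``reference`` and ``anomaly`` are
illustrative and not used elsewhere in the exercise.

.. code:: python

   import numpy as np
   import xarray as xr

   years = np.arange(1850, 2101)

   # made-up global mean temperature in K with a linear trend
   tas_k = xr.DataArray(
       287.0 + 0.02 * (years - 1850),
       coords={"year": years},
       dims="year",
       name="tas",
   )

   # convert from K to °C
   tas_c = tas_k - 273.15

   # mean over the reference period; label-based slices include both end points
   reference = tas_k.sel(year=slice(1850, 1900)).mean("year")

   # anomalies relative to the reference period
   anomaly = tas_k - reference

   print(float(tas_c.sel(year=1850)))
   print(float(anomaly.sel(year=2100)))

By construction the anomalies average to zero over the reference period, which is a
quick way to check the calculation.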

Helper function
---------------

We have now calculated the weighted global mean twice. It's only two
lines of code but as an exercise we will write a small function that
takes a ``xr.Dataset`` or ``xr.DataArray`` as input, calculates cosine
weights, and returns the weighted mean. Python functions
are constructed as follows:

.. code:: python

   def add_numbers(a, b):
       result = a + b

       return result

#. Copy the function ``add_numbers`` to a code cell and execute the
   cell.
#. Call the function with some numbers.
#. What happens if you remove the ``return result`` line and try to call
   it again? (Forgetting to add the ``return`` statement is a common
   mistake.)
#. Create a new function to calculate the global weighted mean:

   -  The function should be named ``global_mean``
   -  It should take a parameter named ``ds`` as input.
   -  Using ``ds["lat"]`` it should calculate the cosine weights.
   -  Then calculate the weighted global mean and ``return`` it.
   -  Thus, you can start with the following:

      .. code:: python

         def global_mean(ds):
             ...

#. Calculate the weighted global mean of ``ds1`` and ``ds2_annual``
   again, using your ``global_mean`` function.
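
The role of the ``return`` statement can be checked with two variants of the example
function side by side. The second function, ``add_numbers_no_return``, is a
hypothetical variant written only for this comparison.

.. code:: python

   def add_numbers(a, b):
       result = a + b

       return result


   def add_numbers_no_return(a, b):
       # hypothetical variant: the sum is computed but never returned
       result = a + b


   with_return = add_numbers(1, 2)
   without_return = add_numbers_no_return(1, 2)

   print(with_return)
   print(without_return)

A Python function without a ``return`` statement gives back ``None``, so any later
calculation using that value will fail or behave unexpectedly.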

Move the helper function to a module
------------------------------------

We defined the ``global_mean`` function so we can reuse it and don't
have to write the same code over and over again. However, we can
currently only use it in this notebook. If we want to use it again in
the next notebook we need to find a different solution. We
will copy the function to a python module - in the simplest case a
python module is just a text file with the ending `.py`.

#. Open the file `computation.py`. It should be in the same folder as the notebook.
#. Copy the ``global_mean`` function and paste it in the
   `computation.py` file.
#. You have to add the required imports at the top of `computation.py` (i.e. ``import numpy as np``).
#. Import the `computation.py` module in the notebook you are currently working on - to
   do so you need to omit the `.py` ending - i.e., ``import computation``.
#. Calculate the global mean of ``ds1`` and ``ds2_annual`` for a third
   time, now using ``computation.global_mean``.

   -  If this does not work you have to restart the notebook (:menuselection:`Kernel -->
      Restart Kernel`) and run all code cells again.
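
The reason a restart can be necessary is that Python imports a module only once per
session, so later edits to the file are not picked up automatically. As a possible
alternative to restarting, the standard library offers ``importlib.reload``. The
sketch below demonstrates this with a throw-away module in a temporary folder; the
module name ``demo_module`` and its function ``answer`` are made up and unrelated to
`computation.py`.

.. code:: python

   import importlib
   import sys
   import tempfile
   from pathlib import Path

   # write a small throw-away module (made-up name) to a temporary folder
   tmpdir = Path(tempfile.mkdtemp())
   module_file = tmpdir / "demo_module.py"
   module_file.write_text("def answer():\n    return 1\n")

   # make the folder importable and import the module
   sys.path.insert(0, str(tmpdir))
   import demo_module

   before_edit = demo_module.answer()

   # edit the file on disk; the already imported module does not change
   module_file.write_text("def answer():\n    return 20\n")
   after_edit = demo_module.answer()

   # reloading re-executes the edited source
   importlib.reload(demo_module)
   after_reload = demo_module.answer()

   print(before_edit, after_edit, after_reload)

Reloading does not update objects that were created before the reload, so restarting
the kernel remains the most robust option.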


.. admonition:: What have you learned so far?
   :class: important

   Reflect again on the typical steps taken for the data analysis. Note them at the end of the notebook.

      ::

         First we ... Then ...

This concludes Part 1.