Tutorial 2a: Running a generic single-point case

This tutorial is an introduction to running single-point simulations of the Community Terrestrial Systems Model (CTSM) at locations that do not have preconfigured options. It will guide you through setting up the required driver data (i.e. surface and atmosphere data) as well as setting up and submitting a single-point CTSM case.

In the previous tutorial, Day1a_GlobalCase, we set up and ran a global CTSM case. Many of the steps required to run a single-point case are similar, with some changes and additional required steps which we will cover here.

In this tutorial

The tutorial has several components. Below you will find steps to:

  1. Generate subset surface and atmosphere data files at a single latitude and longitude point.

  2. Set up and submit a single-point case.

Specifically, we will simulate the Harvard Forest site for one year using data extracted from global datasets that are available for CTSM.

Note: This tutorial assumes you have completed the previous tutorials! Content in the Day 0a Git Started tutorial is required, and Day 0b NEON and Day 1 global tutorials are strongly recommended. If you haven't completed the Day 0 and Day 1 tutorials, go back and do these first.

TIP: Before we get started, make sure you’re in a bash kernel

  • Switch kernel (upper right of your current notebook)
  • Select one of the Bash kernels from the pop-up window
  • Click Select

1. Subset global surface and atmosphere files

CTSM uses a surface data file to read in important grid cell-level information like vegetation, crop, and glacier grid cell fractions, the fractional cover of each plant functional type (PFT), and soil characteristics.

A global surface data file is located and read by default for global CTSM cases, depending on the chosen component set and resolution. To run CTSM at a single point, we will need to supply a surface data file at a specified latitude and longitude.

Remember from Day 1 that a component set, or colloquially a "compset", specifies a configuration for your case, including the component models, time period of simulation, and model physics options. The resolution defines the model resolution or grid size.

Similarly, when running a land-only simulation, that is, one driven by a "data atmosphere model" (DATM) with prescribed climate data (e.g. temperature, precipitation, solar radiation), CTSM needs DATM input files. We can also subset the global DATM data for single-point runs.

Tip: It is not strictly required to provide CTSM/CIME with subset DATM data in order to run a single-point case, as CTSM can just use the global files. However, your simulations will run much faster if you use subset climate data.

1.1 Use subset_data to subset surface and DATM files

We have created a python script, subset_data, which will subset default global surface and DATM files at a user-specified latitude and longitude.

This script is located in the CTSM source code, in the tools/site_and_regional folder.

Navigate here now:

cd ~/CTSM/tools/site_and_regional
ls

In order to use the subset_data python script you must have some required python packages installed. On NCAR machines (like Cheyenne), you can load the NCAR python library, ncar_pylib, by running module load python followed by ncar_pylib. This is not necessary in the cloud, where your python environment is already configured in CESM-Lab.

Tip: On NCAR or other machines, you can also use your own python environment if you want. Required third-party python packages are scipy, xarray, and numpy.
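On an NCAR machine, loading that library looks like the two commands below; skip them in the cloud, since CESM-Lab already has the required packages installed:

module load python
ncar_pylib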

You can use the built-in help to see what options are available for the subset_data script:

./subset_data --help

There are a lot of options, but for now we will just use a few of the most common ones:

Type of subsetting:
point : this tells the script to subset data at a single point (region is the other option)

Location-related information:
--lat : this tells the script which latitude to subset at (must be between -90 and 90)
--lon : this tells the script which longitude to subset at (can be between 0 and 360 or -180 and 180)
--site : optional, specifies a site name or tag

Type of files to create:
--create-surface : tells the script to subset surface data
--create-datm : tells the script to subset DATM data

Time information:
--datm-syr and --datm-eyr: starting and ending years for the DATM data to subset (must be between 1901 and 2014)

Data management information:
--create-user-mods : tells the script to create a user_mods directory (see below). Note that if you don't use this option, you will have to modify scripts in your simulation to point to the modified files.
--outdir : specifies the directory to place subset data and user mods directory in
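Putting these options together, a generic subset_data call has the shape sketched below; the angle-bracket values are placeholders, and the exact command we will run for Harvard Forest appears in the next section.

./subset_data point --lat <latitude> --lon <longitude> --site <site_name> --create-surface --create-datm --datm-syr <start_year> --datm-eyr <end_year> --create-user-mods --outdir <output_directory>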

Call the script with these options by running the line of code below#

A few notes:

  • We will use a latitude of 42.53562 and longitude of 287.82438 (i.e. Harvard Forest). Note that your latitude and longitude points do not have to be this precise for your own sites!

  • We will run the simulation from 2001 to 2002. Note that because we are only simulating years around present day, we are not using land use change. If you want to run a transient simulation, you will also need to create land use data (the --create-landuse option).

  • This is also pretty time consuming, so we’ll use qcmd_serial here to put this in the queue on a single processor.

  • As before, please be patient while this runs, and don’t worry if you see WARNING: No dominant pft type is chosen.

qcmd_serial -- ./subset_data point --lat 42.53562 --lon 287.82438 --site my_point --create-surface --create-datm --datm-syr 2001 --datm-eyr 2002 --create-user-mods --outdir /scratch/$USER/my_subset_data

echo "------------------------"
echo "Successfully subset data"

Depending on the speed of your computing system, it may take a bit of time to subset all the climate data.

1.2 Check on the subset files

Once the subsetting has successfully finished, let’s navigate to the specified output directory to check on the data that we just created:

cd /scratch/$USER/my_subset_data
ls

You should see a surface data file (e.g. surfdata_0.9x1.25 … .nc) and two folders: datmdata and user_mods.

  • datmdata houses the subset DATM files

  • user_mods is a directory that houses several files we will use to set up our single-point case
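If you want a quick sanity check on the new surface file, you can inspect its header with ncdump (assuming the netCDF command-line tools are available, as they are in CESM-Lab):

ncdump -h surfdata_*.nc | head -30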

Let’s navigate into the user_mods directory to look at the contents:

cd user_mods
ls

You should see three files: shell_commands, user_nl_clm, and user_nl_datm_streams.

The shell_commands file contains xmlchange commands required to set up a single point case at the specified latitude and longitude.

Take a look at this file if you want:

cat shell_commands

Note that many of the xml commands are changing aspects of the model configuration that are communicated to CIME (Common Infrastructure for Modeling the Earth), which is the infrastructure that generates model executables and associated input files. Below are explanations of the commands included in this script.

./xmlchange CLM_USRDAT_DIR - sets the CLM_USRDAT_DIR variable, which we use to specify the main directory of the subset data files

./xmlchange PTS_LON and ./xmlchange PTS_LAT - these tell CIME that we are running at a specific latitude and longitude

./xmlchange MPILIB - specifies the MPI (Message Passing Interface) library to use, which is required for single-point runs on NCAR machines
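Putting those together, the generated shell_commands file will look broadly like the sketch below; treat it as illustrative, since the exact contents can vary with your CTSM version:

./xmlchange CLM_USRDAT_DIR=/scratch/$USER/my_subset_data
./xmlchange PTS_LON=287.82438
./xmlchange PTS_LAT=42.53562
./xmlchange MPILIB=mpi-serial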

If you remember from the Day 1 tutorial, user_nl_clm is a Fortran namelist file used to set up different namelist options for CLM. Here, we are using it to specify the location of our subset surface data. Note the use of the variable $CLM_USRDAT_DIR set up in the shell_commands file.

Similarly, user_nl_datm_streams specifies the location and a few other options for our subset DATM data.
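You can confirm the surface-data setting quickly from inside the user_mods directory (fsurdat is the CLM namelist variable that points to the surface dataset):

grep fsurdat user_nl_clm

The line it prints should reference $CLM_USRDAT_DIR followed by the name of the subset surface file.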

We will use this user_mods directory when we create our single-point case (see below).

Note: If for whatever reason you end up moving the subset data directory (i.e. here /scratch/$USER/my_subset_data), you will need to modify the xmlchange command that specifies the CLM_USRDAT_DIR to be the full path to the directory's new location.
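If you do move it and have already created a case, you can point the case at the new location with a single command from the case directory (the path below is a hypothetical example):

./xmlchange CLM_USRDAT_DIR=/scratch/$USER/new_location/my_subset_data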

2. Create a single-point CTSM case

Now that we have our subset data ready to go, we can set up our single-point case with CIME.

The steps required here are very similar to the global case that we set up in the Day 1 tutorial, with a few differences. Mainly, we will pass --user-mods-dirs to our ./create_newcase command, giving the full path to the user_mods folder we just created with subset_data, so that the case points to all the data required to run the single-point (rather than global) simulation. We will also choose a different component set and resolution.

2.1. Create the case

As in our Day 1 Tutorial, we will navigate into the CTSM scripts directory to run the create_newcase script. Today we are going to be running a CLM-BGC simulation.

cd ~/CTSM/cime/scripts
./create_newcase --case ~/clm_tutorial_cases/I2000_CTSM_singlept --res CLM_USRDAT --compset I2000Clm51BgcCrop --run-unsupported --user-mods-dirs /scratch/$USER/my_subset_data/user_mods

This command should look fairly familiar to you, with some updated values and arguments.

--res - defines the model resolution, or grid:

  • we are now using CLM_USRDAT, which should be used when we have a user-specified domain (i.e. subset surface data)

--compset - defines the component set for the case:

  • I2000Clm51BgcCrop is an alias that describes using year 2000 initialization time, data-driven atmosphere (GSWP3v1 data), CLM 5.1 BGC with prognostic crop, along with some other component settings.

--user-mods-dirs - this is where we tell CIME where our user mods directory is.

  • it should be the full path to the user_mods directory that subset_data created

  • the namelist files (i.e. user_nl_clm and user_nl_datm_streams) will be copied into the case directory, and the commands within shell_commands will be executed. Remember that the information included in these files ensures the model uses the data subset for our site.

Tip: Depending on your machine (e.g. this is required on Cheyenne), you may also have to provide a project id (--project {PROJECT_ID}), which specifies accounting or directory permissions when on a batch system. By default the script uses the shell environment variable $PROJECT, which can be set in your bash profile.
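On such a machine, the create_newcase command above would simply gain one extra argument, e.g.:

./create_newcase --case ~/clm_tutorial_cases/I2000_CTSM_singlept --res CLM_USRDAT --compset I2000Clm51BgcCrop --run-unsupported --user-mods-dirs /scratch/$USER/my_subset_data/user_mods --project $PROJECT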

Remember that you can see script parameters and definitions using ./create_newcase --help

2.2. Set up the case and build the executable

As with our global case, we will change into our case directory, set up our case (./case.setup) and then build the executable with qcmd (qcmd -- ./case.build):

cd ~/clm_tutorial_cases/I2000_CTSM_singlept
./case.setup
./xmlchange MPILIB=impi
qcmd -- ./case.build

Note: You'll notice we changed the MPILIB parameter from what was set in the user_mods. This is a special change we have to make to run on this cloud system. On NCAR machines (e.g. Cheyenne) you should keep MPILIB=mpi-serial.

You can read on, but before executing any code blocks in the notebook, wait for the model to build. This can take a while, especially while you're waiting for your qcmd job to start and the land model code compiles.

You’ll see text stating MODEL BUILD HAS FINISHED SUCCESSFULLY when it’s finished.
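You can also confirm this from the command line by querying the case; BUILD_COMPLETE is a standard CIME variable that reports TRUE once the build has succeeded:

./xmlquery BUILD_COMPLETE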

2.3. Customize the case

As in our global case, we will invoke a few XML commands to update some runtime values:

./xmlchange STOP_OPTION=nyears          # units for the length of the run
./xmlchange STOP_N=1                    # run for one year
./xmlchange RUN_STARTDATE='2001-01-01'  # start date of the simulation
./xmlchange DATM_YR_ALIGN=2001          # simulation year to align with the first DATM year
./xmlchange DATM_YR_START=2001          # first year of DATM forcing to cycle over
./xmlchange DATM_YR_END=2002            # last year of DATM forcing to cycle over
./xmlchange PIO_REARRANGER_LND=2        # PIO rearranger option for land model output
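To double-check that these settings took effect, you can query them back in one command:

./xmlquery STOP_OPTION,STOP_N,RUN_STARTDATE,DATM_YR_ALIGN,DATM_YR_START,DATM_YR_END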

Spinup

When running a model like CLM, the initial conditions (i.e. state variables like carbon and nitrogen pools and soil moisture) have an impact on the results of the simulation. Often, we don’t know the precise values of these initial conditions. To get around this issue, we can initialize the model with arbitrary values and then run the model with some cycle of atmospheric forcing for many years (e.g. 200) until the model attains an equilibrium state. Then, we can simulate the model response to some perturbation (e.g. changing climate, CO2, etc.). This process – establishing an equilibrium state – is called spinup.

We have already run such a spinup simulation at Harvard Forest. We can see how the soil C pools evolved for this simulation over time.

Note that you'll see AD mode for accelerated decomposition mode and post-AD mode on this figure. See the tip below or visit the Model Equilibrium and its Acceleration section of the CLM Tech Note for more information.

Figure: Evolution of different C pools during the accelerated decomposition spinup.

The steady-state, or equilibrium, sizes of carbon (C) and nitrogen (N) pools are proportional to their turnover times. This spinup simulation was conducted using "accelerated decomposition", or "AD", mode, which accelerates the turnover of "slow" ecosystem C and N pools (soil, wood, and coarse woody debris) so they come into equilibrium more quickly.

We ran the model in AD mode for 100 years, cycling through atmospheric forcing from 1981 to 2000. AD mode was invoked with the commands ./xmlchange CLM_FORCE_COLDSTART=on and ./xmlchange CLM_ACCELERATED_SPINUP=on.

We then ran a "post-AD" simulation (using the end of our "AD" simulation as the starting point) for another 100 years with ./xmlchange CLM_ACCELERATED_SPINUP=off. In returning the turnover times of slow C and N pools to their intended rates, the pool sizes have to be adjusted from their "AD" steady state. For example, if the turnover of "passive" soil C was 10x faster in AD mode, the passive soil C pool needs to be 10x larger at the start of the post-AD simulation (the model handles this conversion automatically). Running the simulation for 100 years in post-AD mode allows the state variables to equilibrate with non-accelerated decomposition. In post-AD mode the history files are monthly, whereas AD output is set to annual averages by default; this difference in output frequency explains the extra variability visible in the post-AD output.
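For reference, the two spinup phases described above were configured with the following xmlchange commands (you do not need to run these for this tutorial):

# AD phase: cold start with accelerated decomposition
./xmlchange CLM_FORCE_COLDSTART=on
./xmlchange CLM_ACCELERATED_SPINUP=on

# post-AD phase: continue from the AD end point with normal decomposition
./xmlchange CLM_ACCELERATED_SPINUP=off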

After spinup is complete, we have to tell CIME to use the end point of the spinup simulation as the starting point, or initial conditions, for our simulation.

We do this via the user_nl_clm file:

echo "finidat='/scratch/data/day2/finidat_file/I2000_CTSM51_spinup.clm2.r.0281-01-01-00000.nc'" >> user_nl_clm

Tip: You can also edit the file using any text editing software (e.g. vi, emacs, etc.).

Let’s check the file to make sure our command worked:

cat user_nl_clm

2.4. Submit the case

Finally, let’s submit the case as we did in Day 1:

./case.submit

You should see a confirmation that it successfully submitted.

Congratulations! You've created and submitted a single-point CLM case!

You can check the status of your case as in previous tutorials.

qstat -u $USER

Once your jobs are complete (or show the 'C' state under the 'Use' column, which means complete), we can check the CaseStatus file to ensure that the case submitted, ran, and completed without errors. To do this, we'll tail the end of the CaseStatus file:

tail ~/clm_tutorial_cases/I2000_CTSM_singlept/CaseStatus

Remember from Day1a that you can check on files in your runs in the scratch/{USER}/{CASE_NAME}/run directory (e.g. /scratch/$USER/I2000_CTSM_singlept/run).

Archived history files will be in the scratch/{USER}/archive/{CASE_NAME}/lnd/hist directory (e.g. /scratch/$USER/archive/I2000_CTSM_singlept/lnd/hist).
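For example, once the archiver has finished you can list the history files for this case:

ls /scratch/$USER/archive/I2000_CTSM_singlept/lnd/hist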


Next, check out the Generic Single Point Visualization tutorial, 2b_GenericSinglePoint_Visualization, to walk through how to visualize and analyze some of the output produced. Note that you don't need to wait for this job to finish before moving on to the Day2b tutorial.