Pre-defined projects and datasets

CliMAF knows a number of datasets. The package projects is devoted to them:

Package projects declares a number of ‘projects’ and the data locations for these projects, at CNRM or on Ciclad, when they exist. All its modules are automatically loaded when importing climaf.api or when launching CliMAF through climaf

The concept of a ‘project’ in CliMAF is explained with function cproject(). A project allows declaring non-standard variable names, scaling parameters…

Please note that, for some combinations of observation ‘projects’ and variables (i.e. ‘snm’ in erai, ‘pr’ in gpcp and cruts3), CliMAF provides a flux variable, while the original data provides a monthly accumulation. In that case, the conversion to rates assumes a fixed month length of 30.3 days (to minimize bias at the yearly scale)
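The conversion described above can be sketched as follows. This is a minimal illustration, not CliMAF's own code: the function name is hypothetical, and it assumes the accumulation is already expressed in kg m-2 (any other unit conversion is left out).

```python
SECONDS_PER_DAY = 86400.0
FIXED_MONTH_LENGTH_DAYS = 30.3  # fixed month length quoted in the text


def monthly_accumulation_to_flux(accumulation_kg_m2):
    """Convert a monthly accumulation (kg m-2) to a mean flux (kg m-2 s-1).

    Hypothetical helper: every month is given the same length,
    whatever its actual number of days.
    """
    return accumulation_kg_m2 / (FIXED_MONTH_LENGTH_DAYS * SECONDS_PER_DAY)


# 90.9 kg m-2 accumulated over one month is 3 kg m-2 per (fixed-length) day
flux = monthly_accumulation_to_flux(90.9)
print(flux)
```

Using one fixed length for every month means that individual months carry a small error, which is the price of a simple, constant scaling factor.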

To list the declared projects and their specifics from the Python prompt, type e.g.:

>>> import climaf
>>> dir(climaf.projects)
>>> help(climaf.projects.cmip5)

To see the specifics of variables for a given project (e.g. re-scaling), type:

>>> from climaf.api import *
>>> aliases["erai"]

and interpret a result such as:

'erai': {'clt': ('tcc', 1.0, 0.0, None, 'TCC', None),
         'das': ('d2m', 1.0, 0.0, None, '2D', None),
...

by: in project ‘erai’, standard variable ‘clt’ is read from data variable ‘tcc’ with scaling=1, offset=0, and no change in units name; ‘TCC’ is the variable name used in computing the data filename; and there is no special missing value in addition to the one duly declared in the datafile
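As an illustration of this reading, the sketch below unpacks such a tuple into named fields. The field names, the `sample_aliases` dictionary (a stand-in for the real `aliases`) and the helper functions are all hypothetical; the rescaling formula `raw * scale + offset` is an assumed convention, not something stated above.

```python
from collections import namedtuple

# Illustrative field names; the order follows the description in the text
Alias = namedtuple(
    "Alias", ["filevar", "scale", "offset", "units", "filename_var", "missing"]
)

# Hypothetical stand-in for the dictionary returned by aliases["erai"]
sample_aliases = {
    "erai": {
        "clt": ("tcc", 1.0, 0.0, None, "TCC", None),
        "das": ("d2m", 1.0, 0.0, None, "2D", None),
    }
}


def get_alias(project, variable, table=sample_aliases):
    """Return the alias entry of a variable as a named tuple."""
    return Alias(*table[project][variable])


def rescale(raw_value, alias):
    """Apply scale and offset (assumed convention: raw * scale + offset)."""
    return raw_value * alias.scale + alias.offset


clt = get_alias("erai", "clt")
print(clt.filevar, clt.filename_var, rescale(0.5, clt))
```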

cmip6

This module declares locations for searching data for CMIP6 outputs produced by libIGCM or Eclis for all frequencies.

Attributes for CMIP6 datasets are: model, experiment, table, realization, grid, version, institute, mip, root

Syntax for these attributes is described in the CMIP6 DRS document

Example for a CMIP6 dataset declaration

>>> tas1pc=ds(project='CMIP6', model='CNRM-CM6-1', experiment='1pctCO2', variable='tas', table='Amon',
...           realization='r3i1p1f2', period='1860-1861')

cmip5

This module declares locations for searching data for CMIP5 outputs produced by libIGCM or Eclis for all frequencies.

Attributes for CMIP5 datasets are: model, experiment, table, realization, grid, version, institute, mip, root

Syntax for these attributes is described in the CMIP5 DRS document (http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf)

Example for a CMIP5 dataset declaration:

>>> tas1pc = ds(project='CMIP5', model='CNRM-CM6-1', experiment='1pctCO2', variable='tas', table='Amon',
...             realization='r3i1p1f2', period='1860-1861')

ocmip5

This module declares how to access OCMIP5 data on Ciclad.

Use attributes ‘model’ and ‘frequency’

Example of a path: /prodigfs/project/OCMIP5/OUTPUT/IPSL/IPSL-CM4/CTL/mon/CACO3/CACO3_IPSL_IPSL-CM4_CTL_1860-1869.nc
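The example path can be taken apart as in the sketch below. The mapping of path components to facet names (institute, model, simulation, frequency, variable, period) is only a reading of this single example, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath

EXAMPLE_PATH = (
    "/prodigfs/project/OCMIP5/OUTPUT/IPSL/IPSL-CM4/CTL/mon/CACO3/"
    "CACO3_IPSL_IPSL-CM4_CTL_1860-1869.nc"
)


def split_ocmip5_path(path):
    """Split a path laid out like the example into assumed facet names."""
    p = PurePosixPath(path)
    # The five directories just above the file, in the order of the example
    institute, model, simulation, frequency, variable = p.parts[-6:-1]
    # The file name ends with the period, after the last underscore
    period = p.stem.rsplit("_", 1)[-1]
    return {
        "institute": institute,
        "model": model,
        "simulation": simulation,
        "frequency": frequency,
        "variable": variable,
        "period": period,
    }


components = split_ocmip5_path(EXAMPLE_PATH)
print(components)
```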

Example

>>> cdef('model','IPSL-CM4')
>>> cdef('frequency','monthly')
>>> cactl=ds(project='OCMIP5_Ciclad', simulation='CTL', variable='CACO3', period='1860-1861')

ref_climatos_and_ts

This module declares two ‘projects’:

  • ‘ref_climatos’, for the climatological annual cycles and
  • ‘ref_ts’, for the ‘time series’ (one variable evolving with time) of a set of reference products as managed by J. Servonnat at IPSL.

This archive is available on Ciclad (IPSL), Curie (TGCC) and Ada (IDRIS), under /cnrm, and at Cerfacs

The specific attributes are:

  • product (default:’*’): name of the observation or reanalysis product (example: ERAI, GPCP…)
  • clim_period (for climatologies only): a character string; there is no mechanism of period selection (unlike with ‘period’)

Default values of the attributes for climatologies (ref_climatos):

  • product : ‘*’
  • variable : ‘*’
  • period : ‘fx’
  • frequency : ‘annual_cycle’

It is possible to pass a list of products to ‘product’ to define an ensemble of climatologies with eds() as in:

>>> dat_ens = eds(project='ref_climatos', product=['ERAI','NCEP'],...)

Default values of the attributes for time_series (ref_ts):

  • product : ‘*’
  • period : ‘1900-2050’
  • frequency : ‘monthly’

Example of a ‘ref_ts’ project dataset declaration

>>> cdef('project','ref_ts')
>>> d=ds(variable='tas',period='198001'....)

igcm_out

This module declares locations for searching data for IGCM outputs produced by libIGCM for all frequencies, on Ciclad and at TGCC.

The project IGCM_OUT offers many possible keywords (facets) to identify the dataset precisely and to make locating the data as efficient as possible. We have chosen to give ‘wild cards’ (*) as default values to many keywords. This way, ds() has a greater chance of returning a result to the user (even if it covers too many simulations) when the user specifies just a few keywords.

Three projects are available to access the IGCM_OUT outputs; they are aimed at dealing with the diversity of variable names seen among the IPSL outputs (which can vary with time and users). They all provide aliases to the CMIP variable names and to the old names (taking advantage of the mechanisms linked with calias):

  • IGCM_OUT : corresponds to the most up-to-date combination of variable names (mix of CMIP and old names)
  • IGCM_OUT_old : links with the old variable names
  • IGCM_OUT_CMIP : simply uses calias to provide the scale, offset and filenameVar

The attributes are:
  • root : path (without the login) to the top of the IGCM_OUT tree
  • login : login of the producer of the simulation
  • model : explicit
  • experiment : piControl, historical, amip…
  • status : DEVT, PROD, TEST
  • simulation : name of the numerical simulation (JobName in the IGCM syntax)
  • DIR : ATM, OCE, SRF…
  • OUT : Analyse, Output
  • frequency : monthly, daily, annual_cycle (equivalent to ‘seasonal’)
  • ave_length : MO, DA (optional, but can reduce the time ds() needs to locate the data)
  • period : explicit
  • variable : explicit
  • clim_period : a character string; there is no mechanism of period selection (like with ‘period’)
  • clim_period_length : can be set to ‘_50Y’ or ‘_100Y’ to access the annual cycles averaged over 50-year or 100-year periods

Default values of the attributes:
  • root : ‘/ccc/store/cont003/dsm’ (at TGCC)
  • login : ‘*’
  • model : ‘*’
  • experiment : ‘*’
  • status : ‘*’
  • simulation : ‘*’
  • DIR : ‘*’
  • OUT : ‘*’
  • frequency : ‘monthly’
  • ave_length : ‘*’
  • period : ‘fx’
  • variable : ‘*’
  • clim_period : ‘????_????’
  • clim_period_length : ‘*’
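To make the role of these defaults concrete, here is a minimal sketch of how user-supplied facets could be overlaid on them, and of which facets then remain wild cards to be resolved by the file search. The helper functions are hypothetical and do not reproduce CliMAF's actual implementation; only the default values are taken from the list above.

```python
# Default values copied from the list above
IGCM_OUT_DEFAULTS = {
    "root": "/ccc/store/cont003/dsm",
    "login": "*",
    "model": "*",
    "experiment": "*",
    "status": "*",
    "simulation": "*",
    "DIR": "*",
    "OUT": "*",
    "frequency": "monthly",
    "ave_length": "*",
    "period": "fx",
    "variable": "*",
    "clim_period": "????_????",
    "clim_period_length": "*",
}


def resolve_facets(**user_facets):
    """Overlay user-supplied facets on the defaults (hypothetical helper)."""
    unknown = sorted(set(user_facets) - set(IGCM_OUT_DEFAULTS))
    if unknown:
        raise KeyError("unknown facet(s): " + ", ".join(unknown))
    facets = dict(IGCM_OUT_DEFAULTS)
    facets.update(user_facets)
    return facets


def wildcard_facets(facets):
    """Return the facets whose value still contains a '*' or '?' wild card."""
    return sorted(
        name for name, value in facets.items() if "*" in value or "?" in value
    )


facets = resolve_facets(
    model="IPSLCM6", simulation="O1T09V04", period="1850-1900", variable="tas"
)
still_open = wildcard_facets(facets)
print(still_open)
```

The more facets are left as wild cards, the larger the part of the file tree that has to be scanned, which is why a fully specified request is faster.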

Example 1: on Curie, access a ‘time series’ dataset of the variable tas, providing values for all facets:

>>> dat1 = ds(project='IGCM_OUT',
...           root='/ccc/store/cont003/dsm',
...           login='p86mart',
...           model='IPSLCM6',
...           experiment='piControl',
...           status='DEVT',
...           simulation='O1T09V04',
...           DIR='ATM',
...           OUT='Analyse',
...           frequency='monthly',
...           ave_length='MO',
...           period='1850-1900',
...           variable='tas'
...           )

Note that the following request returns the same files (but takes more time):

>>> dat1 = ds(project='IGCM_OUT',
...           model='IPSLCM6',
...           simulation='O1T09V04',
...           period='1850-1900',
...           variable='tas'
...           )

Example 2: on Curie, access a ‘SE_50Y’ dataset of the variable tas, providing values for all facets. Note that we set frequency to ‘seasonal’ (or ‘annual_cycle’) and specify clim_period and clim_period_length (either ‘_50Y’ or ‘_100Y’):

>>> dat2 = ds(project='IGCM_OUT',
...           login='p86mart',
...           model='IPSLCM6',
...           experiment='piControl',
...           status='DEVT',
...           simulation='O1T09V04',
...           DIR='ATM',
...           OUT='Analyse',
...           frequency='seasonal',
...           clim_period='1850_1899',
...           clim_period_length='_50Y',
...           variable='tas'
...           )

The attributes ‘model’, ‘simulation’ and ‘clim_period’ can be used to define ensembles with eds().

Example 3: on Curie, define an ensemble with simulations ‘O1T09V01’, ‘O1T09V02’, ‘O1T09V03’:

>>> dat_ens = eds(project='IGCM_OUT',
...               model='IPSLCM6',
...               simulation=['O1T09V01','O1T09V02','O1T09V03'],
...               clim_period='1850_1859',
...               variable='tas'
...               )

Contact: jerome.servonnat@lsce.ipsl.fr

em

This module declares project em, based on data organization ‘generic’

EM (Experiment Manager) is a tool used at CNRM for moving simulation post-processed data from the HPSS to the local filesystem, and to organize it in a file hierarchy governed by a few configuration files

Simulation names (or ‘EXPIDs’) are assumed to be unique in the namespace defined by the user’s configuration file, which may include shared simulations

Specific facets are:
  • root : root directory for private data files as declared to EM
  • group : group of the simulation (as declared to ECLIS)
  • frequency : for now, only monthly is managed; it is the default
  • realm : to speed up data search, and to resolve ambiguities. Usable values are A, Atmos, O, Ocean, I, SeaIce, L, Land. Unfortunately, for now, you have to know whether your data is in a private directory (use e.g. ‘A’) or a shared one (use e.g. ‘Atmos’). Default is ‘*’ (costly).

Examples for defining an EM dataset:

>>> tas= ds(project='em', simulation='GSAGNS1', variable='tas', period='1975-1976', realm="(A|Atmos)")
>>> pr = ds(project='em', simulation="C1P60", group="SC", variable="pr", period="1850", realm="(O|Ocean)")

See other examples in examples/data_em.py

The location of ocean variables in the various grid_XX files matches the case with : T_table_2.2, T_table_2.5, T_table_2.7, U_table_2.3, U_table_2.8, W_table2.3 … Other cases should be described by another ‘project’

WARNING REGARDING OCEAN DATA : for a number of old simulations, there is an issue with the name of time coordinates, which leads to some nav_lat/nav_lon coordinates being discarded during CDO processing. You can tell CliMAF to deal with that automatically, at the expense of computing time, by setting and exporting environment variable CLIMAF_FIX_NEMO_TIME to any value except ‘no’, ‘0’ and ‘None’ BEFORE launching CliMAF. What CliMAF does in that case shows in ../scripts/mcdo.py (see function nemo_timefix())
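The rule for CLIMAF_FIX_NEMO_TIME stated above can be written as a small test on the environment. This is only a sketch of the rule as described: the function name is hypothetical and the actual check performed by CliMAF may differ in detail.

```python
import os


def nemo_time_fix_enabled(environ=None):
    """Tell whether CLIMAF_FIX_NEMO_TIME requests the time-coordinate fix.

    Hypothetical helper: the fix is considered requested when the variable
    is set to any value other than 'no', '0' or 'None'.
    """
    env = os.environ if environ is None else environ
    value = env.get("CLIMAF_FIX_NEMO_TIME")
    return value is not None and value not in ("no", "0", "None")


# Check against explicit dictionaries rather than the real environment
print(nemo_time_fix_enabled({"CLIMAF_FIX_NEMO_TIME": "yes"}))
print(nemo_time_fix_enabled({"CLIMAF_FIX_NEMO_TIME": "no"}))
print(nemo_time_fix_enabled({}))
```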

A number of sea-ice fields are duly described with 1.e+20 as their missing value (which is ill described in data files); see code for details

example

This module declares project example and its data location for the standard CliMAF distro

Only one additional attribute: frequency (but the data sample actually includes only frequency=‘monthly’)

Example of an ‘example’ dataset definition

>>> dg=ds(project='example', simulation='AMIPV6ALB2G', variable='tas', period='1980-1981', frequency='monthly')

erai

This module declares ERA Interim data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

Also declares how to derive CMIP5 variables from the original ERAI variables set (aliasing)

Attributes are ‘grid’, and ‘frequency’

Various grids are available. The original grid is written as grid='_'; other grids are written e.g. as grid='T42' or grid='T127'

Example of an ‘erai’ project dataset declaration

>>> cdef('project','erai')
>>> d=ds(variable='tas',period='198001',grid='_', frequency='monthly')
>>> d2=ds(variable='tas',period='198001',grid='T42',frequency='daily')

erai-land

This module declares ERA Interim land data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

Also declares how to derive CMIP5 variables from the original ERAI-land variables set

Attribute is ‘grid’

Various grids are available. The original grid is written as grid='_'; other grids are written e.g. as grid='T127'

Most variables for ERAI-LAND have no CMIP5 counterpart: only CMIP5 ‘snd’ is aliased to ERAI-LAND ‘sd’; see doc for the other, original, ERAI-LAND variables

Example of an ‘erai-land’ project dataset declaration

>>> cdef('project','erai-land')
>>> d=ds(variable='snd',period='198001',grid='_')
>>> d2=ds(variable='snd',period='198001',grid='T127')

ceres

This module declares CERES data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

No attributes in addition to standard ones; and ‘simulation’ is not used

Version of dataset is implicitly the latest, through symbolic links managed by Sophie. Please complain to climaf at cnrm dot fr if this does not fit the needs

Example of a ‘ceres’ project dataset declaration

>>> d = ds(project='ceres', variable='rlds', period='198001', domain=[40.,60.,-10.,+20.])

cruts3

This module declares CRUTS3 data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

Also declares how to derive CMIP5 variables from the original CRUTS3 variables set

Attribute is ‘grid’

Various grids are available. The original grid is written as grid=''; other grids are written e.g. as grid='T127'

Example of a ‘cruts3’ project dataset declaration

>>> cdef('project','cruts3')
>>> d=ds(variable='tas',period='198001',grid='')
>>> d2=ds(variable='tas',period='198001',grid='T127')

gpcc

This module declares GPCC data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

Also declares how to derive CMIP5 variables from the original GPCC variables set

Attribute is ‘grid’

Various grids are available. Grids are written e.g. as grid='05d', grid='1d' and grid='T127'

Example of a ‘gpcc’ project dataset declaration

>>> cdef('project','gpcc')
>>> d=ds(variable='pr',period='198001',grid='05d')
>>> d2=ds(variable='pr',period='198001',grid='1d')
>>> d3=ds(variable='pr',period='198001',grid='T127')

gpcp

This module declares GPCP data organization and specifics, as managed by Sophie T. at CNRM; see file:///cnrm/amacs/DATA/OBS/netcdf/

Also declares how to derive CMIP5 variables from the original GPCP variables set (aliasing/scaling)

Attributes are ‘grid’, and ‘frequency’.

Various grids are available. Grids are written e.g. as grid='1d', grid='2.5d', grid='T42' and grid='T127'

Only two variables are available: the original ‘precip’ (mm/day) and pr (kg m-2 s-1)
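The relation between the two units can be sketched as below. It relies on the fact that 1 mm of liquid water spread over 1 m2 weighs 1 kg (density 1000 kg m-3); the function name is hypothetical and this is not the scaling code used by the module.

```python
SECONDS_PER_DAY = 86400.0


def precip_mm_per_day_to_flux(precip_mm_per_day):
    """Convert precipitation from mm/day to kg m-2 s-1.

    Hypothetical helper: 1 mm of liquid water over 1 m2 is 1 kg,
    so only the time unit needs converting.
    """
    return precip_mm_per_day / SECONDS_PER_DAY


# 86.4 mm/day corresponds to about 1e-3 kg m-2 s-1
pr = precip_mm_per_day_to_flux(86.4)
print(pr)
```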

Example of a ‘gpcp’ project dataset declaration

>>> cdef('project','gpcp')
>>> d=ds(variable='pr',period='198001',grid='2.5d', frequency='monthly')
>>> d2=ds(variable='pr',period='198001',grid='1d',frequency='daily')

obs4mips

This module declares locations for searching data for project OBS4MIPS at CNRM (VDR), for all frequencies; see file:///cnrm/amacs/DATA/Obs4MIPs/doc/

Additional attribute for OBS4MIPS datasets : ‘frequency’

Example for an OBS4MIPS CMIP5 dataset declaration

>>> pr_obs=ds(project='OBS4MIPS', variable='pr', simulation='GPCP-SG', frequency='monthly', period='1979-1980')

cami

This module declares how to access observation datasets organized ‘a la CAMI’ at CNRM, at /cnrm/est/COMMON/cami/V1.8/climlinks/

Example

>>> pr_gpcp=ds(project='CAMIOBS', simulation='GPCP2.5d', variable='pr', period='1979-1980')