Functions for data definition and access
Except for the first three paragraphs, this section is for advanced use. As a first step, you should consider using the built-in data definitions described in projects. You may need to come back to this section for reference.
ds : define a dataset object (actually a front-end for cdataset)
- climaf.classes.ds(*args, **kwargs)
Returns a dataset from its full CliMAF Reference Syntax string. Example:
>>> ds('CMIP5.historical.pr.[1980].global.monthly.CNRM-CM5.r1i1p1.mon.Amon.atmos.last')
Also a shortcut for cdataset(), when used with only keyword arguments. Example:
>>> ds(project='CMIP5', model='CNRM-CM5', experiment='historical', frequency='monthly', simulation='r2i3p9',
...    domain=[40,60,-10,20], variable='tas', period='1980-1989', version='last')
In the latter case, you may use e.g. period=’last_50y’ to get the last 50 years of data (or fewer, if less data is available); this works only if no attribute of the dataset is ambiguous. ‘first_50y’ works similarly, as does period=’*’.
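To make the meaning of ‘last_50y’ and ‘first_50y’ concrete, here is a minimal sketch, assuming the available data covers a single contiguous range of years. The function select_years is hypothetical and is not CliMAF's implementation.

```python
import re

def select_years(spec, first_available, last_available):
    """Hypothetical helper: turn 'last_Ny', 'first_Ny' or '*' into (start, end) years.

    Assumes available data covers first_available..last_available without gaps.
    """
    if spec == "*":
        return first_available, last_available
    match = re.fullmatch(r"(first|last)_(\d+)y", spec)
    if match is None:
        raise ValueError(f"unsupported period spec: {spec!r}")
    which, n = match.group(1), int(match.group(2))
    if which == "last":
        # Last n years, or fewer if less data is available
        return max(first_available, last_available - n + 1), last_available
    # First n years, or fewer if less data is available
    return first_available, min(last_available, first_available + n - 1)

print(select_years("last_50y", 1850, 2014))   # (1965, 2014)
print(select_years("first_50y", 1850, 2014))  # (1850, 1899)
print(select_years("last_50y", 1990, 2014))   # (1990, 2014)
```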
For the full description of arguments, see cdataset().
cdataset : define a dataset object
- class climaf.classes.cdataset(**kwargs)
Create a CliMAF dataset.
A CliMAF dataset is a description of what the data is (rather than the data itself or a file). It is basically a set of attribute-value pairs. The list of attributes actually used to describe a dataset is defined by the project it refers to.
To display the attributes you may use for a given project, type e.g.:
>>> cprojects["CMIP5"]
For further details on projects, see cproject.
None of the project’s attributes are mandatory arguments, because every attribute defaults to the value set by cdef() (which also applies when providing None as the value of an attribute).
Some attributes have a special format or processing:
period : see init_period(). See also function climaf.classes.ds() for added flexibility in defining periods as the last or first set of years among available data
domain : allowed values are either ‘global’ or a list of lat/lon corners ordered as: [latmin, latmax, lonmin, lonmax]
variable : name of the geophysical variable; this should be :
check : optional argument that drives the check of the period covered by the datafiles w.r.t. the period defined for the dataset; allowed values are True, False and “if_found”; the latter means: do check, except if there is no data file for the dataset; this is intended for cases where datafiles are not (or no longer) accessible while the user expects to get processed data from the cache. Default value is env.environment.data_check. An error is raised if the check fails.
check_type : defines the extent of the period check; default value is env.environment.period_check_type; allowed values are:
‘none’ : don’t check period
‘light’ : checks that the period indicated by dates in data filenames includes the dataset’s period (see method light_check())
‘medium’ : checks that the period covered by data in files includes the dataset’s period (see method check())
‘full’ : in addition to the period check of case ‘medium’, also checks for gaps in data, and for frequency (see method check())
If check is True (or “if_found”, and some datafile exists), an error is raised if the check fails.
in project CMIP5, for attributes (frequency, simulation, period, table): if any is ‘fx’ (or ‘r0i0p0’ for simulation), the others are forced to ‘fx’ (resp. ‘r0i0p0’) too.
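The CMIP5 ‘fx’ rule above can be sketched as a plain function over a facet dict. This is a hypothetical illustration (force_fx is not a CliMAF function), assuming facets are given as a dict of strings.

```python
def force_fx(facets):
    """Hypothetical sketch of the CMIP5 'fx' rule on frequency, simulation, period and table."""
    facets = dict(facets)  # work on a copy, leave the caller's dict untouched
    fx_keys = ("frequency", "period", "table")
    triggered = facets.get("simulation") == "r0i0p0" or any(
        facets.get(key) == "fx" for key in fx_keys
    )
    if triggered:
        for key in fx_keys:
            facets[key] = "fx"
        facets["simulation"] = "r0i0p0"  # simulation uses 'r0i0p0' instead of 'fx'
    return facets

orography = force_fx({"variable": "orog", "frequency": "fx", "simulation": "r1i1p1",
                      "period": "1980", "table": "Amon"})
print(orography)
```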
Example, using no default value, and addressing some CMIP5 data:
>>> cdataset(project='CMIP5', model='CNRM-CM5', experiment='historical', frequency='monthly',
...          simulation='r2i3p9', domain=[40,60,-10,20], variable='tas', period='1980-1989', version='last')
You may use wildcards (‘*’) in attribute values, and use explore() to have CliMAF match such attributes against available data in a sensible way.
cdataset.explore: explore data and periods, and match wildcard attributes
- cdataset.explore(option='check_and_store', group_periods_on=None, operation='intersection', first=None)
Versatile datafile exploration for a dataset which possibly has wildcards (* and ?) in attributes.
option can be:
‘choices’ for returning a dict whose keys are wildcard attributes and whose entries are lists of values
‘resolve’ for returning a NEW DATASET with instantiated attributes (if unambiguous)
‘ensemble’ for returning AN ENSEMBLE based on multiple possible values of one or more attributes (tell which one comes first in labels by using arg ‘first’)
‘check_and_store’ (or missing) for just identifying and storing the dataset’s file list (while ensuring that wildcard attributes are not ambiguous)
This feature works only for projects whose organization is of type ‘generic’.
See further below, after the first examples, what can be done with a wildcard on ‘period’.
Toy example
>>> rst=ds(project="example", simulation="*", variable="rst", period="1980-1981")
>>> rst
ds('example|*|rst|1980-1981|global|monthly')
>>> rst.explore('choices')
{'simulation': ['AMIPV6ALB2G']}
>>> instanciated_dataset=rst.explore('resolve')
>>> instanciated_dataset
ds('example|AMIPV6ALB2G|rst|1980-1981|global|monthly')
>>> my_ensemble=rst.explore('ensemble')
error : "Creating an ensemble does not make sense because all wildcard attributes have a single possible value ({'simulation': ['AMIPV6ALB2G']})"
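The relation between options ‘choices’, ‘resolve’ and ‘ensemble’ can be illustrated with a minimal sketch, assuming the facets of available datafiles are already known as a list of dicts of strings. All function names are hypothetical, shell-style matching via fnmatch stands in for CliMAF's own matching, and the member label format (values joined with ‘_’) is an assumption.

```python
from fnmatch import fnmatchcase

def _matches(candidate, request):
    # Every requested attribute must match; '*' and '?' act as shell-style wildcards
    return all(fnmatchcase(candidate[k], v) for k, v in request.items())

def explore_choices(request, candidates):
    """Map each wildcard attribute to the sorted values found in matching candidates."""
    wild = [k for k, v in request.items() if "*" in v or "?" in v]
    matching = [c for c in candidates if _matches(c, request)]
    return {k: sorted({c[k] for c in matching}) for k in wild}

def explore_resolve(request, candidates):
    """Return a fully instantiated request, or fail if any wildcard is not unique."""
    found = explore_choices(request, candidates)
    ambiguous = {k: v for k, v in found.items() if len(v) != 1}
    if ambiguous:
        raise ValueError(f"cannot resolve uniquely: {ambiguous}")
    return {**request, **{k: v[0] for k, v in found.items()}}

def explore_ensemble(request, candidates):
    """Return {label: instantiated request}, labels built from the varying attributes."""
    found = explore_choices(request, candidates)
    varying = [k for k, v in found.items() if len(v) > 1]
    if not varying:
        raise ValueError(f"all wildcard attributes have a single possible value ({found})")
    members = {}
    for c in candidates:
        if _matches(c, request):
            label = "_".join(c[k] for k in varying)
            members[label] = {**request, **{k: c[k] for k in found}}
    return members

files = [
    {"project": "CMIP6", "model": "CNRM-ESM2-1", "experiment": "piControl"},
    {"project": "CMIP6", "model": "CNRM-CM6-1", "experiment": "piControl"},
    {"project": "CMIP6", "model": "CNRM-CM6-1", "experiment": "piClim-control"},
]
print(explore_choices({"project": "CMIP6", "model": "*", "experiment": "piControl"}, files))
```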
Real life example for options choices and ensemble:
>>> rst=ds(project="CMIP6", model='*', experiment="*ontrol*", realization="r1i1p1f*", table="Amon",
...        variable="rsut", period="1980-1981")
>>> clog('info')
>>> rst.explore('choices')
info : Attribute institute has matching value CNRM-CERFACS
info : Attribute experiment has multiple values : set(['piClim-control', 'piControl'])
info : Attribute grid has matching value gr
info : Attribute realization has matching value r1i1p1f2
info : Attribute mip has multiple values : set(['CMIP', 'RFMIP'])
info : Attribute model has multiple values : set(['CNRM-ESM2-1', 'CNRM-CM6-1'])
{'institute': ['CNRM-CERFACS'], 'experiment': ['piClim-control', 'piControl'], 'grid': ['gr'], 'realization': ['r1i1p1f2'], 'mip': ['CMIP', 'RFMIP'], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1']}
>>> # Let us further select by setting experiment=piControl
>>> mrst=ds(project="CMIP6", model='*', experiment="piControl", realization="r1i1p1f*", table="Amon",
...         variable="rsut", period="1980-1981")
>>> mrst.explore('choices')
{'institute': ['CNRM-CERFACS'], 'mip': ['CMIP'], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'], 'grid': ['gr'], 'realization': ['r1i1p1f2']}
>>> small_ensemble=mrst.explore('ensemble')
>>> small_ensemble
cens({
'CNRM-ESM2-1':ds('CMIP6%%rsut%1980-1981%global%/cnrm/cmip%CNRM-ESM2-1%CNRM-CERFACS%CMIP%Amon%piControl%'
'r1i1p1f2%gr%latest'),
'CNRM-CM6-1' :ds('CMIP6%%rsut%1980-1981%global%/cnrm/cmip%CNRM-CM6-1%CNRM-CERFACS%CMIP%Amon%piControl%'
'r1i1p1f2%gr%latest')
})
When option=’choices’ and period=’*’, the periods of all matching files will be either:
aggregated among all instances of all attributes with wildcards (default)
or, if argument group_periods_on provides an attribute name, aggregated after being sorted on that attribute and merged
The aggregation is governed by argument operation, which can be either:
‘intersection’ : the most useful case, and hence the default
‘union’ : which makes little sense, except to know which periods are definitely not covered by any data
None : no aggregation occurs, and you get a dict of the merged periods, whose keys are the values of the grouping attribute
Attribute ‘period’ cannot contain a ‘*’ unless its value is exactly ‘*’.
Examples without grouping periods over any attribute
>>> # Let us use a kind of dataset whose data files are split in time,
>>> # and allow for various models, and use a wildcard for period
>>> so=ds(project="CMIP6", model='CNRM*', experiment="piControl", realization="r1i1p1f2",
...       table="Omon", variable="so", period="*")
>>> # What is the overall period covered by the union of all datafiles
>>> # (but not necessarily by a single model!)
>>> so.explore('choices', operation='union')
{ 'period': [1850-2349], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'] .....}
>>> # What is the intersection of periods covered by each datafile
>>> so.explore('choices')
{ 'period': [None], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'] .....}
>>> # What is the list of periods covered by datafiles
>>> so.explore('choices', operation=None)
{ 'period': {None: [1850-1899, 1900-1949, 1950-1999, 2000-2049, 2050-2099, 2100-2149, 2150-2199, 2200-2249, 2250-2299, 2300-2349]}, 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'] .....}
Examples grouping periods over an attribute
>>> # What is the intersection of available periods after grouping them on the various values of 'model'
>>> so.explore('choices',group_periods_on='model')
{ 'period': [1850-2349], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'], ....}
>>> # Same, but with the default value made explicit
>>> so.explore('choices',group_periods_on='model',operation='intersection')
{ 'period': [1850-2349], 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'], ....}
>>> # What are the aggregated periods for each value of 'model'
>>> so.explore('choices',group_periods_on='model',operation=None)
{ 'period': {'CNRM-ESM2-1': [1850-2349], 'CNRM-CM6-1' : [1850-2349] }, 'model': ['CNRM-ESM2-1', 'CNRM-CM6-1'], ...}
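Why the ungrouped intersection is empty while the grouped one spans 1850-2349 can be reproduced with a small sketch, assuming periods are reduced to (first_year, last_year) pairs. The functions merge, intersect and aggregate are hypothetical illustrations of the ‘intersection’, ‘union’ and None operations, not CliMAF code.

```python
from functools import reduce

def merge(periods):
    """Merge overlapping or contiguous (first_year, last_year) periods."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersect(a, b):
    """Intersection of two lists of periods."""
    overlaps = [(max(s1, s2), min(e1, e2)) for s1, e1 in a for s2, e2 in b]
    return merge([(s, e) for s, e in overlaps if s <= e])

def aggregate(groups, operation="intersection"):
    """groups maps a grouping value (e.g. a model name) to the periods of its files."""
    merged = {key: merge(periods) for key, periods in groups.items()}
    if operation is None:
        return merged
    if operation == "union":
        return merge([p for periods in merged.values() for p in periods])
    if operation == "intersection":
        return reduce(intersect, merged.values())
    raise ValueError(f"unknown operation: {operation!r}")

files = [(1850 + 50 * i, 1899 + 50 * i) for i in range(10)]  # 1850-1899 ... 2300-2349
models = ("CNRM-ESM2-1", "CNRM-CM6-1")
per_file = {(m, p): [p] for m in models for p in files}  # no grouping: one group per file
by_model = {m: files for m in models}                    # grouping on 'model'
print(aggregate(per_file), aggregate(by_model))
```

Without grouping, disjoint 50-year files have no common year, whereas merging each model's files first yields one continuous period per model.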
cdataset.glob: explore data and/or periods, and match wildcard attributes
- cdataset.glob(what=None, ensure_period=True, merge_periods=True, split=None, use_frequency=False)
Datafile exploration for a dataset which possibly has wildcards (* and ?) in attributes/facets.
Returns info regarding matching datafiles or directories:
if WHAT = ‘files’, returns a string of all data filenames
otherwise, returns a list of facet/value dictionaries for matching data (or a pair of lists, see SPLIT below)
If ENSURE_PERIOD is True, returns only results where the requested data period is fully covered by the set of data files. Each returned period is then the same as the requested period.
Otherwise, if MERGE_PERIODS is True, each returned period is actually a list of the intersections of the requested period and (merged) available data periods.
Otherwise, individual data file periods are returned.
If SPLIT is not None, a pair is returned instead of the list of dicts:
first element is a dict with facets whose values are the same among all cases
second element is the list of dicts as above, but in which facets with common values are discarded
Example :
>>> tos_data = ds(project='CMIP6', mip='CMIP', variable='tos', period='*', table='Omon', institute='CNRM-CERFACS', model='CNRM*', realization='r1i1p1f2' )
>>> common_values, varied_values = tos_data.glob(merge_periods=True, split=True)
>>> common_values
{'variable': 'tos', 'period': [1850-2014], 'root': '/bdd', 'institute': 'CNRM-CERFACS', 'mip': 'CMIP', 'table': 'Omon', 'experiment': 'historical', 'realization': 'r1i1p1f2', 'version': 'latest', 'project': 'CMIP6'}
>>> varied_values
[{'model': 'CNRM-ESM2-1' , 'grid': 'gn' },
 {'model': 'CNRM-ESM2-1' , 'grid': 'gr1'},
 {'model': 'CNRM-CM6-1' , 'grid': 'gn' },
 {'model': 'CNRM-CM6-1' , 'grid': 'gr1'},
 {'model': 'CNRM-CM6-1-HR', 'grid': 'gn' } ]
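The SPLIT behaviour (common facets versus varied facets) amounts to a simple partition of the facet dicts. Here is a minimal sketch of that partition; split_facets is a hypothetical name, not the method's actual code.

```python
def split_facets(results):
    """Split a list of facet dicts into (common, varied), as described for SPLIT."""
    if not results:
        return {}, []
    first = results[0]
    # A facet is common if every result has it with the same value
    common = {k: v for k, v in first.items()
              if all(k in r and r[k] == v for r in results)}
    # Each result keeps only the facets that are not common
    varied = [{k: v for k, v in r.items() if k not in common} for r in results]
    return common, varied

common_values, varied_values = split_facets([
    {"variable": "tos", "model": "CNRM-ESM2-1", "grid": "gn"},
    {"variable": "tos", "model": "CNRM-ESM2-1", "grid": "gr1"},
    {"variable": "tos", "model": "CNRM-CM6-1", "grid": "gn"},
])
print(common_values)
print(varied_values)
```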
cdataset.check: check time consistency of a dataset
- cdataset.check(frequency=False, gap=False, period=True)
Check time consistency of the first variable of a dataset or of ensemble members:
if frequency is True : check if datafile frequency is consistent with facet frequency
if gap is True : check if file data have a gap
if period is True : check if period covered by data actually includes the whole of dataset period (regardless of possible gaps)
Default case is to check only period
Returns: True if every check is OK, False if one fails, None if any cannot be analyzed
For gap and period checks, monthly data are processed rather empirically.
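The difference between the period check (regardless of gaps) and the gap check can be shown with a minimal sketch, assuming each datafile's period is reduced to a (first_year, last_year) pair. check_periods is hypothetical; unlike the real method it works on whole years only and ignores frequency.

```python
def check_periods(file_periods, requested, period=True, gap=False):
    """Hypothetical sketch: periods are (first_year, last_year) pairs, bounds inclusive.

    Returns True if all requested checks pass, False if one fails,
    None if there is nothing to analyze.
    """
    spans = sorted(file_periods)
    if not spans:
        return None
    ok = True
    if period:
        # Only the overall extent matters here, regardless of possible gaps
        ok = ok and spans[0][0] <= requested[0] and max(e for _, e in spans) >= requested[1]
    if gap:
        last_end = spans[0][1]
        for start, end in spans[1:]:
            if start > last_end + 1:  # at least one missing year between files
                ok = False
            last_end = max(last_end, end)
    return ok

with_gap = [(1980, 1984), (1990, 1999)]
print(check_periods(with_gap, (1980, 1999)))            # True: extent is covered
print(check_periods(with_gap, (1980, 1999), gap=True))  # False: 1985-1989 missing
```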
cdataset.light_check: check time consistency of a dataset w.r.t. dates in data filenames
- cdataset.light_check()
Check that the dataset’s period is covered by the period deduced from the filenames of its datafiles. Filenames which contain non-date digits (e.g. an initialization year) and whose period has no end date may cause interpretation problems.
Returns True if the period is covered.
Nevertheless, data in files may show gaps; use dataset.check(gap=True) if you need a deeper check.
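A light check only looks at dates embedded in filenames. The sketch below assumes names ending in _YYYY[MM[DD]]-YYYY[MM[DD]].nc, as in CMIP-style filenames. The regex and the function light_check_sketch are hypothetical, and like the real light check they do not detect gaps.

```python
import re

# Assumption: file names end with _YYYY[MM[DD]]-YYYY[MM[DD]].nc
DATES = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")

def light_check_sketch(filenames, requested):
    """True if the years found in filenames span the requested (first, last) years."""
    years = []
    for name in filenames:
        match = DATES.search(name)
        if match is None:
            return False  # cannot deduce a period from this name
        years.append((int(match.group(1)), int(match.group(2))))
    if not years:
        return False
    return min(s for s, _ in years) <= requested[0] and max(e for _, e in years) >= requested[1]

names = ["tas_Amon_CNRM-CM5_historical_r1i1p1_185001-189912.nc",
         "tas_Amon_CNRM-CM5_historical_r1i1p1_190001-200512.nc"]
print(light_check_sketch(names, (1980, 1989)))
```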
cdataset.listfiles: returns the list of (local) files of a dataset
cdef : define some default values for dataset attributes
- climaf.classes.cdef(attribute, value=None, project=None)
Set or get the default value for a CliMAF dataset attribute or facet (such as e.g. ‘model’, ‘simulation’ …), for use by subsequent calls to cdataset() or to ds().
Argument ‘project’ restricts the use/query of the default value to the context of the given ‘project’. One can also set the (global) default value for attribute ‘project’.
There is no actual check that ‘attribute’ is a valid keyword for a call to ds or cdataset.
Example:
>>> cdef('project','OCMPI5')
>>> cdef('frequency','monthly',project='OCMPI5')
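How project-scoped defaults fill in missing or None attributes can be sketched with a plain dict. This is a hypothetical illustration: cdef_sketch and apply_defaults_sketch are not CliMAF functions, and the fallback from a project-specific default to the global one is an assumption.

```python
_defaults = {}  # hypothetical store: {(project, attribute): value}; project None = global

def cdef_sketch(attribute, value=None, project=None):
    """Set the default when value is given, otherwise return it (None if unset)."""
    if value is not None:
        _defaults[(project, attribute)] = value
        return None
    if (project, attribute) in _defaults:
        return _defaults[(project, attribute)]
    return _defaults.get((None, attribute))  # assumed fallback to the global default

def apply_defaults_sketch(attributes, **facets):
    """Fill attributes that are missing or None from the defaults of the dataset's project."""
    project = facets.get("project") or cdef_sketch("project")
    filled = {"project": project}
    for attr in attributes:
        value = facets.get(attr)
        filled[attr] = value if value is not None else cdef_sketch(attr, project=project)
    return filled

cdef_sketch("project", "OCMPI5")
cdef_sketch("frequency", "monthly", project="OCMPI5")
print(apply_defaults_sketch(["frequency", "variable"], variable="tas", frequency=None))
```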
eds : define an ensemble of datasets
- climaf.classes.eds(first=None, **kwargs)
Create a dataset ensemble using the same calling sequence as cdataset(), except that some facets are lists, which define the ensemble members; these facets must be among the facets authorized for ensembles in the (single) project involved.
Example:
>>> cdef("frequency","monthly") ; cdef("project","CMIP5"); cdef("model","CNRM-CM5")
>>> cdef("variable","tas"); cdef("period","1860")
>>> ens=eds(experiment="historical", simulation=["r1i1p1","r2i1p1"])
Argument ‘first’ is used when multiple attributes are of list type, and tells which of these attributes appears first in member labels.
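The way list-valued facets define members, and the role of ‘first’ in labels, can be sketched as a Cartesian product. eds_members_sketch is hypothetical, and joining label parts with ‘_’ is an assumption rather than CliMAF's actual label format.

```python
from itertools import product

def eds_members_sketch(first=None, **facets):
    """Return {label: facets}, one member per combination of list-valued facets."""
    list_keys = [k for k, v in facets.items() if isinstance(v, list)]
    if first is not None:
        list_keys.remove(first)
        list_keys.insert(0, first)  # `first` comes first in member labels
    members = {}
    for values in product(*(facets[k] for k in list_keys)):
        # Scalar facets are shared; list facets take one value per member
        members["_".join(values)] = dict(facets, **dict(zip(list_keys, values)))
    return members

ens = eds_members_sketch(experiment="historical", simulation=["r1i1p1", "r2i1p1"])
print(list(ens))
print(list(eds_members_sketch(first="model", simulation=["r1", "r2"], model=["A", "B"])))
```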
cens : define an ensemble of objects
- class climaf.classes.cens(dic={}, order=None, sortfunc=None)
Function cens creates a CliMAF object of class cens, i.e. a dict of objects whose keys are member labels, and whose members are ordered, using method set_order.
In some cases, ensembles of datasets from the same project can also be built easily using eds().
When applying an operator to an ensemble, CliMAF will know, from the operator’s declaration (see cscript()), whether the operator ‘wishes’ to get the ensemble or, conversely, is not ‘ensemble-capable’:
if the operator is ensemble-capable, CliMAF will deliver the ensemble to it:
if it is a script: as a string composed by concatenating the corresponding input files; it will also provide the list of labels to the script if its declaration calls for it with keyword ${labels} (see cscript())
if it is a Python function: as the dict of corresponding objects
if the operator is ‘ensemble-dumb’, CliMAF will loop, applying it to each member, and will form a new ensemble with the results.
The dict keys must be label strings, which describe what is basically different among members. They are usually used by plot scripts to provide a caption that identifies each dataset/object, e.g. using various colors.
Examples (see also ../examples/ensemble.py):
>>> cdef('project','example'); cdef('simulation',"AMIPV6ALB2G")
>>> cdef('variable','tas');cdef('frequency','monthly')
>>> #
>>> ds1980=ds(period="1980")
>>> ds1981=ds(period="1981")
>>> #
>>> myens=cens({'1980':ds1980 , '1981':ds1981 })
>>> ncview(myens) # will launch ncview once per member
>>>
>>> myens=cens({'1980':ds1980 , '1981':ds1981 }, order=['1981','1980'])
>>> myens.set_order(['1981','1980'])
>>>
>>> # Add a member
>>> myens['abcd']=ds(period="1982")
Limitations: even though an ensemble is a dict, some dict methods are not properly implemented (popitem, fromkeys), and method iteritems does not use member order.
You can write an ensemble to a file using function efile().
fds : define a dataset from a data file
- climaf.classes.fds(filename, simulation=None, variable=None, period=None, model=None)
fds stands for FileDataSet; it allows creating a dataset simply by providing a filename and, optionally, a simulation name, a variable name, a period and a model name.
For dataset attributes which are not provided, these defaults apply:
simulation : the filename basename (without suffix ‘.nc’)
variable : the set of variables in the data file
period : the period actually covered by the data file (if it has time_bnds)
model : the ‘model_id’ attribute if it exists, otherwise: ‘no_model’
project : ‘file’ (with separator = ‘|’)
frequency : the value of global attribute frequency in the datafile, if it exists
The following restriction applies to such datasets:
Results are unpredictable if the variables do not all share the same time axis
Examples: see data_file.py
cproject : declare a new project and its non-standard attributes/facets
- class climaf.classes.cproject(name, *args, **kwargs)
Declare a project and its facets/attributes in CliMAF (see below)
- Parameters:
name (string) – project name; do not use the chosen separator in it (see below)
args (strings) – attribute names; they are free; do not use the chosen separator in them (see below); CliMAF will in any case add attributes: project, simulation, variable, period, and domain
kwargs (dict) –
can only be used with keywords:
sep or separator for indicating the symbol separating facets in the dataset syntax. Defaults to “.”.
ensemble for declaring a list of attribute names which are allowed for defining an ensemble in this project (‘simulation’ is automatically allowed)
use_frequency to declare that the frequency cannot be derived from the time bounds of the file. In this case the facet frequency is mandatory for the project and a default value must be defined.
Returns: a cproject object, whose string representation is the pattern later used in CliMAF Reference Syntax for representing datasets in this project
A ‘cproject’ is the definition of a set of attributes, or facets, whose values completely define a ‘dataset’ as managed by CliMAF. Its name is one of the possible keys for describing data locations (see dataloc).
For instance, cproject CMIP5, after its Data Reference Syntax, has attributes: model, simulation (used for rip), experiment, variable, frequency, realm, table, version
A number of projects are built-in. See projects.
A cproject declared as
>>> cproject('MINE','myfreq','myfacet',sep='_')
will have the string representation
${project}_${simulation}_${variable}_${period}_${domain}_${myfreq}_${myfacet}
and will have datasets represented as e.g.:
'MINE_hist_tas_[1980-1999]_global_decadal_gabu'
while an example for built-in cproject CMIP5 will be:
'CMIP5.historical.pr.[1980].global.monthly.CNRM-CM5.r1i1p1.mon.Amon.atmos.last'
The attributes list should include all facets which are useful for distinguishing datasets from each other, and for computing datafile pathnames in the ‘generic’ organization (see dataloc).
A default value for a given facet can be specified by providing a tuple (facet_name, default_value) instead of the facet name. This default value, however, has lower priority than the value set using cdef().
A project can be declared as having non-standard variable names in datafiles, or variables that should undergo re-scaling; see calias().
A project can be declared as having non-standard frequency names (this is used when accessing datafiles); see cfreqs().
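The relation between declared facets, the separator and the dataset representation can be reproduced with Python's string.Template. This is a minimal sketch, not CliMAF's code: project_pattern is hypothetical, and it assumes the five standard facets come first, as in the patterns shown above.

```python
from string import Template

# Facets CliMAF adds to every project, per the cproject arguments description
STANDARD_FACETS = ["project", "simulation", "variable", "period", "domain"]

def project_pattern(facets, sep="."):
    """Build a pattern such as ${project}.${simulation}...; a facet may be a (name, default) tuple."""
    names = [f[0] if isinstance(f, tuple) else f for f in facets]
    return sep.join("${%s}" % name for name in STANDARD_FACETS + names)

pattern = project_pattern(["myfreq", "myfacet"], sep="_")
print(pattern)  # ${project}_${simulation}_${variable}_${period}_${domain}_${myfreq}_${myfacet}
print(Template(pattern).substitute(project="MINE", simulation="hist", variable="tas",
                                   period="[1980-1999]", domain="global",
                                   myfreq="decadal", myfacet="gabu"))
# MINE_hist_tas_[1980-1999]_global_decadal_gabu
```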
derive_cproject : create a new project from an existing one by changing its name and possibly its facets
- climaf.classes.derive_cproject(name, parent_name, new_project_facets=[])
Create a new project named ‘name’ from the project ‘parent_name’, adding the facets listed in ‘new_project_facets’ if specified. The location list is also derived from the parent project.
- Parameters:
name – name of the new project
parent_name – name of the source project
new_project_facets – the list of the facets to add to the new project (they may already be present in the parent).
- Returns:
the new project
cprojects : dictionary of known projects
- env.environment.cprojects = {None: ${project}.${simulation}.${variable}.${period}.${domain}}
Dictionary of declared projects (type is cproject)
- env.environment.data_check = False
Whether ds() calls should be checked w.r.t. datafiles. “if_found” means yes if some relevant datafiles exist. Other allowed values are True and False. See the ‘check’ item in class cdataset’s documentation.
- env.environment.period_check_type = 'light'
On ds() calls, the level of check of the requested period w.r.t. datafiles. See the ‘check_type’ item in class cdataset’s documentation.
dataloc : describe data locations for a series of simulations
- class climaf.dataloc.dataloc(project='*', organization='generic', url=None, model='*', simulation='*', realm='*', table='*', frequency='*')
Create an entry in the data locations dictionary for an ensemble of datasets.
- Parameters:
project (str,optional) – project name
model (str,optional) – model name
simulation (str,optional) – simulation name
frequency (str,optional) – frequency
organization (str) – name of the organization type, among those handled by selectFiles()
url (list of strings) – list of URLs for the data root directories, local or remote
Each entry in the dictionary allows storing:
a list of paths or URLs (local or remote), which are root paths for finding some sets of datafiles which share a file organization scheme.
For remote data:
url is supposed to be in the format ‘protocol:user@host:path’, but ‘protocol’ and ‘user’ are optional. So, url can also be ‘user@host:path’ or ‘protocol:host:path’ or ‘host:path’. ftp is the default protocol (and, AMOF, the only one managed so far).
If ‘user’ is given:
if ‘host’ is in the $HOME/.netrc file, CliMAF checks whether the corresponding ‘login’ == ‘user’. If it is, CliMAF gets the associated password; otherwise it prompts the user for the password;
if ‘host’ is not present in the $HOME/.netrc file, CliMAF prompts the user for the password.
If ‘user’ is not given:
if ‘host’ is in the $HOME/.netrc file, CliMAF gets the corresponding ‘login’ as ‘user’ and also gets the associated password;
if ‘host’ is not present in the $HOME/.netrc file, CliMAF prompts the user for ‘user’ and ‘password’.
Remark: The .netrc file contains the login and password used by the auto-login process. It generally resides in the user’s home directory ($HOME/.netrc). It is therefore highly recommended to supply this information in the .netrc file, so as not to have to enter a password for every request.
Warning: the Python netrc module does not handle multiple entries for a single host. So, if the netrc file has two entries for the same host, the netrc module only returns the last entry.
We define two kinds of host: hosts with evolving files, e.g. ‘beaufix’; and the others.
For any file returned by function listfiles() which is found in cache:
for hosts with dynamic files, the file is transferred only if its date on the server is more recent than that of the file found in cache;
for other hosts, the file found in cache is used
the name of the corresponding data files organization scheme. The current set of known schemes is:
- CMIP5_DRS : any datafile organized after the CMIP5 data reference syntax, such as on IPSL’s Ciclad and CNRM’s Lustre
EM : CNRM-CM post-processed outputs as organized using EM (please use a list of only one string for arg url)
generic : a data organization described by the user, using patterns such as described for selectGenericFiles(). This is the default
Please ask the CliMAF dev team to implement further organizations; this is quite quick for data which are on the filesystem. Organizations considered for future implementation are:
NetCDF model outputs as available during an ECLIS or libIGCM simulation
ESGF
the set of attribute values for which the simulation’s data are stored at those URLs and with that organization
For remote files, the filename pattern must include ${varname}, which is instantiated with the variable name or with filenameVar (given via calias()), for the sake of efficiency. Please complain if this is inadequate.
For the sake of brevity, each attribute can have the ‘*’ wildcard value; when using the dictionary, the most specific entries will be used (which means: the entry (or entries) with the lowest number of wildcards).
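The ‘most specific entry wins’ rule can be sketched as follows, assuming each entry is a dict of attribute values where a missing attribute or ‘*’ means any value. most_specific is a hypothetical name, not CliMAF's lookup code.

```python
ATTRIBUTES = ("project", "model", "simulation", "realm", "table", "frequency")

def most_specific(entries, **facets):
    """Return the matching entries that have the fewest '*' wildcards."""
    def matches(entry):
        return all(entry.get(a, "*") in ("*", facets.get(a)) for a in ATTRIBUTES)

    def wildcards(entry):
        return sum(entry.get(a, "*") == "*" for a in ATTRIBUTES)

    candidates = [e for e in entries if matches(e)]
    if not candidates:
        return []
    fewest = min(wildcards(e) for e in candidates)
    return [e for e in candidates if wildcards(e) == fewest]

entries = [
    {"project": "PRE_CMIP6", "model": "IPSLCM-Z-HR", "organization": "CMIP6_DRS", "url": ["/prodigfs/esg/"]},
    {"project": "PRE_CMIP6", "model": "IPSLCM-Z-HR", "simulation": "my_exp",
     "organization": "EM", "url": ["~/tmp/my_exp_data"]},
]
print(most_specific(entries, project="PRE_CMIP6", model="IPSLCM-Z-HR", simulation="my_exp")[0]["organization"])
```

The exception for simulation ‘my_exp’ wins because it has one wildcard fewer than the model-wide entry.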
Example :
Declaring that all IPSLCM-Z-HR data for project PRE_CMIP6 are stored under a single root path and follow the organization named CMIP6_DRS:
>>> dataloc(project='PRE_CMIP6', model='IPSLCM-Z-HR', organization='CMIP6_DRS', url=['/prodigfs/esg/'])
and declaring an exception for one simulation (here, both location and organization are supposed to be different):
>>> dataloc(project='PRE_CMIP6', model='IPSLCM-Z-HR', simulation='my_exp', organization='EM',
...         url=['~/tmp/my_exp_data'])
and declaring a project to access remote data (on multiple servers):
>>> cproject('MY_REMOTE_DATA', ('frequency', 'monthly'), separator='|')
>>> dataloc(project='MY_REMOTE_DATA', organization='generic',
...         url=['beaufix:/home/gmgec/mrgu/vignonl/*/${simulation}SFX${PERIOD}.nc',
...              'ftp:vignonl@hendrix:/home/vignonl/${model}/${variable}_1m_${PERIOD}_${model}.nc'])
>>> calias('MY_REMOTE_DATA','tas','tas',filenameVar='2T')
>>> tas = ds(project='MY_REMOTE_DATA', simulation='AMIPV6ALB2G', variable='tas', frequency='monthly',
...          period='198101')
Please refer to the example section of the documentation for an example with each organization scheme
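The remote url format and the .netrc rules described above for dataloc can be sketched with the standard netrc module. This is an illustration under stated assumptions, not CliMAF's code: parse_remote_url and resolve_credentials are hypothetical names, a path containing ‘:’ would be misparsed, and the prompt functions are injectable so the logic can run without a terminal.

```python
import getpass
import netrc

def parse_remote_url(url):
    """Split 'protocol:user@host:path' (protocol and user optional) into its parts."""
    parts = url.split(":", 2)
    if len(parts) == 3:
        protocol, host_part, path = parts
    elif len(parts) == 2:
        protocol, (host_part, path) = "ftp", parts  # ftp is the default protocol
    else:
        raise ValueError(f"not a remote URL: {url!r}")
    user, _, host = host_part.rpartition("@")
    return protocol, user or None, host, path

def resolve_credentials(host, user=None, netrc_file=None,
                        ask_user=input, ask_password=getpass.getpass):
    """Apply the .netrc rules above; returns (user, password)."""
    try:
        entry = netrc.netrc(netrc_file).authenticators(host)  # (login, account, password) or None
    except (OSError, netrc.NetrcParseError):
        entry = None
    if user is not None:
        if entry is not None and entry[0] == user:
            return user, entry[2]
        return user, ask_password(f"Password for {user}@{host}: ")
    if entry is not None:
        return entry[0], entry[2]
    user = ask_user(f"User for {host}: ")
    return user, ask_password(f"Password for {user}@{host}: ")

print(parse_remote_url("ftp:vignonl@hendrix:/home/vignonl/${model}/x.nc"))
```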
cdefault: set or get a default value for some data attribute/facet
- climaf.classes.cdef(attribute, value=None, project=None)
Same function as cdef() above; see that section for its documentation and examples.
derive : define a variable as computed from other variables
- climaf.operators_derive.derive(project, derivedVar, Operator, *invars, **params)
Define that ‘derivedVar’ is a derived variable in ‘project’, computed by applying ‘Operator’ to input streams which are datasets whose variable names take the values in *invars, with the parameters/arguments of Operator taking the values in **params. ‘project’ may be the wildcard: ‘*’
Example, assuming that operator ‘minus’ has been defined as
>>> cscript('minus','cdo sub ${in_1} ${in_2} ${out}')
which means that minus uses CDO for subtracting the two datasets, you may define, for a given project ‘CMIP5’, a new variable, e.g. for cloud radiative effect at the surface, named ‘rscre’, using the difference between all-sky and clear-sky net radiation at the surface, by:
>>> derive('CMIP5', 'rscre','minus','rs','rscs')
You may then use this variable name anywhere you would use any other variable name.
Note : you may use wildcard ‘*’ for the project
Another example is rescaling or renaming some variable; here, let us define how variable ‘ta’ can be derived from ERAI variable ‘t’ :
>>> derive('erai', 'ta','rescale', 't', scale=1., offset=0.)
However, this is not the most efficient way to do that; see calias().
Expert use: argument ‘derivedVar’ may be a dictionary, whose keys are derived variable names and whose values are script output names; example:
>>> cscript('vertical_interp', 'vinterp.sh ${in} surface_pressure=${in_2} ${out_l500} ${out_l850} method=${opt}')
>>> derive('*', {'z500' : 'l500' , 'z850' : 'l850'},'vertical_interp', 'zg', 'ps', opt='log')
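What derive records can be pictured as a registry keyed by (project, derived variable). In this minimal sketch, derive_sketch, recipe and the registry are hypothetical, and the rule that a project-specific entry takes precedence over the ‘*’ wildcard is an assumption.

```python
# Hypothetical registry: (project, derived_variable) -> (operator, input_variables, params)
DERIVED = {}

def derive_sketch(project, derived_var, operator, *invars, **params):
    DERIVED[(project, derived_var)] = (operator, invars, params)

def recipe(project, variable):
    """Look up a derived variable; project-specific entries take precedence over '*'."""
    return DERIVED.get((project, variable)) or DERIVED.get(("*", variable))

derive_sketch("CMIP5", "rscre", "minus", "rs", "rscs")
derive_sketch("erai", "ta", "rescale", "t", scale=1.0, offset=0.0)
print(recipe("CMIP5", "rscre"))  # ('minus', ('rs', 'rscs'), {})
print(recipe("CMIP6", "rscre"))  # None: not declared for CMIP6
```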
calias : define a variable as computed, in a project, from another, single, variable
- climaf.classes.calias(project, variable, fileVariable=None, scale=1.0, offset=0.0, units=None, missing=None, filenameVar=None, conditions=None)
Declare that in project, variable is to be computed by reading fileVariable, and applying scale and offset (see the first erai example below).
Arg conditions allows restricting the effect based on the value of some facets. It is a dictionary of applicable values or lists of values, whose keys are the facets (see the CMIP6 example below).
Arg filenameVar tells which fake variable name should be used when computing the filename for this variable in this project (for optimisation purposes; see the second erai example below).
Arg missing tells that a given constant must be interpreted as a missing value (see the 4th example, EM, below).
variable may be a list. In that case, fileVariable and filenameVar, if provided, should be parallel lists.
variable can also be a comma-separated list of variables, in which case this tells how variables are grouped in files (it makes sense to use filenameVar in that case, as this is a way to provide the label which is unique to this grouping of variables); scale, offset and missing args must then be the same for all variables.
Example
>>> # scale and offset may be provided
>>> calias('erai','tas_degC','t2m',scale=1., offset=-273.15)
>>> calias('CMIP6','evspsbl',scale=-1., conditions={ 'model':'CanESM5' , 'version': ['20180103', '20190112'] })
>>> calias('erai','tas','t2m',filenameVar='2T')
>>> calias('EM',[ 'sic', 'sit', 'sim', 'snd', 'ialb', 'tsice'], missing=1.e+20)
>>> calias('data_CNRM','so,thetao',filenameVar='grid_T_table2.2')
NB: A wrapper with the same name as this function is defined in climaf.driver.calias(), and it is the one exported by module climaf.api. It allows using a list of variables.
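The effect of scale, offset and missing on values read from a file can be sketched on a plain list of numbers. read_aliased is hypothetical; the order of operations (value * scale + offset) is an assumption that is consistent with the K-to-°C example above, and missing values are mapped to NaN here for illustration only.

```python
import math

def read_aliased(file_values, scale=1.0, offset=0.0, missing=None):
    """Apply value * scale + offset; values equal to `missing` become NaN."""
    out = []
    for v in file_values:
        if missing is not None and v == missing:
            out.append(math.nan)
        else:
            out.append(v * scale + offset)
    return out

print(read_aliased([273.15, 300.0], offset=-273.15))   # Kelvin -> degC, as for 'tas_degC'
print(read_aliased([2.0e-5], scale=-1.0))               # sign flip, as for 'evspsbl'
print(read_aliased([0.5, 1.e+20], missing=1.e+20))      # [0.5, nan]
```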
cfreqs : declare non-standard frequency names, for a project
- climaf.classes.cfreqs(project, dic)
Declare a dictionary specific to project for matching normalized frequency values to project-specific frequency values.
- Normalized frequency values are:
decadal, yearly, monthly, daily, 6h, 3h, fx and annual_cycle
When defining a dataset, any reference to a non-standard frequency will be left unchanged both in the dataset’s CRS and when trying to access corresponding datafiles
Examples:
>>> cfreqs('CMIP5',{'monthly':'mon' , 'daily':'day' })
crealms : declare non-standard realm names, for a project
- climaf.classes.crealms(project, dic)
Declare a dictionary specific to project for matching normalized realm names to project-specific realm names.
- Normalized realm names are:
atmos, ocean, land, seaice
When defining a dataset, any reference to a non-standard realm will be left unchanged both in the dataset’s CRS and when trying to access corresponding datafiles
Examples:
>>> crealms('CMIP5',{'atmos':'ATM' , 'ocean':'OCE' })
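Both cfreqs and crealms declare a per-project translation table from normalized names to project-specific names. Values that are not normalized pass through unchanged. The minimal sketch below assumes the tables are plain nested dicts; FREQS, REALMS and to_project_value are hypothetical names.

```python
FREQS = {"CMIP5": {"monthly": "mon", "daily": "day"}}   # as declared by cfreqs('CMIP5', ...)
REALMS = {"CMIP5": {"atmos": "ATM", "ocean": "OCE"}}    # as declared by crealms('CMIP5', ...)

def to_project_value(table, project, value):
    """Translate a normalized value; non-standard or unknown values are left unchanged."""
    return table.get(project, {}).get(value, value)

print(to_project_value(FREQS, "CMIP5", "monthly"))  # mon
print(to_project_value(FREQS, "CMIP5", "mon"))      # mon (non-standard, unchanged)
print(to_project_value(REALMS, "CMIP6", "ocean"))   # ocean (no table for CMIP6)
```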