Internal functions, presented here for their documentation
init_period
This function should not be called directly; it is presented here mainly to document the syntax of the strings that describe a period of time.
climaf.period.init_period(dates)
Init a CliMAF 'period' object.
Parameters: dates (str) – must match r'YYYY[MM[DD[HH[MM]]]][(-|_)YYYY[MM[DD[HH[MM]]]]]', or be 'fx' for fixed fields
Returns: the corresponding CliMAF 'period' object
When only years (YYYY) are used, leading zeros can be omitted. Year 0000 cannot be handled.
Examples :
- a one-year long period : '1980', or '1980-1980'
- a decade : '1980-1989'
- first millennium : '1-1000' (leading zeros are required if you want to specify a month)
- first century : '1-100'
- one month : '198005'
- two months : '198003-198004'
- one day : '17890714'
- the same single day, in a more complicated way : '17890714-17890714'
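The syntax described above can be checked with a regular expression. The following is a minimal sketch, not CliMAF's own parser: the names `_DATE`, `_PERIOD_RE` and `looks_like_period` are illustrative assumptions, and the check is purely syntactic (it does not validate month or day ranges, and it does not reject year 0000).

```python
import re

# One date: either YYYY followed by optional MM, DD, HH, MM,
# or a short year of 1 to 3 digits (leading zeros omitted).
# Illustrative only; this is not the expression used inside CliMAF.
_DATE = r"(?:\d{4}(?:\d{2}(?:\d{2}(?:\d{2}(?:\d{2})?)?)?)?|\d{1,3})"

# A period: 'fx', or one date, or two dates separated by '-' or '_'.
_PERIOD_RE = re.compile(r"^(?:fx|{d}(?:[-_]{d})?)$".format(d=_DATE))


def looks_like_period(text):
    """Return True if `text` has the documented shape (syntax check only)."""
    return _PERIOD_RE.match(text) is not None


for candidate in ["1980", "1-1000", "198003-198004", "17890714", "fx", "19800"]:
    print(candidate, looks_like_period(candidate))
```

Note that a five-digit string such as '19800' is rejected: once a month is given, the year must be written with four digits.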
CliMAF internally handles date-time values with a 1-minute accuracy. It can provide date information to external scripts in two forms; see keywords 'period' and 'period_iso' in cscript().
selectFiles
This function should not be called directly; it is presented here mainly to document the list of data organizations that function dataloc can handle.
climaf.dataloc.selectFiles(return_wildcards=None, merge_periods_on=None, return_combinations=None, with_periods=None, use_frequency=False, **kwargs)
Returns the shortest list of (local or remote) files which include the data for the list of (facet, value) pairs provided.
Method :
- use datalocations indexed by dataloc() to identify the data organization and data store urls for these (facet, value) pairs
- check that the data organization is a known one, i.e. is one of 'generic', 'CMIP5_DRS' or 'EM'
- derive the relevant filename search function, such as climaf.dataloc.selectCmip5DrsFiles, from the data organization scheme
- pass urls and relevant facet values to this filename search function
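The method above amounts to a dispatch on the organization name. Here is a minimal sketch of that idea, under stated assumptions: the three `*_stub` functions and `select_files_sketch` are hypothetical stand-ins invented for illustration, and they do not reproduce the signatures or behaviour of the real CliMAF search functions.

```python
# Hypothetical stand-ins for the real search functions; each one simply tags
# the urls it receives so that the dispatch can be observed.
def generic_search_stub(urls, **facets):
    return [("generic", url) for url in urls]


def cmip5_drs_search_stub(urls, **facets):
    return [("CMIP5_DRS", url) for url in urls]


def em_search_stub(urls, **facets):
    return [("EM", url) for url in urls]


# Organization name -> filename search function (names from the text above).
SEARCH_FUNCTIONS = {
    "generic": generic_search_stub,
    "CMIP5_DRS": cmip5_drs_search_stub,
    "EM": em_search_stub,
}


def select_files_sketch(organization, urls, **facets):
    """Check that the organization is known, then delegate the search."""
    try:
        search = SEARCH_FUNCTIONS[organization]
    except KeyError:
        raise ValueError("Unknown data organization: %r" % (organization,))
    return search(urls, **facets)


print(select_files_sketch("generic", ["/data/store"], variable="tas"))
```

A dictionary keeps the list of known organizations in one place, so the check and the choice of search function cannot drift apart.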
selectGenericFiles
This function should not be called directly; it is presented here mainly to document the syntax of argument url of function dataloc when organization is set to generic.
climaf.dataloc.selectGenericFiles(urls, return_wildcards=None, merge_periods_on=None, return_combinations=None, use_frequency=False, **kwargs)
Allows describing a generic file organization : the list of files returned by this function is composed of files which :
- match the patterns in url once these patterns are instantiated by the values in kwargs, and
- contain the variable provided in kwargs
- match the period provided in kwargs
kwargs can have entries which are lists; these are then interpreted as :
- a first element which is a pattern (i.e. which includes * or ?)
- further elements which are the possible values, as diagnosed by some logic upstream
In the pattern strings, no keyword is mandatory. However, for remote files, the filename pattern must include ${varname}, which is instantiated by the variable name or by filenameVar (given via calias()); this is for the sake of efficiency (please complain if inadequate).
Example :
>>> selectGenericFiles(project='my_project', model='my_model', simulation='lastexp', variable='tas',
...                    period='1980', urls=['~/DATA/${project}/${model}/*${variable}*${PERIOD}*.nc'])
/home/stephane/DATA/my_project/my_model/somefilewith_tas_Y1980.nc
In the pattern strings, the keywords that can be used in addition to the argument names (e.g. ${model}) are:
- ${variable} : use it if the files are split by variable and filenames do include the variable name, as this speeds up the search
- ${PERIOD} : use it for indicating the period covered by each file, if this is applicable in the file naming; this period can appear in filenames as YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHHMM, either once only, or twice with separator '-' or '_'
- wildcards '?' and '*' for matching respectively one and any number of characters
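To make the role of these keywords concrete, here is a minimal sketch of how such a pattern could be instantiated and turned into a regular expression that captures the period. It is an illustration only, not the CliMAF implementation: `instantiate`, `to_regex` and `PERIOD_REGEX` are hypothetical names, the period expression is an approximation (any run of 4 to 12 digits), and wildcards are matched non-greedily so that the period is searched from the left, which can pick up an earlier digit run in unusual filenames.

```python
import re
from string import Template

# Approximate period: 4 to 12 digits, optionally repeated after '-' or '_'.
PERIOD_REGEX = r"(?P<period>\d{4,12}(?:[-_]\d{4,12})?)"


def instantiate(pattern, **facets):
    """Fill ${facet} placeholders; unknown ones (e.g. ${PERIOD}) are kept."""
    return Template(pattern).safe_substitute(**facets)


def to_regex(pattern):
    """Translate ${PERIOD}, '*' and '?' into regex pieces, escape the rest."""
    pieces = []
    for part in re.split(r"(\$\{PERIOD\}|\*|\?)", pattern):
        if part == "${PERIOD}":
            pieces.append(PERIOD_REGEX)
        elif part == "*":
            pieces.append(".*?")  # any number of characters
        elif part == "?":
            pieces.append(".")    # exactly one character
        else:
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


pattern = instantiate(
    "/DATA/${project}/${model}/*${variable}*${PERIOD}*.nc",
    project="my_project", model="my_model", variable="tas",
)
match = to_regex(pattern).match(
    "/DATA/my_project/my_model/somefilewith_tas_Y1980.nc"
)
print(pattern)
print(match.group("period"))
```

`Template.safe_substitute` is used rather than `substitute` so that keywords with no value yet, such as ${PERIOD}, survive the first pass and can be handled separately.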
Summary of the algorithm :
- Build a regular expression that matches periods.
- Loop over the patterns of the url list :
  - Instantiate the pattern with the provided facet values, and with ".*" for $PERIOD.
  - Call glob.glob.
  - Refine : keep only the values that match the period regexp (provided the pattern contains $PERIOD); if nothing is found, also try with filenameVar. This yields a list of files, lfiles.
  - To find out which values were encountered for each facet, build a regular expression (with groups) that captures the facet values (including PERIOD), and another one that captures the date only (is this still necessary?).
  - Loop over the files in lfiles :
    - if the pattern does not indicate that the date can be extracted :
      - if the frequency indicates a fixed field, keep the file;
      - otherwise, keep it as well, without filtering on the period
    - if it does :
      - extract the period
      - if it is suitable (various cases ...)
      - and if filtering on the variable was possible, or variable = "*" or there are multiple variables, or the file contains the right variable (possibly after renaming), keep the file
    - each time a file is kept, add the values encountered for the attributes to the dict wildcard_facets
  - As soon as one pattern of the url list has yielded matching files, skip the remaining patterns.
- At the end, format the dictionary of facet values that is returned.
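The period filtering and the "stop at the first pattern that yields files" rule from the summary above can be sketched as follows. This is a simplified illustration, not the CliMAF code: `bounds`, `filter_by_period` and `select` are hypothetical names, the sketch works on a plain list of filenames instead of calling glob.glob, it ignores the variable check and the wildcard_facets bookkeeping, and it assumes four-digit years so that dates can be compared as 12-digit strings.

```python
import re


def bounds(period):
    """Return (lowest, highest) YYYYMMDDHHMM strings covered by a period."""
    parts = re.split(r"[-_]", period)
    first, last = parts[0], parts[-1]
    # Pad missing fields: earliest values for the start, latest for the end.
    # Day 31 is only an upper bound for string comparison, not a real date.
    low = first + "000001010000"[len(first):]
    high = last + "000012312359"[len(last):]
    return low, high


def filter_by_period(filenames, regex, wanted):
    """Keep the files matching `regex` whose period overlaps `wanted`."""
    wanted_low, wanted_high = bounds(wanted)
    has_period = "period" in regex.groupindex
    kept = []
    for name in filenames:
        match = regex.match(name)
        if match is None:
            continue
        if not has_period:
            kept.append(name)  # no date in the pattern: keep without filtering
            continue
        file_low, file_high = bounds(match.group("period"))
        if file_low <= wanted_high and wanted_low <= file_high:
            kept.append(name)
    return kept


def select(filenames, regexes, wanted):
    """Try each pattern in turn; stop at the first one that yields files."""
    for regex in regexes:
        kept = filter_by_period(filenames, regex, wanted)
        if kept:
            return kept
    return []


files = ["tas_197001-197912.nc", "tas_198001-198912.nc",
         "tas_199001-199912.nc", "pr_198001-198912.nc"]
tas = re.compile(r"^tas_(?P<period>\d{4,12}(?:[-_]\d{4,12})?)\.nc$")
print(select(files, [tas], "1985-1992"))
```

Comparing zero-padded strings avoids any calendar arithmetic, at the cost of accepting impossible dates; a production version would parse the dates properly.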