Internal functions - presented here for their documentation

init_period

This function should not be called directly; it is presented here mainly to document the syntax of the strings that describe a period of time

climaf.period.init_period(dates)

Initialize a CliMAF ‘period’ object

Parameters: dates (str) – must match r'YYYY[MM[DD[HH[MM]]]][(-|_)YYYY[MM[DD[HH[MM]]]]]', or be ‘fx’ for fixed fields
Returns: the corresponding CliMAF ‘period’ object

When using only the YYYY form, leading zeros can be omitted (e.g. ‘1’ for year 0001). Year 0000 cannot be handled.

Examples:

  • a one-year-long period: ‘1980’, or ‘1980-1980’
  • a decade: ‘1980-1989’
  • the first millennium: ‘1-1000’ (leading zeros are required if a month is also specified)
  • the first century: ‘1-100’
  • one month: ‘198005’
  • two months: ‘198003-198004’
  • one day: ‘17890714’
  • the same single day, in a more complicated way: ‘17890714-17890714’
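The period syntax above can be made concrete with a small parser. This is a sketch under assumptions, not CliMAF's init_period(): parse_period, _start and _end are hypothetical names, and the sketch assumes that the end of a period is exclusive and that a bare number of 1 to 3 digits is a year whose leading zeros were omitted. It works down to 1-minute resolution, the finest the syntax allows.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sketch of the period syntax, NOT CliMAF's implementation.
# Assumption: the end is exclusive, i.e. '1980' covers 1980-01-01 00:00
# up to (but not including) 1981-01-01 00:00.

# One date token: a short year (1 to 3 digits, leading zeros omitted) or
# YYYY followed by up to four 2-digit fields (MM, DD, HH, MM).
_TOKEN = r"(\d{1,3}|\d{4}(?:\d{2}){0,4})"
_PERIOD_RE = re.compile(_TOKEN + r"(?:[-_]" + _TOKEN + r")?\Z")


def _start(token):
    """First instant covered by a date token."""
    if len(token) < 4:
        token = token.zfill(4)  # '1' -> '0001'
    fields = [int(token[:4])] + [int(token[i:i + 2]) for i in range(4, len(token), 2)]
    # Fill the missing month, day, hour and minute with their smallest values
    year, month, day, hour, minute = fields + [1, 1, 0, 0][len(fields) - 1:]
    return datetime(year, month, day, hour, minute)  # year 0 raises ValueError


def _end(token):
    """First instant after the span described by a date token."""
    start = _start(token)
    if len(token) <= 4:  # year resolution
        return start.replace(year=start.year + 1)
    if len(token) == 6:  # month resolution
        if start.month == 12:
            return start.replace(year=start.year + 1, month=1)
        return start.replace(month=start.month + 1)
    steps = {8: timedelta(days=1), 10: timedelta(hours=1), 12: timedelta(minutes=1)}
    return start + steps[len(token)]


def parse_period(dates):
    """Return (start, exclusive end) datetimes, or None for 'fx'."""
    if dates == "fx":
        return None
    match = _PERIOD_RE.match(dates)
    if match is None:
        raise ValueError("not a valid period string: %r" % dates)
    first, last = match.group(1), match.group(2) or match.group(1)
    return _start(first), _end(last)
```

For example, parse_period('198003-198004') starts at 1980-03-01 00:00 and ends (exclusively) at 1980-05-01 00:00, while '0000' raises ValueError because Python's datetime has no year 0 either.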

CliMAF internally handles date-time values with 1-minute accuracy; it can provide date information to external scripts in two forms (see keywords ‘period’ and ‘period_iso’ in cscript())

selectFiles

This function should not be called directly; it is presented here mainly to document the list of data organizations it can handle on behalf of function dataloc

climaf.dataloc.selectFiles(return_wildcards=None, merge_periods_on=None, return_combinations=None, with_periods=None, use_frequency=False, **kwargs)

Returns the shortest list of (local or remote) files that contain the data for the provided list of (facet, value) pairs

Method:

  • use the data locations indexed by dataloc() to identify the data organization and the data store URLs for these (facet, value) pairs
  • check that the data organization is a known one, i.e. one of ‘generic’, ‘CMIP5_DRS’ or ‘EM’
  • derive the relevant filename search function, such as climaf.dataloc.selectCmip5DrsFiles(), from the data organization scheme
  • pass the URLs and relevant facet values to this filename search function
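The four steps above amount to a dispatch from organization name to search function. The sketch below is hypothetical: select_files, the stand-in search functions and the (organization, urls) input format are assumptions made for illustration. Step 1 (the lookup of data locations via dataloc()) is assumed to have been done by the caller, and the stand-ins merely tag the URLs they receive instead of searching for files.

```python
# Hypothetical sketch of the dispatch in selectFiles, not CliMAF's code.
# The stand-in search functions only tag the URLs they receive.


def search_generic(urls, **facets):
    return ["generic:" + url for url in urls]


def search_cmip5_drs(urls, **facets):
    return ["CMIP5_DRS:" + url for url in urls]


def search_em(urls, **facets):
    return ["EM:" + url for url in urls]


# Step 2: the known data organizations; step 3: their search functions
SEARCH_FUNCTIONS = {
    "generic": search_generic,
    "CMIP5_DRS": search_cmip5_drs,
    "EM": search_em,
}


def select_files(locations, **facets):
    """locations: (organization, urls) pairs, assumed to come from step 1."""
    files = []
    for organization, urls in locations:
        if organization not in SEARCH_FUNCTIONS:
            raise ValueError("unknown data organization: %r" % organization)
        search = SEARCH_FUNCTIONS[organization]
        # Step 4: pass the URLs and the facet values to the search function
        files.extend(search(urls, **facets))
    return files
```

Keeping the known organizations in a single dict means the check of step 2 and the lookup of step 3 cannot drift apart.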

selectGenericFiles

This function should not be called directly; it is presented here mainly to document the syntax of argument url of function dataloc when organization is set to ‘generic’

climaf.dataloc.selectGenericFiles(urls, return_wildcards=None, merge_periods_on=None, return_combinations=None, use_frequency=False, **kwargs)

Allows describing a generic file organization: the list of files returned by this function consists of files that:

  • match the patterns in urls once these patterns are instantiated with the values in kwargs,
  • contain the variable provided in kwargs, and
  • match the period provided in kwargs

kwargs can have entries that are lists; such entries are interpreted as:

  • a first element that is a pattern (i.e. that includes * or ?)
  • further elements that are the possible values, as determined by some upstream logic
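One possible reading of list-valued entries is sketched below: the first element drives the file search, and the remaining candidates filter the values actually found in file names. split_facet and keep are hypothetical helpers, and using the candidates as a filter is an assumption about what the upstream logic intends, not documented CliMAF behaviour.

```python
import fnmatch

# Hypothetical helpers, not CliMAF's code. Assumption: the candidate values
# in a list-valued entry restrict which values found in file names are kept.


def split_facet(value):
    """Return (glob_pattern, candidates) for one kwargs entry."""
    if isinstance(value, list):
        return value[0], set(value[1:])
    return value, None  # a plain string is its own pattern, with no candidates


def keep(found_value, value):
    """Decide whether a facet value found in a file name is acceptable."""
    pattern, candidates = split_facet(value)
    if not fnmatch.fnmatchcase(found_value, pattern):
        return False
    return not candidates or found_value in candidates
```

For example, with model=['CNRM*', 'CNRM-CM5', 'CNRM-CM6'], files are searched with 'CNRM*', but only the two listed models are kept.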

In the pattern strings, no keyword is mandatory. However, for remote files, the filename pattern must include ${varname}, which is instantiated with the variable name or with filenameVar (given via calias()); this is for the sake of efficiency (please complain if this is inadequate)

Example:

>>> selectGenericFiles(project='my_project', model='my_model', simulation='lastexp', variable='tas',
...                    period='1980', urls=['~/DATA/${project}/${model}/*${variable}*${PERIOD}*.nc'])

/home/stephane/DATA/my_project/my_model/somefilewith_tas_Y1980.nc

In the pattern strings, the keywords that can be used in addition to the argument names (e.g. ${model}) are:

  • ${variable}: use it if the files are split by variable and
    filenames include the variable name, as this speeds up the search
  • ${PERIOD}: use it to indicate the period covered by each file, if this
    is applicable in the file naming; this period can appear in filenames as YYYY, YYYYMM, YYYYMMDD or YYYYMMDDHHMM, either once, or twice with separator ‘-’ or ‘_’
  • wildcards ‘?’ and ‘*’, which match exactly one character and any number of characters, respectively
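A minimal sketch of how a pattern could be instantiated under the keyword rules above: each ${facet} is replaced by its value, while ${PERIOD} and any facet that is not given become ‘*’ for glob.glob, and a regular expression is built in parallel to check file names and capture the period. instantiate is a hypothetical helper, not CliMAF's code, and it does not expand ‘~’.

```python
import re

# Hypothetical sketch, not CliMAF's code: turn one url pattern into a glob
# pattern (for glob.glob) and a regex (to check names and capture the period).

# YYYY, YYYYMM, YYYYMMDD or YYYYMMDDHHMM; longest alternatives tried first
DATE = r"\d{4}(?:\d{8}|\d{4}|\d{2})?"
PERIOD_REGEX = r"(?P<PERIOD>%s(?:[-_]%s)?)" % (DATE, DATE)
KEYWORD = re.compile(r"\$\{(\w+)\}")


def _glob_to_regex(text):
    """Translate a fragment in which only '*' and '?' are special."""
    return re.escape(text).replace(r"\*", ".*?").replace(r"\?", ".")


def instantiate(pattern, facets):
    """Return (glob_pattern, compiled_regex) for one url pattern."""
    glob_parts, regex_parts = [], []
    # re.split with one group alternates: text, keyword name, text, ...
    for i, piece in enumerate(KEYWORD.split(pattern)):
        if i % 2 == 1 and piece == "PERIOD":
            glob_parts.append("*")
            regex_parts.append(PERIOD_REGEX)
            continue
        # Literal text, or a facet value (a facet not given matches anything)
        text = piece if i % 2 == 0 else facets.get(piece, "*")
        glob_parts.append(text)
        regex_parts.append(_glob_to_regex(text))
    return "".join(glob_parts), re.compile("".join(regex_parts) + r"\Z")
```

The regex translates ‘*’ to a lazy ‘.*?’ and tries the longest date form first, so that the captured period is the full one (e.g. ‘198001-198912’) rather than a shorter tail of it.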

Summary of the algorithm:

  • Build a regular expression that matches periods

  • Loop over the patterns in the url list:

    • Instantiate the pattern with the values of the provided facets, and with “.*” for $PERIOD

    • Run glob.glob

    • Refine: keep only the results that match the period regexp (provided the pattern contains $PERIOD); if nothing is left, try again with filenameVar. This yields a list of files, lfiles

    • Find out the values encountered for each facet: build one regular expression (with groups) that captures the facet values (including PERIOD), and another one that captures only the date (is this really still necessary?)

    • Loop over the files in lfiles:

      • if the pattern does not indicate that the date can be extracted,

        • if the frequency indicates a fixed field, keep the file;
        • otherwise, keep it as well, without filtering on the period
      • if it does,

        • extract the period
        • if it is suitable (various cases …)
        • if filtering on the variable was possible,
          or variable = “*” or variable is multiple, or the file contains the right variable, possibly after renaming, keep the file
      • Each time a file is kept, add the values encountered for the attributes to the dict wildcard_facets

    • As soon as a pattern in the url list has yielded matching files, stop examining the following patterns

  • Finally, format the dictionary of facet values that is returned
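The loop summarized above can be sketched as follows. This is a simplified, hypothetical rendering, not CliMAF's selectGenericFiles(): select_generic_files, _build and _years are illustrative names, periods are compared at year resolution only, ‘~’ is not expanded, and the fixed-field, filenameVar and in-file variable checks are left out. The find parameter defaults to glob.glob so that the file listing can be replaced, for example in tests.

```python
import glob
import re

# Hypothetical sketch of the loop summarized above, not CliMAF's code.
# Simplifications: periods are compared at year resolution, and the
# fixed-field, filenameVar and in-file variable checks are left out.
DATE = r"\d{4}(?:\d{8}|\d{4}|\d{2})?"
KEYWORD = re.compile(r"\$\{(\w+)\}")


def _glob_to_regex(text):
    return re.escape(text).replace(r"\*", ".*?").replace(r"\?", ".")


def _build(pattern, facets):
    """Glob pattern plus a regex with one named group per wildcard facet."""
    glob_parts, regex_parts, seen = [], [], set()
    for i, piece in enumerate(KEYWORD.split(pattern)):
        if i % 2 == 0:  # literal text between keywords
            glob_parts.append(piece)
            regex_parts.append(_glob_to_regex(piece))
            continue
        value = "*" if piece == "PERIOD" else facets.get(piece, "*")
        glob_parts.append(value)
        if piece == "PERIOD":
            inner = "%s(?:[-_]%s)?" % (DATE, DATE)
        elif "*" in value or "?" in value:
            inner = _glob_to_regex(value)
        else:  # fixed value: nothing to capture
            regex_parts.append(re.escape(value))
            continue
        if piece in seen:  # a repeated keyword must repeat the same value
            regex_parts.append("(?P=%s)" % piece)
        else:
            seen.add(piece)
            regex_parts.append("(?P<%s>%s)" % (piece, inner))
    return "".join(glob_parts), re.compile("".join(regex_parts) + r"\Z")


def _years(period_text):
    """First and last year of e.g. '1980' or '198001-198912'."""
    years = re.findall(r"(\d{4})\d*", period_text)
    return int(years[0]), int(years[-1])


def select_generic_files(urls, first_year, last_year, find=glob.glob, **facets):
    """Return (files, wildcard_facets) from the first productive pattern."""
    for pattern in urls:
        glob_pattern, regex = _build(pattern, facets)
        files, wildcard_facets = [], {}
        for path in sorted(find(glob_pattern)):
            match = regex.match(path)
            if match is None:
                continue
            found = match.groupdict()
            period = found.pop("PERIOD", None)
            if period is not None:  # keep only files overlapping the request
                start, end = _years(period)
                if end < first_year or start > last_year:
                    continue
            files.append(path)
            for facet, value in found.items():
                wildcard_facets.setdefault(facet, set()).add(value)
        if files:  # stop at the first pattern that yields files
            return files, {k: sorted(v) for k, v in wildcard_facets.items()}
    return [], {}
```

With model='*', the named group for model collects the model names met in the kept files, which plays the role of wildcard_facets in the summary.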