Catalog and dataset setup, NCEI feature type explainer#

ocean-model-skill-assessor (OMSA) reads datasets through Intake catalogs, which abstracts away the details of reading each data source. The catalogs must meet a few requirements, and some further suggestions apply; both are described here.

NCEI feature types#

The NCEI netCDF feature types are useful because they define categories of dataset and make explicit which kinds of oceanographic data do and do not fit each category. More information is available in general and for the current NCEI NetCDF Templates 2.0. The following summary may be useful when thinking about the feature types and the requirements below:

|  | timeSeries | profile | timeSeriesProfile | trajectory (TODO) | trajectoryProfile | grid (TODO) |
|---|---|---|---|---|---|---|
| Definition | only t changes | only z changes | t and z change | t, y, and x change | t, z, y, and x change | t changes, y/x grid |
| Data types | mooring, buoy | CTD profile | moored ADCP | flow through, 2D drifter | glider, transect of CTD profiles, towed ADCP, 3D drifter | satellite, HF Radar |
| maptypes | point | point | point | point(s), line, box | point(s), line, box | box |
| X/Y are pairs (locstream) or grid | either locstream or grid | either locstream or grid | either locstream or grid | locstream | locstream | grid |
| Which dimensions are independent from X/Y choice? |  |  |  |  |  |  |
| T | Independent | Independent | Independent | match X/Y | match X/Y | Independent |
| Z | Independent | Independent | Independent | Independent | match X/Y | Independent |
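As a rough illustration of how the feature type summary above can be used, the sketch below encodes two of its rows (which coordinates change, and which maptypes are listed) in a plain Python dictionary and checks whether a feature type and maptype pair agrees with it. The names `FEATURE_TYPES` and `maptype_matches` are hypothetical and not part of OMSA, and whether OMSA itself enforces this pairing is not stated here.

```python
# Hypothetical encoding of the "Definition" and "maptypes" rows of the summary.
# "point(s)" in the summary is written as "point" here.
FEATURE_TYPES = {
    "timeSeries": {"changes": ("t",), "maptypes": ("point",)},
    "profile": {"changes": ("z",), "maptypes": ("point",)},
    "timeSeriesProfile": {"changes": ("t", "z"), "maptypes": ("point",)},
    "trajectory": {"changes": ("t", "y", "x"), "maptypes": ("point", "line", "box")},
    "trajectoryProfile": {
        "changes": ("t", "z", "y", "x"),
        "maptypes": ("point", "line", "box"),
    },
    "grid": {"changes": ("t",), "maptypes": ("box",)},
}


def maptype_matches(featuretype, maptype):
    """Return True if the summary lists `maptype` for `featuretype`."""
    try:
        row = FEATURE_TYPES[featuretype]
    except KeyError:
        raise ValueError(f"unknown feature type: {featuretype!r}") from None
    return maptype in row["maptypes"]


# A glider track (trajectoryProfile) can be drawn as a line; a CTD profile cannot.
glider_ok = maptype_matches("trajectoryProfile", "line")
profile_ok = maptype_matches("profile", "line")
```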

Requirements for datasets#

Requirements: pandas DataFrames#

  • cf-pandas must be able to identify a single column for each of the following keys:

    • T

    • Z

    • latitude

    • longitude

You can check a DataFrame with omsa.utils.check_dataframe(df, no_Z).
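To make the "single column for each key" requirement concrete, here is a minimal standalone sketch of the idea. It does not call cf-pandas or OMSA: the regular expressions in `PATTERNS` and the function `find_required_columns` are hypothetical stand-ins for cf-pandas' own matching rules, and treating `no_Z=True` as "Z is not required" is an assumption about what that flag means.

```python
import re

# Hypothetical patterns, one per required key. cf-pandas applies its own
# criteria; these are illustrative only.
PATTERNS = {
    "T": r"^(time|date)",
    "Z": r"^(depth|z$|pressure)",
    "latitude": r"^lat",
    "longitude": r"^lon",
}


def find_required_columns(columns, no_Z=False):
    """Map each required key to the one column name that matches it.

    Raises ValueError if a key matches zero columns or more than one.
    """
    found = {}
    for key, pattern in PATTERNS.items():
        if key == "Z" and no_Z:
            continue  # assumption: no_Z means a Z column is not required
        matches = [c for c in columns if re.search(pattern, str(c), re.IGNORECASE)]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one column for {key!r}, found {matches}")
        found[key] = matches[0]
    return found


# With a pandas DataFrame you would pass list(df.columns).
columns = ["time", "depth", "lat", "lon", "temp"]
required = find_required_columns(columns)
```

A DataFrame with both "lat" and "latitude" columns would fail this check, because the latitude key would no longer resolve to a single column.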

Additionally, the variable you want to compare between model and data must be identifiable in both the dataset and the model output through a key in the custom vocabulary.
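The role of the vocabulary key can be sketched without any library: one key maps to a pattern, and that pattern has to pick out a variable on both the data side and the model side. The `VOCAB` dictionary, its patterns, and the variable names below are all made up for illustration and are not the vocabulary format that cf-pandas or OMSA uses.

```python
import re

# Hypothetical vocabulary: key -> regular expression applied to variable names.
VOCAB = {"temp": r"temp|sst", "salt": r"sal"}


def match_key(names, key, vocab=VOCAB):
    """Return the names that the vocabulary entry for `key` identifies."""
    pattern = vocab[key]
    return [name for name in names if re.search(pattern, name, re.IGNORECASE)]


# Made-up names for a dataset and for model output.
dataset_columns = ["time", "lat", "lon", "sea_water_temperature"]
model_variables = ["ocean_time", "temp", "salt", "zeta"]

data_match = match_key(dataset_columns, "temp")
model_match = match_key(model_variables, "temp")

# The comparison is only possible when the key resolves on both sides.
comparable = len(data_match) == 1 and len(model_match) == 1
```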

Requirements and suggestions for Intake catalogs#

Requirements#

  • Metadata for a dataset must include:

    • an entry for “featuretype” that is a string of the NCEI-defined feature type that describes the dataset. Currently supported are timeSeries, profile, trajectoryProfile, timeSeriesProfile (trajectory and grid still to come).

    • an entry for “maptype” that is how to plot the dataset on a map. Currently supported are “point”, “line”, and “box”.

    • “minLongitude”, “maxLongitude”, “minLatitude”, “maxLatitude”

    • “minTime”, “maxTime”

You can check a Catalog object with omsa.utils.check_catalog(cat).
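For a sense of what metadata satisfying the requirements above might look like, the sketch below writes one dataset's metadata as a Python dictionary and checks it against the listed keys and supported values. This is not the implementation of omsa.utils.check_catalog: the function `metadata_problems`, the coordinate and time values, and the use of ISO-format strings for the times are all assumptions made for illustration.

```python
REQUIRED_KEYS = (
    "featuretype",
    "maptype",
    "minLongitude",
    "maxLongitude",
    "minLatitude",
    "maxLatitude",
    "minTime",
    "maxTime",
)
# Values listed above as currently supported.
SUPPORTED_FEATURETYPES = ("timeSeries", "profile", "trajectoryProfile", "timeSeriesProfile")
SUPPORTED_MAPTYPES = ("point", "line", "box")


def metadata_problems(metadata):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in metadata]
    featuretype = metadata.get("featuretype")
    if featuretype is not None and featuretype not in SUPPORTED_FEATURETYPES:
        problems.append(f"unsupported featuretype: {featuretype}")
    maptype = metadata.get("maptype")
    if maptype is not None and maptype not in SUPPORTED_MAPTYPES:
        problems.append(f"unsupported maptype: {maptype}")
    return problems


# Made-up metadata for a single fixed station (a timeSeries plotted as a point).
example_metadata = {
    "featuretype": "timeSeries",
    "maptype": "point",
    "minLongitude": -151.5,
    "maxLongitude": -151.5,
    "minLatitude": 59.6,
    "maxLatitude": 59.6,
    "minTime": "2020-01-01T00:00:00",
    "maxTime": "2020-01-31T23:00:00",
}

problems = metadata_problems(example_metadata)
```

Returning a list of problems rather than raising on the first one makes it easier to fix every gap in a catalog entry in one pass.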

Suggestions#

  • Do not encode indices for pandas DataFrames. If you do, though, they will be reset in OMSA.

  • Note that a DataFrame column that cf-pandas can identify as “T” will be parsed as datetimes.

How to make an Intake catalog#

  • Use an Intake driver that supports direct catalog creation such as intake-erddap.

  • Use omsa.main.make_catalog() or omsa.main.make_local_catalog()

How to modify an Intake catalog#

  • Coming soon: how to add metadata to an existing catalog.