Catalog and dataset set up, NCEI feature type explainer#
ocean-model-skill-assessor (OMSA) reads datasets from input Intake catalogs in order to abstract away the read-in process. However, these catalogs must meet a few requirements, and some suggestions apply to them as well; both are presented here.
NCEI feature types#
The NCEI netCDF feature types are useful because they describe what does and does not fit various definitions of oceanographic data types, which gives a shared way to classify a dataset. More information is available in general and for the current NCEI NetCDF Templates 2.0. The following table may be useful for thinking about the feature types and for understanding the requirements below:
| | timeSeries | profile | timeSeriesProfile | trajectory (TODO) | trajectoryProfile | grid (TODO) |
|---|---|---|---|---|---|---|
| Definition | only t changes | only z changes | t and z change | t, y, and x change | t, z, y, and x change | t changes, y/x grid |
| Data types | mooring, buoy | CTD profile | moored ADCP | flow through, 2D drifter | glider, transect of CTD profiles, towed ADCP, 3D drifter | satellite, HF Radar |
| maptypes | point | point | point | point(s), line, box | point(s), line, box | box |
| X/Y are pairs (locstream) or grid | either locstream or grid | either locstream or grid | either locstream or grid | locstream | locstream | grid |
| Which dimensions are independent from X/Y choice? | | | | | | |
| T | Independent | Independent | Independent | match X/Y | match X/Y | Independent |
| Z | Independent | Independent | Independent | Independent | match X/Y | Independent |
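
The table can also be read as a lookup keyed by feature type. The following is a minimal sketch and is not part of OMSA: the dictionary `FEATURE_TYPES` and the helper `independent_dims` are hypothetical names introduced only to restate the maptype, X/Y layout, and T/Z rows of the table in code.

```python
# Hypothetical restatement of the table rows; none of these names come from OMSA.
FEATURE_TYPES = {
    "timeSeries": {"maptypes": ("point",), "xy": ("locstream", "grid"), "match_xy": ()},
    "profile": {"maptypes": ("point",), "xy": ("locstream", "grid"), "match_xy": ()},
    "timeSeriesProfile": {"maptypes": ("point",), "xy": ("locstream", "grid"), "match_xy": ()},
    "trajectory": {"maptypes": ("point", "line", "box"), "xy": ("locstream",), "match_xy": ("T",)},
    "trajectoryProfile": {"maptypes": ("point", "line", "box"), "xy": ("locstream",), "match_xy": ("T", "Z")},
    "grid": {"maptypes": ("box",), "xy": ("grid",), "match_xy": ()},
}


def independent_dims(featuretype):
    """Return which of T and Z are independent of the X/Y choice."""
    matched = FEATURE_TYPES[featuretype]["match_xy"]
    return [dim for dim in ("T", "Z") if dim not in matched]


for name in FEATURE_TYPES:
    print(name, independent_dims(name))
```

For example, `independent_dims("trajectory")` returns only Z, because in a trajectory the times are tied to the X/Y positions.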
Requirements for datasets#
Requirements: pandas DataFrames#
cf-pandas must be able to identify a single column for each of the following keys:
- T
- Z
- latitude
- longitude

You can check a DataFrame with `omsa.utils.check_dataframe(df, no_Z)`.
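
The "single column per key" rule can be illustrated without cf-pandas. The sketch below is a stand-in, not the cf-pandas or OMSA implementation: the regular expressions in `AXIS_PATTERNS`, the function `find_single_columns`, and the `require_z` flag are all assumptions chosen for illustration, and the real matching rules may differ.

```python
import re

# Hypothetical patterns; cf-pandas uses its own criteria, not these.
AXIS_PATTERNS = {
    "T": r"time",
    "Z": r"depth|^z$",
    "latitude": r"^lat",
    "longitude": r"^lon",
}


def find_single_columns(columns, require_z=True):
    """Map each key to its one matching column, or raise if there is not exactly one."""
    found = {}
    for key, pattern in AXIS_PATTERNS.items():
        if key == "Z" and not require_z:
            continue  # assumed behaviour for datasets without a depth column
        matches = [c for c in columns if re.search(pattern, c, flags=re.IGNORECASE)]
        if len(matches) != 1:
            raise KeyError(f"expected exactly one column for {key!r}, found {matches}")
        found[key] = matches[0]
    return found


columns = ["time", "depth", "lat", "lon", "temp"]
result = find_single_columns(columns)
print(result)
```

A DataFrame with two time-like columns, or with none, would fail this kind of check, which is the situation the requirement above is meant to rule out.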
Additionally, the variable you want to compare between model and data must be identifiable in both the dataset and model output using the custom vocabulary and a key in the vocabulary.
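
As an illustration of what "identifiable in both" means, the sketch below applies one vocabulary key to the column names of a dataset and to the variable names of model output. It uses plain regular expressions rather than cf-pandas; the vocabulary layout (`{key: {"name": regex}}`), the variable names, and the helper `match_key` are assumptions made for this example only.

```python
import re

# Assumed vocabulary layout: one key mapped to a regular expression on names.
vocab = {"temp": {"name": r"temp|sst"}}


def match_key(names, vocab, key):
    """Return the names that match the regular expression stored under `key`."""
    pattern = vocab[key]["name"]
    return [n for n in names if re.search(pattern, n, flags=re.IGNORECASE)]


# Made-up names standing in for a dataset and for model output.
dataset_columns = ["time", "lat", "lon", "sea_water_temperature"]
model_variables = ["ocean_time", "temp", "salt"]

data_match = match_key(dataset_columns, vocab, "temp")
model_match = match_key(model_variables, vocab, "temp")
print(data_match, model_match)
```

The comparison is only possible when the same key resolves to a variable on both sides, even though the two variable names differ.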
Requirements and suggestions for Intake catalogs#
Requirements#
Metadata for a dataset must include:
- an entry for “featuretype” that is a string of the NCEI-defined feature type that describes the dataset. Currently supported are timeSeries, profile, trajectoryProfile, and timeSeriesProfile (trajectory and grid still to come).
- an entry for “maptype” that is how to plot the dataset on a map. Currently supported are “point”, “line”, and “box”.
- “minLongitude”, “maxLongitude”, “minLatitude”, “maxLatitude”
- “minTime”, “maxTime”

You can check a Catalog object with `omsa.utils.check_catalog(cat)`.
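
The metadata requirements above can be sketched as a small validation over a plain dictionary. This is not how `omsa.utils.check_catalog` is implemented; the function `metadata_problems`, the constant names, and the example values (including the time string format) are assumptions made for illustration.

```python
REQUIRED_KEYS = (
    "featuretype", "maptype",
    "minLongitude", "maxLongitude", "minLatitude", "maxLatitude",
    "minTime", "maxTime",
)
SUPPORTED_FEATURETYPES = ("timeSeries", "profile", "trajectoryProfile", "timeSeriesProfile")
SUPPORTED_MAPTYPES = ("point", "line", "box")


def metadata_problems(metadata):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in metadata]
    featuretype = metadata.get("featuretype")
    if featuretype is not None and featuretype not in SUPPORTED_FEATURETYPES:
        problems.append(f"unsupported featuretype: {featuretype}")
    maptype = metadata.get("maptype")
    if maptype is not None and maptype not in SUPPORTED_MAPTYPES:
        problems.append(f"unsupported maptype: {maptype}")
    return problems


# Made-up metadata for a single fixed station.
example = {
    "featuretype": "timeSeries",
    "maptype": "point",
    "minLongitude": -151.5, "maxLongitude": -151.5,
    "minLatitude": 59.6, "maxLatitude": 59.6,
    "minTime": "2022-01-01T00:00:00", "maxTime": "2022-01-31T23:00:00",
}
print(metadata_problems(example))
```

For a fixed station the minimum and maximum longitude and latitude coincide, which is consistent with the “point” maptype.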
Suggestions#
- Do not encode indices for pandas DataFrames. If you do, though, they will be reset in OMSA.
- Note that a DataFrame column that cf-pandas can identify as “T” will be parsed as datetimes.
How to make an Intake catalog#
- Use an Intake driver that supports direct catalog creation, such as intake-erddap.
- Use `omsa.main.make_catalog()` or `omsa.main.make_local_catalog()`.
How to modify an Intake catalog#
Coming soon: how to add metadata to an existing catalog.