How to make and work with vocabularies and vocab labels#

This page demonstrates the workflow of making a new vocabulary, saving it to the user application cache, and reading it back in for use. The vocabulary created is the same as the “general” vocabulary that is saved with the OMSA package, but it is given a different name here to show that the same steps apply to any new vocabulary you want to make.

The vocabulary covers the following variables of interest to a physical oceanographer, each listed with its nickname in quotes:

  • water temperature “temp”

  • salinity “salt”

  • sea surface height “ssh”

  • u velocity “u”

  • v velocity “v”

  • w upward velocity “w”

  • direction of water velocity “water_dir”

  • magnitude of water velocity “water_speed”

  • wind direction “wind_dir”

  • wind speed “wind_speed”

  • sea ice velocity u “sea_ice_u”

  • sea ice velocity v “sea_ice_v”

  • sea ice area fraction “sea_ice_area_fraction”

Vocab labels provide readable labels in model-data comparison plots. They are stored as a dictionary whose keys match those of the vocabularies in use and whose values are the strings to display for each variable in a plot.

import cf_pandas as cfp
import ocean_model_skill_assessor as omsa
import pandas as pd

Vocabulary workflow#

Make vocabulary#

Here we build the “general” vocabulary that is saved in the repository. It is a broad vocabulary meant to identify variables from sources that do not use exact CF standard_names.

nickname = "temp"
vocab = cfp.Vocab()

# define a regular expression to represent your variable
reg = cfp.Reg(include_or=["temp","sst"], exclude=["air","qc","status","atmospheric","bottom"])

# Make an entry to add to your vocabulary
vocab.make_entry(nickname, reg.pattern(), attr="name")

vocab.make_entry("salt", cfp.Reg(include_or=["sal","sss"], exclude=["soil","qc","status","bottom"]).pattern(), attr="name")
vocab.make_entry("ssh", cfp.Reg(include_or=["sea_surface_height","surface_elevation"], exclude=["qc","status"]).pattern(), attr="name")

reg = cfp.Reg(include=["east", "vel"])
vocab.make_entry("u", "u$", attr="name")
vocab.make_entry("u", reg.pattern(), attr="name")

reg = cfp.Reg(include=["north", "vel"])
vocab.make_entry("v", "v$", attr="name")
vocab.make_entry("v", reg.pattern(), attr="name")

reg = cfp.Reg(include=["up", "vel"])
vocab.make_entry("w", "w$", attr="name")
vocab.make_entry("w", reg.pattern(), attr="name")

vocab.make_entry("water_dir", cfp.Reg(include=["dir","water"], exclude=["qc","status","air","wind"]).pattern(), attr="name")

vocab.make_entry("water_speed", cfp.Reg(include=["speed","water"], exclude=["qc","status","air","wind"]).pattern(), attr="name")

vocab.make_entry("wind_dir", cfp.Reg(include=["dir","wind"], exclude=["qc","status","water"]).pattern(), attr="name")

vocab.make_entry("wind_speed", cfp.Reg(include=["speed","wind"], exclude=["qc","status","water"]).pattern(), attr="name")

reg1 = cfp.Reg(include=["sea","ice","u"], exclude=["qc","status"])
reg2 = cfp.Reg(include=["sea","ice","x","vel"], exclude=["qc","status"])
reg3 = cfp.Reg(include=["sea","ice","east","vel"], exclude=["qc","status"])
vocab.make_entry("sea_ice_u", reg1.pattern(), attr="name")
vocab.make_entry("sea_ice_u", reg2.pattern(), attr="name")
vocab.make_entry("sea_ice_u", reg3.pattern(), attr="name")

reg1 = cfp.Reg(include=["sea","ice","v"], exclude=["qc","status"])
reg2 = cfp.Reg(include=["sea","ice","y","vel"], exclude=["qc","status"])
reg3 = cfp.Reg(include=["sea","ice","north","vel"], exclude=["qc","status"])
vocab.make_entry("sea_ice_v", reg1.pattern(), attr="name")
vocab.make_entry("sea_ice_v", reg2.pattern(), attr="name")
vocab.make_entry("sea_ice_v", reg3.pattern(), attr="name")

vocab.make_entry("sea_ice_area_fraction", cfp.Reg(include=["sea","ice","area","fraction"], exclude=["qc","status"]).pattern(), attr="name")

vocab
{'temp': {'name': '(?i)^(?!.*(air|qc|status|atmospheric|bottom)).*(temp|sst).*'}, 'salt': {'name': '(?i)^(?!.*(soil|qc|status|bottom)).*(sal|sss).*'}, 'ssh': {'name': '(?i)^(?!.*(qc|status)).*(sea_surface_height|surface_elevation).*'}, 'u': {'name': 'u$|(?i)(?=.*east)(?=.*vel)'}, 'v': {'name': 'v$|(?i)(?=.*north)(?=.*vel)'}, 'w': {'name': 'w$|(?i)(?=.*up)(?=.*vel)'}, 'water_dir': {'name': '(?i)^(?!.*(qc|status|air|wind))(?=.*dir)(?=.*water)'}, 'water_speed': {'name': '(?i)^(?!.*(qc|status|air|wind))(?=.*speed)(?=.*water)'}, 'wind_dir': {'name': '(?i)^(?!.*(qc|status|water))(?=.*dir)(?=.*wind)'}, 'wind_speed': {'name': '(?i)^(?!.*(qc|status|water))(?=.*speed)(?=.*wind)'}, 'sea_ice_u': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*east)(?=.*vel)'}, 'sea_ice_v': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*v)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*y)(?=.*vel)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*north)(?=.*vel)'}, 'sea_ice_area_fraction': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*area)(?=.*fraction)'}}
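The patterns printed above follow two shapes: an `include_or` list becomes a single alternation, an `include` list becomes one lookahead per term, and `exclude` becomes a leading negative lookahead. The sketch below rebuilds those shapes with a hypothetical helper, `build_pattern`, written only to reproduce the strings shown in the output. It is not how `cfp.Reg` is implemented, and it covers only the three keyword arguments used on this page.

```python
def build_pattern(include=None, include_or=None, exclude=None):
    """Hypothetical helper that mimics the pattern strings printed above."""
    pattern = "(?i)"  # case-insensitive, as in every Reg pattern on this page
    if exclude:
        # reject any name containing one of the excluded substrings
        pattern += "^(?!.*(" + "|".join(exclude) + "))"
    for term in include or []:
        # every "include" term must appear somewhere in the name
        pattern += "(?=.*" + term + ")"
    if include_or:
        # at least one "include_or" term must appear
        pattern += ".*(" + "|".join(include_or) + ").*"
    return pattern


temp_pattern = build_pattern(
    include_or=["temp", "sst"],
    exclude=["air", "qc", "status", "atmospheric", "bottom"],
)
water_dir_pattern = build_pattern(
    include=["dir", "water"],
    exclude=["qc", "status", "air", "wind"],
)
print(temp_pattern)
print(water_dir_pattern)
```

Comparing the two printed strings with the "temp" and "water_dir" entries in the vocabulary output is a quick way to check how each argument contributes to the final expression.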

Save it#

This vocabulary was previously saved as “general” and is available under that name. Because this page demonstrates saving a new vocabulary, we use the name “general2” to tell the two apart.

paths = omsa.paths.Paths()
vocab.save(paths.VOCAB_PATH("general2"))
paths.VOCAB_PATH("general2")
PosixPath('/home/docs/.cache/ocean-model-skill-assessor/vocab/general2.json')
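Because a saved vocabulary is a dictionary stored as JSON, the file can be inspected with nothing more than the standard library. The sketch below writes a one-entry dictionary shaped like the vocabulary above to a temporary file and reads it back. The temporary path and the `indent` setting are illustrative assumptions, and the exact layout that `vocab.save` writes may differ.

```python
import json
import tempfile
from pathlib import Path

# one entry copied from the vocabulary printed earlier on this page
entry = {"temp": {"name": "(?i)^(?!.*(air|qc|status|atmospheric|bottom)).*(temp|sst).*"}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "general2.json"  # illustrative location, not the OMSA cache
    path.write_text(json.dumps(entry, indent=2))  # write the dictionary as JSON
    loaded = json.loads(path.read_text())  # read it back into a dictionary

print(loaded == entry)
```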

Use it later#

Read the saved vocabulary back in to use it:

vocab = cfp.Vocab(paths.VOCAB_PATH("general2"))

df = pd.DataFrame(columns=["sst", "time", "lon", "lat"], data={"sst": [1,2,3]})
with cfp.set_options(custom_criteria=vocab.vocab):
    print(df.cf["temp"])
0    1
1    2
2    3
Name: sst, dtype: int64
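Conceptually, `df.cf["temp"]` tests each column name against the pattern stored under the "temp" key and returns the column that matches. The following is a rough standard-library emulation of that lookup, not the cf-pandas implementation. `select_columns` is a hypothetical helper and only handles the `name` attribute.

```python
import re

vocab_dict = {"temp": {"name": "(?i)^(?!.*(air|qc|status|atmospheric|bottom)).*(temp|sst).*"}}
columns = ["sst", "time", "lon", "lat"]


def select_columns(key, columns, vocab_dict):
    """Hypothetical helper: return the column names matching a vocabulary key."""
    pattern = vocab_dict[key]["name"]
    # keep only the columns whose name satisfies the regular expression
    return [col for col in columns if re.match(pattern, col)]


selected = select_columns("temp", columns, vocab_dict)
print(selected)
```

Only "sst" is selected here, which agrees with the `Name: sst` line in the output above.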

Combine vocabularies#

Vocabularies can be combined by adding them together. For example, here we combine the built-in “standard_names” and “general” vocabularies.

v1 = cfp.Vocab(paths.VOCAB_PATH("standard_names"))
v2 = cfp.Vocab(paths.VOCAB_PATH("general"))

v = v1 + v2
v
{'sea_ice_area_fraction': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*area)(?=.*fraction)', 'standard_name': 'sea_ice_area_fraction$'}, 'water_speed': {'name': '(?i)^(?!.*(qc|status|air|wind))(?=.*speed)(?=.*water)', 'standard_name': 'sea_water_speed$'}, 'salt': {'name': '(?i)^(?!.*(soil|qc|status|bottom)).*(sal|sss).*', 'standard_name': 'sea_surface_salinity$|sea_water_absolute_salinity$|sea_water_practical_salinity$|sea_water_salinity$'}, 'sea_ice_u': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*east)(?=.*vel)', 'standard_name': 'eastward_sea_ice_velocity$|sea_ice_x_velocity$'}, 'sea_ice_v': {'name': '(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*v)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*y)(?=.*vel)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*north)(?=.*vel)', 'standard_name': 'northward_sea_ice_velocity$|sea_ice_y_velocity$'}, 'ssh': {'name': '(?i)^(?!.*(qc|status)).*(sea_surface_height|surface_elevation|zeta).*', 'standard_name': 'sea_surface_elevation|sea_surface_height_above_geoid$|sea_surface_height_above_geopotential_datum$|sea_surface_height_above_mean_sea_level$|sea_surface_height_above_reference_ellipsoid$|surface_height_above_geopotential_datum$|tidal_sea_surface_height_above_lowest_astronomical_tide$|tidal_sea_surface_height_above_mean_higher_high_water$|tidal_sea_surface_height_above_mean_lower_low_water$|tidal_sea_surface_height_above_mean_low_water_springs$|tidal_sea_surface_height_above_mean_sea_level$|water_surface_height_above_reference_datum$|water_surface_reference_datum_altitude$'}, 'wind_speed': {'name': '(?i)^(?!.*(qc|status|water))(?=.*speed)(?=.*wind)', 'standard_name': 'wind_speed$'}, 'water_dir': {'name': '(?i)^(?!.*(qc|status|air|wind))(?=.*dir)(?=.*water)', 'standard_name': 'sea_water_velocity_from_direction$|sea_water_velocity_to_direction$'}, 'u': {'name': 'u$|(?i)(?=.*east)(?=.*vel)', 'standard_name': 
'baroclinic_eastward_sea_water_velocity$|barotropic_eastward_sea_water_velocity$|barotropic_sea_water_x_velocity$|eastward_sea_water_velocity$|eastward_sea_water_velocity_assuming_no_tide$|geostrophic_eastward_sea_water_velocity$|sea_water_x_velocity$|surface_eastward_sea_water_velocity$|surface_geostrophic_eastward_sea_water_velocity$|surface_geostrophic_sea_water_x_velocity$'}, 'temp': {'name': '(?i)^(?!.*(air|qc|status|atmospheric|bottom|dew)).*(temp|sst).*', 'standard_name': 'sea_surface_temperature$|sea_water_potential_temperature$|sea_water_temperature$'}, 'w': {'name': 'w$|(?i)(?=.*up)(?=.*vel)', 'standard_name': 'upward_sea_water_velocity$'}, 'v': {'name': 'v$|(?i)(?=.*north)(?=.*vel)', 'standard_name': 'baroclinic_northward_sea_water_velocity$|barotropic_northward_sea_water_velocity$|barotropic_sea_water_y_velocity$|northward_sea_water_velocity$|northward_sea_water_velocity_assuming_no_tide$|northward_sea_water_velocity_due_to_tides$|sea_water_y_velocity$|surface_northward_sea_water_velocity$'}, 'wind_dir': {'name': '(?i)^(?!.*(qc|status|water))(?=.*dir)(?=.*wind)', 'standard_name': 'wind_from_direction$|wind_to_direction$'}}
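Judging from the output above, adding two vocabularies merges them key by key, so that each nickname ends up with both a `name` and a `standard_name` pattern. A minimal sketch of that merge on plain dictionaries follows. `merge_vocabs` is a hypothetical helper, and joining patterns with `|` when both inputs define the same attribute is an assumption modelled on how repeated `make_entry` calls behave earlier on this page.

```python
def merge_vocabs(first, second):
    """Hypothetical helper: combine two vocabulary-style nested dictionaries."""
    # copy the first vocabulary so that neither input is modified
    merged = {key: dict(attrs) for key, attrs in first.items()}
    for key, attrs in second.items():
        entry = merged.setdefault(key, {})
        for attr, pattern in attrs.items():
            if attr in entry:
                # assumption: patterns for the same attribute are OR-ed together
                entry[attr] = entry[attr] + "|" + pattern
            else:
                entry[attr] = pattern
    return merged


standard = {"wind_speed": {"standard_name": "wind_speed$"}}
general = {"wind_speed": {"name": "(?i)^(?!.*(qc|status|water))(?=.*speed)(?=.*wind)"}}
combined = merge_vocabs(standard, general)
print(combined)
```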

Using the cf-pandas widget#

Vocab labels#

A default set of labels, called “vocab_labels.json”, is available in the repository alongside the default vocabularies.

You can use cf-pandas to open and inspect vocab_labels just like a vocabulary, since both are simply dictionaries stored as JSON.

vocab_labels = cfp.Vocab(paths.VOCAB_PATH("vocab_labels"))
vocab_labels
{'temp': 'Sea water temperature [C]', 'salt': 'Sea water salinity [psu]', 'ssh': 'Sea surface height [m]', 'u': 'x-axis velocity [m/s]', 'v': 'y-axis velocity [m/s]', 'w': 'z velocity [m/s]', 'along': 'Along-channel velocity [m/s]', 'across': 'Across-channel velocity [m/s]', 'speed': 'Horizontal speed [m/s]', 'east': 'Eastward velocity [m/s]', 'north': 'Northward velicity [m/s]', 'water_dir': 'Sea water direction [degrees]', 'water_speed': 'Sea water speed [m/s]', 'wind_dir': 'Wind direction [degrees]', 'wind_speed': 'Wind speed [m/s]', 'sea_ice_u': 'Sea ice x-axis velocity [m/s]', 'sea_ice_v': 'Sea ice y-axis velocity [m/s]', 'sea_ice_area_fraction': 'Sea ice area fraction []'}
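Since the labels are a plain dictionary keyed by the same nicknames as the vocabulary, finding the label for a plot is a dictionary lookup. The sketch below uses three entries copied from the output above. `axis_label` is a hypothetical helper, and falling back to the bare key when no label exists is an illustrative choice, not necessarily what OMSA does.

```python
labels = {
    "temp": "Sea water temperature [C]",
    "salt": "Sea water salinity [psu]",
    "ssh": "Sea surface height [m]",
}


def axis_label(key, labels):
    """Hypothetical helper: return the plot label for a vocabulary key."""
    return labels.get(key, key)  # fall back to the key itself (assumption)


print(axis_label("temp", labels))
print(axis_label("wind_gust", labels))
```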