SWXSchema

class swxsoc.util.schema.SWXSchema(global_schema_layers: list[str] | None = None, variable_schema_layers: list[str] | None = None, use_defaults: bool | None = True)[source]

Bases: CdfAttributeManager

Class representing a schema for data requirements and formatting. The SWxSOC Default Schema only includes attributes required for ISTP compliance. Additional mission-specific attributes or requirements should be added through additional global and variable schema layers. For an example of how to layer schema files, please see the HERMES mission core package and the HermesDataSchema extension of the SWXSchema class.

The Space Weather Data Schema has two main components: global attribute information and variable attribute information.

Global schema information is loaded from YAML (dict-like) files in the following format:

attribute_name:
    description: >
        Include a meaningful description of the attribute and context needed to understand
        its values.
    default: <string> # A default value for the attribute if needed/desired
    derived: <bool> # Whether or not the attribute's value can be derived using a python function
    derivation_fn: <string> # The name of a Python function to derive the value. Must be a function member of the schema class and match the signature below.
    required: <bool> # Whether the attribute is required
    overwrite: <bool> # Whether an existing value for the attribute should be overwritten if a different value is derived.
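
The effect of the overwrite flag can be pictured with a small stand-alone sketch. This is not the SWXSchema implementation: apply_derived is a hypothetical helper, the schema entries are made-up flags on two attribute names, and the rule that a missing attribute is always filled in is an assumption.

```python
def apply_derived(existing, derived, schema):
    """Combine existing and derived attribute values (hypothetical helper)."""
    result = dict(existing)
    for name, value in derived.items():
        if name not in result:
            # Assumption: with nothing set yet, the derived value is used.
            result[name] = value
        elif result[name] != value and schema[name]["overwrite"]:
            # A different value was derived and the schema allows replacing it.
            result[name] = value
    return result


# Made-up flags for illustration only.
schema = {
    "Data_version": {"overwrite": False},
    "Generation_date": {"overwrite": True},
}
existing = {"Data_version": "1.0.0", "Generation_date": "2020-01-01"}
derived = {"Data_version": "0.0.1", "Generation_date": "2024-06-01"}

result = apply_derived(existing, derived, schema)
```

Here Data_version keeps its existing value because overwrite is false, while Generation_date takes the newly derived value.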

All functions that derive global attributes should follow the signature below. The function takes one parameter, data, which is an instance of SWXData or of an extended data class, and returns a single value for the attribute being derived.

def derivation_fn(self, data: SWXData):
    # ... do manipulations as needed from `data`
    return "attribute_value"
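
Because derivation_fn is given as a string naming a member function, one plausible way to connect the schema entry to the function is a lookup by name on the schema object. The sketch below illustrates that idea only. ExampleSchema, its derive method, the _get_logical_source function, and the schema entries are all hypothetical, and a SimpleNamespace with a meta dict stands in for the SWXData object.

```python
from collections import OrderedDict
from types import SimpleNamespace


class ExampleSchema:
    """Stand-in for a schema class; not the SWXSchema implementation."""

    # Made-up schema entries mirroring the YAML format above.
    global_schema = {
        "Logical_source": {"derived": True, "derivation_fn": "_get_logical_source"},
        "PI_name": {"derived": False, "derivation_fn": None},
    }

    def _get_logical_source(self, data):
        # Derive the value from metadata already present on `data`.
        return f"{data.meta['Mission']}_{data.meta['Descriptor']}"

    def derive(self, data):
        derived = OrderedDict()
        for name, info in self.global_schema.items():
            if info["derived"]:
                # Look the function up by name on the schema object.
                derived[name] = getattr(self, info["derivation_fn"])(data)
        return derived


# Stand-in for a SWXData object (assumption: metadata lives in a dict).
data = SimpleNamespace(meta={"Mission": "example", "Descriptor": "inst"})
attrs = ExampleSchema().derive(data)
```

Only Logical_source appears in attrs, since PI_name is not marked as derived.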

Variable schema information is loaded from YAML (dict-like) files in the following format:

attribute_key:
    attribute_name:
        description: >
            Include a meaningful description of the attribute and context needed to understand
            its values.
        derived: <bool> # Whether or not the attribute's value can be derived using a python function
        derivation_fn: <string> # The name of a Python function to derive the value. Must be a function member of the schema class and match the signature below.
        required: <bool> # Whether the attribute is required
        overwrite: <bool> # Whether an existing value for the attribute should be overwritten if a different value is derived.
        valid_values: <list> # A list of valid values that the attribute can take. The value of the attribute is checked against the `valid_values` in the Validation module.
        alternate: <string> # An additional attribute name that can be treated as an alternative of the given attribute.
data:
    - attribute_name
    - ...
support_data:
    - ...
metadata:
    - ...
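
As a rough illustration of how the data, support_data, and metadata lists relate to the attribute entries, the sketch below works on a dict of the shape a YAML parser would produce from this format. It assumes each list names the attributes required for that variable type. The helper var_types_for is hypothetical, and the assignment of attributes to lists is made up rather than taken from the SWxSOC defaults.

```python
# Made-up variable schema, written as the dict a YAML parser would produce.
variable_schema = {
    "attribute_key": {
        "CATDESC": {"required": True, "derived": True},
        "DEPEND_0": {"required": True, "derived": True},
        "UNITS": {"required": True, "derived": True},
    },
    "data": ["CATDESC", "DEPEND_0", "UNITS"],
    "support_data": ["CATDESC", "UNITS"],
    "metadata": ["CATDESC"],
}


def var_types_for(schema, attribute_name):
    """Return the variable types whose lists name `attribute_name` (hypothetical helper)."""
    return [
        var_type
        for var_type in ("data", "support_data", "metadata")
        if attribute_name in schema.get(var_type, [])
    ]
```

With these made-up lists, var_types_for(variable_schema, "DEPEND_0") gives only the data type, while "CATDESC" is listed for all three.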

The signature for all functions to derive variable attributes should follow the format below. The function takes in parameters var_name, var_data, and guess_type, where:

  • var_name is the variable name of the variable for which the attribute is being derived

  • var_data is the variable data of the variable for which the attribute is being derived

  • guess_type is the guessed CDF variable type of the data for which the attribute is being derived.

The function must return a single attribute value for the given attribute to be derived.

def derivation_fn(self, var_name: str, var_data: Union[Quantity, NDData, NDCube], guess_type: ctypes.c_long):
    # ... do manipulations as needed from data
    return "attribute_value"

Parameters:
  • global_schema_layers (Optional[list[Path]]) – Absolute file paths to global attribute schema files. These schema files are layered on top of one another in a latest-priority ordering. That is, the latest file that modifies a common schema attribute will take precedence over earlier values for a given attribute.

  • variable_schema_layers (Optional[list[Path]]) – Absolute file paths to variable attribute schema files. These schema files are layered on top of one another in a latest-priority ordering. That is, the latest file that modifies a common schema attribute will take precedence over earlier values for a given attribute.

  • use_defaults (Optional[bool]) – Whether or not to load the default global and variable attribute schema files. These default schema files contain only the requirements for CDF ISTP validation.
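
The latest-priority ordering of schema layers can be pictured as a sequence of dictionary updates. The sketch below is not the SWXSchema implementation: merge_layers is a hypothetical helper, the layer contents and the _get_descriptor name are made up, and it assumes that a later layer overrides only the fields it defines for an attribute.

```python
from copy import deepcopy


def merge_layers(layers):
    """Merge schema layers so that later layers take precedence (hypothetical helper)."""
    merged = {}
    for layer in layers:
        for attr_name, attr_info in layer.items():
            # Assumption: a later layer overrides only the fields it defines.
            merged.setdefault(attr_name, {}).update(deepcopy(attr_info))
    return merged


# Made-up layers standing in for parsed YAML schema files.
default_layer = {"Descriptor": {"required": True, "derived": False}}
mission_layer = {
    "Descriptor": {"derived": True, "derivation_fn": "_get_descriptor"},
    "Mission_group": {"required": True, "derived": False},
}

merged = merge_layers([default_layer, mission_layer])
```

After merging, Descriptor is marked as derived by the later layer but keeps the required flag from the earlier one, and Mission_group is added by the later layer.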

Attributes Summary

default_global_attributes

Function to load the default global attributes from the SWxSOC schema.

Methods Summary

derive_global_attributes(data)

Function to derive global attributes for the given measurement data.

derive_measurement_attributes(data, var_name)

Function to derive metadata for the given measurement.

global_attribute_info([attribute_name])

Function to generate an astropy.table.Table of information about each global metadata attribute.

global_attribute_template()

Function to generate a template of required global attributes that must be set for a valid CDF.

measurement_attribute_info([attribute_name])

Function to generate an astropy.table.Table of information about each variable metadata attribute.

measurement_attribute_template()

Function to generate a template of required measurement attributes that must be set for a valid CDF measurement variable.

types(data[, encoding])

Find dimensions and valid types of a nested list-of-lists.

Attributes Documentation

default_global_attributes

Function to load the default global attributes from the SWxSOC schema.

Returns:

default_global_attributes (dict) – A dictionary of default global attributes.

Methods Documentation

derive_global_attributes(data) → OrderedDict[source]

Function to derive global attributes for the given measurement data.

Parameters:

data (swxsoc.swxdata.SWXData) – An instance of SWXData to derive metadata from.

Returns:

attributes (OrderedDict) – A dict containing key: value pairs of global metadata attributes.

derive_measurement_attributes(data, var_name: str, guess_types: list[int] | None = None) → OrderedDict[source]

Function to derive metadata for the given measurement.

Parameters:
  • data (swxsoc.swxdata.SWXData) – An instance of SWXData to derive metadata from

  • var_name (str) – The name of the measurement to derive metadata for

  • guess_types (list[int], optional) – Guessed CDF Type of the variable

Returns:

attributes (OrderedDict) – A dict containing key: value pairs of derived metadata attributes.

global_attribute_info(attribute_name: str | None = None) → Table[source]

Function to generate an astropy.table.Table of information about each global metadata attribute. The astropy.table.Table contains all information in the SWxSOC global attribute schema including:

  • description: (str) A brief description of the attribute

  • default: (str) The default value used if none is provided

  • derived: (bool) Whether the attribute can be derived by the SWxSOC SWXSchema class

  • required: (bool) Whether the attribute is required by SWxSOC standards

  • overwrite: (bool) Whether the SWXSchema attribute derivations will overwrite an existing attribute value with an updated attribute value from the derivation process.

Parameters:

attribute_name (str, optional, default None) – The name of the attribute to get specific information for.

Returns:

info (astropy.table.Table) – A table of information about global metadata.

Raises:

KeyError – If attribute_name is not a recognized global attribute.

global_attribute_template() → OrderedDict[source]

Function to generate a template of required global attributes that must be set for a valid CDF.

Returns:

template (OrderedDict) – A template for required global attributes that must be provided.

measurement_attribute_info(attribute_name: str | None = None) → Table[source]

Function to generate an astropy.table.Table of information about each variable metadata attribute. The astropy.table.Table contains all information in the SWxSOC variable attribute schema including:

  • description: (str) A brief description of the attribute

  • derived: (bool) Whether the attribute can be derived by the SWxSOC SWXSchema class

  • required: (bool) Whether the attribute is required by SWxSOC standards

  • overwrite: (bool) Whether the SWXSchema attribute derivations will overwrite an existing attribute value with an updated attribute value from the derivation process.

  • valid_values: (str) List of allowed values the attribute can take for SWxSOC products, if applicable

  • alternate: (str) An additional attribute name that can be treated as an alternative of the given attribute. Not all attributes have an alternative, and only one of a given attribute or its alternate is required.

  • var_types: (str) A list of the variable types that require the given attribute to be present.

Parameters:

attribute_name (str, optional, default None) – The name of the attribute to get specific information for.

Returns:

info (astropy.table.Table) – A table of information about variable metadata.

Raises:

KeyError – If attribute_name is not a recognized variable attribute.

measurement_attribute_template() → OrderedDict[source]

Function to generate a template of required measurement attributes that must be set for a valid CDF measurement variable.

Returns:

template (OrderedDict) – A template for required variable attributes that must be provided.

types(data, encoding='utf-8')[source]

Find dimensions and valid types of a nested list-of-lists.

Any given data may be representable by a range of CDF types; infer the CDF types which can represent this data. This breaks down to:

  1. Proper kind (numerical, string, time).

  2. Proper range (stores highest and lowest number).

  3. Sufficient resolution (EPOCH16 or TT2000 required if the astropy.time values have microsecond or finer resolution).

When more than one type satisfies the requirements, candidates are returned in preference order:

  1. Type that matches the precision of the data first,

  2. Integer type before float type,

  3. Smallest type first,

  4. Signed type first,

  5. Specifically-named (CDF_BYTE) before generically-named (CDF_INT1).
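
For integer data, rules 3 to 5 above can be sketched as a filter over a table that is already written in preference order. This is an illustration only and not the code used by types(): the table and the integer_candidates helper are hypothetical, type names are used instead of CDF type numbers, and rule 1 (matching the precision of an existing dtype) is deliberately left out, so the result for a typed numpy array may start with a different entry.

```python
# Integer CDF types with their representable ranges, listed in the
# preference order described above: smallest first, signed before
# unsigned, and CDF_BYTE before CDF_INT1.
INTEGER_CANDIDATES = [
    ("CDF_BYTE", -2**7, 2**7 - 1),
    ("CDF_INT1", -2**7, 2**7 - 1),
    ("CDF_UINT1", 0, 2**8 - 1),
    ("CDF_INT2", -2**15, 2**15 - 1),
    ("CDF_UINT2", 0, 2**16 - 1),
    ("CDF_INT4", -2**31, 2**31 - 1),
    ("CDF_UINT4", 0, 2**32 - 1),
    ("CDF_INT8", -2**63, 2**63 - 1),
]


def integer_candidates(values):
    """Return names of integer types whose range covers `values` (hypothetical helper)."""
    lowest, highest = min(values), max(values)
    return [
        name
        for name, type_min, type_max in INTEGER_CANDIDATES
        if type_min <= lowest and highest <= type_max
    ]
```

For example, the values -1 and 300 rule out every one-byte type and every unsigned type, which leaves CDF_INT2, CDF_INT4, and CDF_INT8 in that order.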

CDF_TIME_TT2000 is always preferred for Time inputs since SWxSOC 0.3.0.

For floats, a four-byte type is preferred unless the data contain either of the following, in which case an eight-byte type is required:

  1. Absolute values between 0 and 3e-39.

  2. Absolute values greater than 1.7e38.

This will switch to an eight-byte double in some cases where four bytes would be sufficient for IEEE 754 encoding, but where DEC formats would require eight.
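
The two float thresholds can be written as a simple check. The helper below is a hypothetical sketch rather than the code inside types(). It assumes finite inputs and treats both bounds as exclusive, which the text above does not specify.

```python
def needs_eight_byte_float(values):
    """Return True if any value falls outside the four-byte range given above."""
    for value in values:
        magnitude = abs(value)
        # Assumption: zero itself is fine, and the bounds are exclusive.
        if 0 < magnitude < 3e-39 or magnitude > 1.7e38:
            return True
    return False
```

Under these assumptions, ordinary values such as 1.0 or -2.5 stay four-byte, while 1e-40 or 2e38 would push the variable to an eight-byte type.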

Parameters:
  • data (array-like, scalar, str, or Time) – The data for which dimensions and CDF types are desired. May be a nested list-of-lists, a numpy.ndarray, a Python scalar, a string, or a Time instance.

  • encoding (str, optional) – Encoding to use for Unicode (U) input when computing the on-disk element length. Defaults to "utf-8".

Returns:

  • dims (tuple of int) – Dimensions of data, in order outside-in.

  • types (list of int) – CDF type numbers (see swxsoc.util.const) which can represent data, in preferred order. The first entry is the type that CDFHandler uses on write.

  • elements (int) – Number of elements required per record (i.e. length of the longest string for CDF_CHAR / CDF_UCHAR variables; 1 otherwise).
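
For string data, the elements value depends on the encoded length rather than the number of characters. The hypothetical helper below sketches that idea for a flat, non-empty list of Python strings; it is not taken from the SWxSOC source.

```python
def elements_for_strings(strings, encoding="utf-8"):
    """Length in bytes of the longest string once encoded (hypothetical helper)."""
    return max(len(text.encode(encoding)) for text in strings)
```

A string containing a non-ASCII character such as "é" needs two bytes in UTF-8 but one byte in Latin-1, which is why the encoding argument affects the result.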

Raises:

ValueError – If data has irregular dimensions, is an empty object array, or contains generic Python objects that cannot be converted to a CDF type.
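
The outside-in dimension rule and the irregular-dimensions error can be illustrated for plain nested lists. The function nested_dims is a hypothetical sketch, not the algorithm used by types(), and it only handles lists and tuples.

```python
def nested_dims(data):
    """Return the dimensions of a nested list, outermost first (hypothetical helper)."""
    if not isinstance(data, (list, tuple)):
        # A scalar contributes no dimensions.
        return ()
    if not data:
        return (0,)
    child_dims = [nested_dims(item) for item in data]
    if any(dims != child_dims[0] for dims in child_dims):
        # Sub-lists of different shapes cannot form a regular array.
        raise ValueError("data has irregular dimensions")
    return (len(data),) + child_dims[0]
```

A list of two three-element lists gives (2, 3), while a list whose rows differ in length raises ValueError.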

Notes

The algorithm is adapted from spacepy.pycdf.istp.VarBundle._types(). See the CDF Format Guide (Section 5, Data Type Mapping) for a full user-facing description of the NumPy dtype → CDF type rules.

Examples

>>> import numpy as np
>>> from swxsoc.util.schema import SWXSchema
>>> schema = SWXSchema()
>>> dims, types, elements = schema.types(np.array([1, 2, 3], dtype=np.int32))
>>> dims, types[0], elements
((3,), 4, 1)