SWXSchema
- class swxsoc.util.schema.SWXSchema(global_schema_layers: list[str] | None = None, variable_schema_layers: list[str] | None = None, use_defaults: bool | None = True)
Bases: CdfAttributeManager

Class representing a schema for data requirements and formatting. The SWxSOC Default Schema only includes attributes required for ISTP compliance. Additional mission-specific attributes or requirements should be added through additional global and variable schema layers. For an example of how to layer schema files, see the HERMES mission core package and the HermesDataSchema extension of the SWXSchema class.

The Space Weather Data Schema has two main components: global attribute information and variable attribute information.
Global schema information is loaded from YAML (dict-like) files in the following format:
    attribute_name:
        description: >
            Include a meaningful description of the attribute and context needed to understand its values.
        default: <string> # A default value for the attribute if needed/desired
        derived: <bool> # Whether or not the attribute's value can be derived using a python function
        derivation_fn: <string> # The name of a Python function to derive the value. Must be a function member of the schema class and match the signature below.
        required: <bool> # Whether the attribute is required
        overwrite: <bool> # Whether an existing value for the attribute should be overwritten if a different value is derived.
The signature for all functions to derive global attributes should follow the format below. The function takes a parameter data, which is a SWXData object or an instance of an extended data class, and returns a single value for the attribute being derived.

    def derivation_fn(self, data: SWXData):
        # ... do manipulations as needed from `data`
        return "attribute_value"
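As a rough illustration of how the derived, derivation_fn, and overwrite fields might work together, the sketch below uses stand-in classes rather than SWXSchema and SWXData themselves. The names ToySchema, ToyData, _derive_logical_source, and the mission and instrument keys are hypothetical, and the overwrite handling is an assumption about the behavior described above, not the library's implementation.

```python
from collections import OrderedDict


class ToyData:
    """Hypothetical stand-in for SWXData: only carries a `meta` mapping."""

    def __init__(self, meta):
        self.meta = dict(meta)


class ToySchema:
    """Hypothetical stand-in for a schema class; not the SWXSchema implementation."""

    # Same fields as the global schema YAML format, written as a plain dict.
    global_schema = {
        "Logical_source": {
            "derived": True,
            "derivation_fn": "_derive_logical_source",
            "required": True,
            "overwrite": False,
        },
    }

    def _derive_logical_source(self, data):
        # Follows the documented signature: takes `data`, returns one value.
        return f"{data.meta['mission']}_{data.meta['instrument']}"

    def derive_global_attributes(self, data):
        attributes = OrderedDict()
        for name, info in self.global_schema.items():
            if not info["derived"]:
                continue
            # Look the derivation function up by name on the schema class.
            derived_value = getattr(self, info["derivation_fn"])(data)
            # Assumption: an existing value is kept unless `overwrite` is true.
            if name in data.meta and not info["overwrite"]:
                attributes[name] = data.meta[name]
            else:
                attributes[name] = derived_value
        return attributes


schema = ToySchema()
fresh = schema.derive_global_attributes(
    ToyData({"mission": "toy", "instrument": "mag"})
)
kept = schema.derive_global_attributes(
    ToyData({"mission": "toy", "instrument": "mag", "Logical_source": "custom"})
)
```

Here fresh receives the derived value, while kept retains the value already set on the data because overwrite is false for that attribute.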
Variable schema information is loaded from YAML (dict-like) files in the following format:
    attribute_key:
        attribute_name:
            description: >
                Include a meaningful description of the attribute and context needed to understand its values.
            derived: <bool> # Whether or not the attribute's value can be derived using a python function
            derivation_fn: <string> # The name of a Python function to derive the value. Must be a function member of the schema class and match the signature below.
            required: <bool> # Whether the attribute is required
            overwrite: <bool> # Whether an existing value for the attribute should be overwritten if a different value is derived.
            valid_values: <list> # A list of valid values that the attribute can take. The value of the attribute is checked against the `valid_values` in the Validation module.
            alternate: <string> # An additional attribute name that can be treated as an alternative of the given attribute.
    data:
        - attribute_name
        - ...
    support_data:
        - ...
    metadata:
        - ...
The signature for all functions to derive variable attributes should follow the format below. The function takes the parameters var_name, var_data, and guess_type, where:

- var_name is the name of the variable for which the attribute is being derived
- var_data is the data of the variable for which the attribute is being derived
- guess_type is the guessed CDF variable type of the data for which the attribute is being derived

The function must return a single value for the attribute being derived.

    def derivation_fn(self, var_name: str, var_data: Union[Quantity, NDData, NDCube], guess_type: ctypes.c_long):
        # ... do manipulations as needed from data
        return "attribute_value"
- Parameters:
global_schema_layers (Optional[list[Path]]) – Absolute file paths to global attribute schema files. These schema files are layered on top of one another in a latest-priority ordering. That is, the latest file that modifies a common schema attribute will take precedence over earlier values for a given attribute.
variable_schema_layers (Optional[list[Path]]) – Absolute file paths to variable attribute schema files. These schema files are layered on top of one another in a latest-priority ordering. That is, the latest file that modifies a common schema attribute will take precedence over earlier values for a given attribute.
use_defaults (Optional[bool]) – Whether or not to load the default global and variable attribute schema files. These default schema files contain only the requirements for CDF ISTP validation.
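The latest-priority ordering described for the schema layers can be pictured with plain dictionaries. This is a minimal sketch, not the loader used by SWXSchema: layer_schemas is a hypothetical helper, the layer contents are made up, and merging field by field (rather than replacing a whole attribute entry) is an assumption.

```python
import copy


def layer_schemas(layers):
    """Merge schema dicts so that later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for attribute, info in layer.items():
            # Update field by field so fields a later layer omits survive.
            merged.setdefault(attribute, {}).update(copy.deepcopy(info))
    return merged


# Hypothetical default layer, loaded first.
default_layer = {
    "Descriptor": {"required": True, "default": None},
    "Mission_group": {"required": False, "default": None},
}
# Hypothetical mission layer, loaded last, so it takes precedence.
mission_layer = {
    "Mission_group": {"required": True, "default": "ToyMission"},
}

merged = layer_schemas([default_layer, mission_layer])
```

In this sketch the mission layer changes Mission_group while Descriptor keeps the values from the default layer.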
Attributes Summary
default_global_attributes – Function to load the default global attributes from the SWxSOC schema.
Methods Summary
derive_global_attributes(data) – Function to derive global attributes for the given measurement data.
derive_measurement_attributes(data, var_name) – Function to derive metadata for the given measurement.
global_attribute_info([attribute_name]) – Function to generate an astropy.table.Table of information about each global metadata attribute.
global_attribute_template() – Function to generate a template of required global attributes that must be set for a valid CDF.
measurement_attribute_info([attribute_name]) – Function to generate an astropy.table.Table of information about each variable metadata attribute.
measurement_attribute_template() – Function to generate a template of required measurement attributes that must be set for a valid CDF measurement variable.
types(data[, encoding]) – Find dimensions and valid types of a nested list-of-lists.
Attributes Documentation
- default_global_attributes
Function to load the default global attributes from the SWxSOC schema.
- Returns:
default_global_attributes (dict) – A dictionary of default global attributes.
Methods Documentation
- derive_global_attributes(data) → OrderedDict
Function to derive global attributes for the given measurement data.
- Parameters:
data (swxsoc.swxdata.SWXData) – An instance of SWXData to derive metadata from.
- Returns:
attributes (OrderedDict) – A dict containing key: value pairs of global metadata attributes.
- derive_measurement_attributes(data, var_name: str, guess_types: list[int] | None = None) → OrderedDict
Function to derive metadata for the given measurement.
- Parameters:
data (swxsoc.swxdata.SWXData) – An instance of SWXData to derive metadata from.
var_name (str) – The name of the measurement to derive metadata for.
guess_types (list[int], optional) – Guessed CDF types of the variable.
- Returns:
attributes (OrderedDict) – A dict containing key: value pairs of derived metadata attributes.
- global_attribute_info(attribute_name: str | None = None) → Table
Function to generate an astropy.table.Table of information about each global metadata attribute. The astropy.table.Table contains all information in the SWxSOC global attribute schema including:
- description: (str) A brief description of the attribute
- default: (str) The default value used if none is provided
- required: (bool) Whether the attribute is required by SWxSOC standards
- Parameters:
attribute_name (str, optional, default None) – The name of the attribute to get specific information for.
- Returns:
info (astropy.table.Table) – A table of information about global metadata.
- Raises:
KeyError – If attribute_name is not a recognized global attribute.
- global_attribute_template() → OrderedDict
Function to generate a template of required global attributes that must be set for a valid CDF.
- Returns:
template (OrderedDict) – A template for required global attributes that must be provided.
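One way to picture such a template is as an ordered mapping from each required attribute name to an empty placeholder. The sketch below is illustrative only: required_template and the toy schema are hypothetical, and using None as the placeholder value is an assumption rather than documented behavior.

```python
from collections import OrderedDict


def required_template(schema):
    """Map each required attribute name to a None placeholder, in schema order."""
    return OrderedDict(
        (name, None) for name, info in schema.items() if info.get("required")
    )


# Hypothetical schema contents, for illustration only.
toy_schema = {
    "Descriptor": {"required": True},
    "Mission_group": {"required": True},
    "Acknowledgement": {"required": False},
}
template = required_template(toy_schema)
```

A caller would then fill in each placeholder before writing the file.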
- measurement_attribute_info(attribute_name: str | None = None) → Table
Function to generate an astropy.table.Table of information about each variable metadata attribute. The astropy.table.Table contains all information in the SWxSOC variable attribute schema including:
- description: (str) A brief description of the attribute
- required: (bool) Whether the attribute is required by SWxSOC standards
- valid_values: (str) List of allowed values the attribute can take for SWxSOC products, if applicable
- alternate: (str) An additional attribute name that can be treated as an alternative of the given attribute. Not all attributes have an alternate, and only one of a given attribute or its alternate is required.
- var_types: (str) A list of the variable types that require the given attribute to be present.
- Parameters:
attribute_name (str, optional, default None) – The name of the attribute to get specific information for.
- Returns:
info (astropy.table.Table) – A table of information about variable metadata.
- Raises:
KeyError – If attribute_name is not a recognized variable attribute.
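The rule that only one of an attribute or its alternate is required can be sketched as a small check. This is not the Validation module: missing_required is a hypothetical helper and the toy schema is made up, using attribute names only as examples.

```python
def missing_required(var_attrs, schema):
    """List required attributes that are absent and whose alternate is absent too."""
    missing = []
    for name, info in schema.items():
        if not info.get("required"):
            continue
        alternate = info.get("alternate")
        # Either the attribute itself or its alternate satisfies the requirement.
        if name in var_attrs or (alternate and alternate in var_attrs):
            continue
        missing.append(name)
    return missing


# Hypothetical schema entries, for illustration only.
toy_schema = {
    "LABLAXIS": {"required": True, "alternate": "LABL_PTR_1"},
    "UNITS": {"required": True, "alternate": "UNIT_PTR"},
    "CATDESC": {"required": True, "alternate": None},
}
attrs = {"LABL_PTR_1": "labels", "CATDESC": "Toy variable"}
missing = missing_required(attrs, toy_schema)
```

LABLAXIS is satisfied through its alternate, so only UNITS is reported as missing.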
- measurement_attribute_template() → OrderedDict
Function to generate a template of required measurement attributes that must be set for a valid CDF measurement variable.
- Returns:
template (OrderedDict) – A template for required variable attributes that must be provided.
- types(data, encoding='utf-8')
Find dimensions and valid types of a nested list-of-lists.
Any given data may be representable by a range of CDF types; infer the CDF types which can represent this data. This breaks down to:
- Proper kind (numerical, string, time).
- Proper range (stores highest and lowest number).
- Sufficient resolution (EPOCH16 or TT2000 required if astropy.time has microseconds or below).
When more than one type satisfies the requirements, candidates are returned in preference order:
- Type that matches the precision of the data first,
- Integer type before float type,
- Smallest type first,
- Signed type first,
- Specifically-named (CDF_BYTE) before generically-named (CDF_INT1).
CDF_TIME_TT2000 is always preferred for Time inputs since SWxSOC 0.3.0.
For floats, four-byte is preferred unless eight-byte is required, which is the case for:
- Absolute values between 0 and 3e-39.
- Absolute values greater than 1.7e38.
This will switch to an eight-byte double in some cases where four bytes would be sufficient for IEEE 754 encoding, but where DEC formats would require eight.
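The float range rule above can be sketched in a few lines. This is an illustration of the two documented thresholds only: needs_eight_bytes is a hypothetical helper, it ignores the precision criterion, and treating both bounds as strict inequalities is an assumption.

```python
def needs_eight_bytes(values):
    """Return True if any value falls outside the documented four-byte range."""
    for value in values:
        magnitude = abs(value)
        # Tiny non-zero magnitudes and very large magnitudes force a double.
        if 0 < magnitude < 3e-39 or magnitude > 1.7e38:
            return True
    return False


ordinary = needs_eight_bytes([0.0, 1.0, -2.5])
tiny = needs_eight_bytes([1e-40])
# 2e38 fits in an IEEE 754 single, but exceeds the documented 1.7e38 limit.
large = needs_eight_bytes([2e38])
```

The last case shows the situation the text describes, where a value that IEEE 754 single precision could hold is still promoted to eight bytes.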
- Parameters:
data (array-like, scalar, str, or Time) – The data for which dimensions and CDF types are desired. May be a nested list-of-lists, a numpy.ndarray, a Python scalar, a string, or a Time instance.
encoding (str, optional) – Encoding to use for Unicode (U) input when computing the on-disk element length. Defaults to "utf-8".
- Returns:
dims (tuple of int) – Dimensions of data, in order outside-in.
types (list of int) – CDF type numbers (see swxsoc.util.const) which can represent data, in preferred order. The first entry is the type that CDFHandler uses on write.
elements (int) – Number of elements required per record (i.e. length of the longest string for CDF_CHAR/CDF_UCHAR variables; 1 otherwise).
- Raises:
ValueError – If data has irregular dimensions, is an empty object array, or contains generic Python objects that cannot be converted to a CDF type.
Notes
The algorithm is adapted from spacepy.pycdf.istp.VarBundle._types(). See the CDF Format Guide (Section 5, Data Type Mapping) for a full user-facing description of the NumPy dtype → CDF type rules.
Examples
>>> import numpy as np
>>> from swxsoc.util.schema import SWXSchema
>>> schema = SWXSchema()
>>> dims, types, elements = schema.types(np.array([1, 2, 3], dtype=np.int32))
>>> dims, types[0], elements
((3,), 4, 1)
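The outside-in dims value, and the ValueError raised for irregular dimensions, can be pictured with a pure-Python sketch for nested lists. nested_dims is a hypothetical helper written for illustration; it is not how types() is implemented and it does not infer CDF types.

```python
def nested_dims(data):
    """Return the dimensions of a nested list, outside-in; reject ragged input."""
    if not isinstance(data, (list, tuple)):
        return ()  # a scalar (or string) contributes no dimensions
    if len(data) == 0:
        return (0,)
    inner = [nested_dims(item) for item in data]
    # Every element at a given depth must have the same shape.
    if any(dims != inner[0] for dims in inner):
        raise ValueError("data has irregular dimensions")
    return (len(data),) + inner[0]


regular = nested_dims([[1, 2, 3], [4, 5, 6]])

try:
    nested_dims([[1, 2], [3]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
```

For the regular input the outer length comes first, so a list of two rows with three entries each has dimensions (2, 3).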