Storage and Retrieval
Storing
This flowchart shows the storage process. egf is a
esafile.EsagridFile instance. egf[j] and egf[k]
are esafile.EsagridFileBinGroup instances. The
bin_and_store method has been called twice before this call,
producing two previous datasets for each bin.
Data, with times and locations (all numpy arrays), are passed to the
bin_and_store method. The data are divided by bin and passed on to the
store method for each bin. The data for each bin are stored as an HDF5 Dataset in that bin’s HDF5 Group.
The average of the times passed with each Dataset is encoded as the Dataset name. Individual time values are not stored.
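The storing steps can be sketched with an in-memory stand-in. The snippet below is not esabin code: plain dicts and lists take the place of the HDF5 file, Groups, Datasets and numpy arrays, and `toy_bin_index` is a hypothetical binning rule (3-degree latitude bands) invented purely for illustration.

```python
from collections import defaultdict

def toy_bin_index(lat, delta_lat=3.0):
    """Hypothetical binning rule: index of the latitude band containing lat."""
    return int((lat + 90.0) // delta_lat)

def toy_bin_and_store(store, t, lat, data):
    """Split samples by bin and save each bin's chunk under its mean time."""
    per_bin = defaultdict(lambda: ([], []))
    for ti, lati, di in zip(t, lat, data):
        times, values = per_bin[toy_bin_index(lati)]
        times.append(ti)
        values.append(di)
    for flatind, (times, values) in per_bin.items():
        # The dataset name encodes the average time; individual times are dropped.
        name = repr(sum(times) / len(times))
        store.setdefault(flatind, {})[name] = values

store = {}  # stand-in for the HDF5 file: {bin index: {dataset name: data}}
toy_bin_and_store(store, t=[10.0, 20.0, 30.0], lat=[60.5, 61.0, 75.0], data=[1.0, 2.0, 3.0])
```

Each further call would add one more named entry to every bin it touches, which corresponds to the "previous datasets" mentioned in the flowchart description.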
Retrieving
This flowchart shows part of the recall process.
Here the j-th and k-th esafile.EsagridFileBinGroup
objects have different numbers of ‘samples’ associated with each.
Their read method returns a list of floating-point times
associated with each Dataset, as well as a
list of numpy arrays (the data from each Dataset).
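A minimal sketch of what such a `read` call hands back, with a plain dict standing in for one bin's HDF5 Group. `toy_read` and the dataset names are hypothetical, and sorting by time is a choice made for this sketch rather than documented behaviour. Each dataset name is decoded back to a float time, and the data come back in a parallel list.

```python
def toy_read(bin_group):
    """Return (times, datasets) for one bin; names decode to float times."""
    times, datasets = [], []
    # Sorting by decoded time is an assumption of this sketch.
    for name, values in sorted(bin_group.items(), key=lambda kv: float(kv[0])):
        times.append(float(name))
        datasets.append(values)
    return times, datasets

# Two bins holding different numbers of stored datasets.
bin_j = {"15.0": [1.0, 2.0], "42.5": [4.0], "7.0": [9.0, 8.0, 7.0]}
bin_k = {"30.0": [3.0]}

times_j, data_j = toy_read(bin_j)
times_k, data_k = toy_read(bin_k)
```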
esabin.esafile
This module implements an HDF5-based data storage/retrieval scheme for binned data.
EsagridFileBinGroup | Class representing an HDF5 Group containing data from one bin
EsagridFile | Class representing an HDF5 file which contains one Group for each (populated) bin
- class esabin.esafile.EsagridFileBinGroup(grid, flatind)
Class representing an HDF5 Group containing data from one bin
- Parameters
grid (esabin.esagrid.ConstantLatitudeSpacingGrid) – Object representing the particular grid used
flatind (int) – The index (number) defining which bin this Group is for
- class esabin.esafile.EsagridFile(hdf5_filenm, grid=None, hdf5_local_dir=None, clobber=False)
Class representing an HDF5 file which contains one Group for each (populated) bin
- Parameters
hdf5_filenm (str) – The filename of the HDF5 file to store to. If the file already exists and clobber is False, the metadata stored in the file is used to recreate the appropriate esagrid, so you can continue adding to the file or process results from it.
grid (esabin.esagrid.ConstantLatitudeSpacingGrid, optional) – The grid of bins to bin into. If it is None (default), a default grid with delta_lat = 3, n_cap_bins = 3 and azi_coord = 'lt' is used.
hdf5_local_dir (str, optional) – A valid local path at which the HDF5 files will be created
clobber (bool, optional) – If True, will delete and overwrite the HDF5 file specified as os.path.join(hdf5_local_dir, hdf5_filenm) if it exists.
- append_existing(existing_esagrid_file)
Copy all bin data from another EsagridFile instance into this one
- bin_and_store(t, lat, lonorlt, data, silent=False, additional_attrs=None)
Store data into HDF5 file bin groups.
All arrays must have shape (n,), (n,1), or (1,n).
- Parameters
t (np.ndarray) – Array of times (any float numeric representation)
lat (np.ndarray) – Array of latitudes
lonorlt (np.ndarray) – Array of longitudes or local times
data (np.ndarray) – Array of data to bin
silent (bool, optional) – Do not print status messages (default False)
additional_attrs (dict, optional) – A dictionary of additional information which will be stored as HDF5 attributes attached to any Datasets created by this function call. Keys will be used as attribute names. Attribute values will be stored as string representations of dictionary values (str(val)).
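Because attribute values are stored as `str(val)`, whatever type went in, a consumer should expect to work with the string form when reading an attribute back. The snippet below only mimics that conversion with a plain dict; the attribute names are made up and no HDF5 file is involved.

```python
additional_attrs = {"satellite": 16, "calibrated": True, "version": 1.5}

# Mimic the documented conversion: keys become attribute names, values become str(val).
stored_attrs = {key: str(val) for key, val in additional_attrs.items()}

# A consumer has to compare against, or parse, the string form.
is_sat_16 = stored_attrs["satellite"] == "16"
version = float(stored_attrs["version"])
```

Note that `str(True)` is `"True"`, so a boolean flag would need to be compared with that string rather than tested for truthiness.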
- bin_stats(statfun=<function nanmean>, statfunname=None, center_or_edges='edges', minlat=50.0, silent=False, force_recompute=False, write_to_h5=True, attr_filters=None)
Apply some function to the contents of each bin
- Parameters
statfun (callable or list, optional) – The function which will be called. Can also be a list of multiple callables (in which case a dict keyed with str(callable) is returned). It is MUCH MORE EFFICIENT to do this than call bin_stats multiple times.
statfunname (str or list, optional) – List of custom keys (use if statfun is a list of callables)
center_or_edges ({'center','edges'}, optional) – Return the bin center or edges
minlat (float, optional) – Minimum absolute latitude (default: 50.) below which bins are ignored.
silent (bool, optional) – Do not print status messages to stdout, default False
force_recompute (bool, optional) – If a particular statfun callable has been evaluated before, recompute it instead of returning the cached result. Default False.
write_to_h5 (bool, optional) – Cache the results for a particular statfun callable in the HDF5 file. Default True. If statfunname is defined, the cached result will be stored under that name (the identity of the callable is not checked).
attr_filters (dict, optional) – Provides optional filtering of the data fed to the callable, using HDF5 attributes of the bin Datasets (see additional_attrs). Each key should be an HDF5 Dataset attribute name, and each value should be a callable which takes the attribute value and returns True or False.
- Returns
binlats (np.ndarray) – Bin latitudes (center or edges)
binlonorlt (np.ndarray) – Bin longitudes or local times (center or edges)
binstats (np.ndarray or dict) – If statfun is a single callable, will return an array, the result of evaluating statfun. If statfun is a list, this will be a dict of arrays, keyed with either the string representation of each statfun or statfunnames if they are defined.
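The list-of-callables form and `attr_filters` can be illustrated without HDF5. In this sketch, `toy_bin_stats` is a hypothetical stand-in and not the esabin implementation: each bin is a list of `(attrs, values)` datasets, the filters are applied to the string attributes, and every callable is evaluated during a single pass over the bins, which is the reason passing a list is cheaper than repeated calls. Unlike the real method, it returns per-bin dicts rather than arrays, and how a Dataset lacking a filtered attribute is treated is an assumption.

```python
from statistics import mean, median

def passes(attrs, attr_filters):
    # Assumption of this sketch: a dataset lacking a filtered attribute is skipped.
    return all(key in attrs and keep(attrs[key]) for key, keep in attr_filters.items())

def toy_bin_stats(bins, statfuns, statfunnames=None, attr_filters=None):
    """Evaluate every callable on each bin's filtered data in one pass."""
    names = statfunnames or [str(fun) for fun in statfuns]
    attr_filters = attr_filters or {}
    results = {name: {} for name in names}
    for flatind, datasets in bins.items():
        values = [v for attrs, chunk in datasets if passes(attrs, attr_filters) for v in chunk]
        if not values:
            continue  # nothing in this bin survived the filters
        for name, fun in zip(names, statfuns):
            results[name][flatind] = fun(values)
    return results

bins = {
    50: [({"satellite": "16"}, [1.0, 2.0]), ({"satellite": "17"}, [10.0])],
    55: [({"satellite": "16"}, [3.0, 5.0, 100.0])],
}
stats = toy_bin_stats(
    bins,
    statfuns=[mean, median],
    statfunnames=["mean", "median"],
    attr_filters={"satellite": lambda val: val == "16"},
)
```

The filter compares against the string `"16"` because attribute values are stored as string representations.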