datazen.classes package#

Submodules#

datazen.classes.data_repository module#

datazen - An interface for managing on-disk data by loading it and making discrete changes.

class datazen.classes.data_repository.DataRepository(root_dir: str, out_type: str = 'yaml', logger: ~logging.Logger = <Logger datazen.classes.data_repository (WARNING)>)[source]#

Bases: object

A class for interacting with file-backed databases that are built with serialization formats supported by this package.

loaded(root_rel: str = '.', write_back: bool = True) Iterator[Dict[str, Any]][source]#

Load a data directory, yield it to the caller, and write it back when complete, all inside a locked context for this repository.

meld(data: Dict[str, Any], root_rel: str = '.', expect_overwrite: bool = False) Iterator[Repo][source]#

Update the data at some root-relative path, write it to disk, and then yield the repository object, still within a locked context.
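The `loaded` pattern above is a locked load-modify-write context manager. The sketch below illustrates that pattern only; the `SimpleRepository` class, the `data.json` file name, and the JSON serialization are hypothetical stand-ins, not datazen's actual implementation:

```python
import json
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Dict, Iterator

class SimpleRepository:
    """Hypothetical analogue of DataRepository's locked load/write-back."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.lock = threading.Lock()

    @contextmanager
    def loaded(self, write_back: bool = True) -> Iterator[Dict[str, Any]]:
        """Load data, yield it to the caller, and write it back on exit."""
        with self.lock:
            path = self.root / "data.json"
            data = json.loads(path.read_text()) if path.exists() else {}
            yield data
            if write_back:
                path.write_text(json.dumps(data))

repo = SimpleRepository(Path(tempfile.mkdtemp()))
with repo.loaded() as data:
    data["answer"] = 42  # mutations are persisted when the context exits
with repo.loaded(write_back=False) as data:
    assert data["answer"] == 42
```

Because write-back happens inside the lock, concurrent callers never observe a partially updated file.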

datazen.classes.file_info_cache module#

datazen - A class for storing metadata about files that have been loaded.

class datazen.classes.file_info_cache.FileInfoCache(cache_dir: str = None, logger: ~logging.Logger = <Logger datazen.classes.file_info_cache (WARNING)>)[source]#

Bases: object

Provides storage for file hashes and lists that have been loaded.

check_hit(sub_dir: str, path: str, also_cache: bool = True) bool[source]#

Determine whether a given file already exists in the cache with its current hash; if not, return False and optionally add it to the cache.
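A hash-based hit check of this kind can be sketched as follows; this standalone `check_hit` function and its MD5 choice are illustrative assumptions, not FileInfoCache's actual internals:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict

def check_hit(cache: Dict[str, str], path: Path, also_cache: bool = True) -> bool:
    """Return True if 'path' is cached at its current content hash."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    key = str(path)
    if cache.get(key) == digest:
        return True
    if also_cache:
        cache[key] = digest  # remember the new hash for next time
    return False

cache: Dict[str, str] = {}
sample = Path(tempfile.mkdtemp()) / "sample.txt"
sample.write_text("hello")
assert check_hit(cache, sample) is False  # first sight: miss, now cached
assert check_hit(cache, sample) is True   # unchanged file: hit
sample.write_text("changed")
assert check_hit(cache, sample) is False  # content changed: miss again
```

The `also_cache` flag lets a caller probe the cache without mutating it.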

clean() None[source]#

Remove cached data from the file-system.

describe() None[source]#

Describe this cache’s contents for debugging purposes.

get_data(name: str) LoadedFiles[source]#

Get the tuple version of cached data.

get_hashes(sub_dir: str) Dict[str, Any][source]#

Get the cached dictionary of file hashes for a certain key.

get_loaded(sub_dir: str) List[str][source]#

Get the cached list of loaded files for a certain key.

load(cache_dir: str) None[source]#

Load data from a directory.

write(out_type: str = 'json') None[source]#

Commit cached data to the file-system.

datazen.classes.file_info_cache.cmp_loaded_count(cache_a: FileInfoCache, cache_b: FileInfoCache, name: str) int[source]#

Compute the total difference in file counts (for a named group) between two caches.

datazen.classes.file_info_cache.cmp_loaded_count_from_set(cache_a: FileInfoCache, cache_b: FileInfoCache, name: str, files: List[str]) int[source]#

Count the number of files uniquely loaded to one cache but not the other.

datazen.classes.file_info_cache.cmp_total_loaded(cache_a: ~datazen.classes.file_info_cache.FileInfoCache, cache_b: ~datazen.classes.file_info_cache.FileInfoCache, known_types: ~typing.List[str], load_checks: ~typing.Dict[str, ~typing.List[str]] = None, logger: ~logging.Logger = <Logger datazen.classes.file_info_cache (WARNING)>) int[source]#

Compute the total difference in file counts for a provided set of named groups.

datazen.classes.file_info_cache.copy(cache: FileInfoCache) FileInfoCache[source]#

Copy one cache into a new one.

datazen.classes.file_info_cache.meld(cache_a: FileInfoCache, cache_b: FileInfoCache) None[source]#

Promote all updates from cache_b into cache_a.
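Promoting updates from one cache into another is essentially a recursive dictionary merge. The following `meld` is a minimal sketch of that idea over plain dicts, not the package's actual cache-aware implementation:

```python
from typing import Any, Dict

def meld(cache_a: Dict[str, Any], cache_b: Dict[str, Any]) -> None:
    """Promote all updates from cache_b into cache_a, merging nested dicts."""
    for key, value in cache_b.items():
        if isinstance(value, dict) and isinstance(cache_a.get(key), dict):
            meld(cache_a[key], value)  # recurse into shared sub-dicts
        else:
            cache_a[key] = value  # cache_b wins on conflicts

a = {"x": 1, "sub": {"a": 1}}
b = {"y": 2, "sub": {"b": 2}}
meld(a, b)
assert a == {"x": 1, "y": 2, "sub": {"a": 1, "b": 2}}
```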

datazen.classes.file_info_cache.remove_missing_hashed_files(data: Dict[str, Any], removed_data: Dict[str, List[str]]) Dict[str, Any][source]#

Assign new hash data based on the files that are still present.

datazen.classes.file_info_cache.remove_missing_loaded_files(data: Dict[str, Any]) Dict[str, Any][source]#

Recursively audit list elements in a dictionary; assuming the elements are filename strings, assign a new list containing only the elements that can be located.

datazen.classes.file_info_cache.sync_cache_data(cache_data: Dict[str, Any], removed_data: Dict[str, List[str]]) Dict[str, Any][source]#

Before writing a cache to disk, de-duplicate items in the loaded list and remove hash data for files that were removed, so that if a file comes back at the same hash it is not considered already loaded.
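The two clean-up steps described above can be sketched as below; the `"loaded"`/`"hashes"` entry layout is an assumed shape for illustration, not the cache's documented on-disk format:

```python
from typing import Any, Dict, List

def sync_cache_data(cache_data: Dict[str, Any],
                    removed_data: Dict[str, List[str]]) -> Dict[str, Any]:
    """De-duplicate loaded-file lists and drop hashes for removed files."""
    for key, entry in cache_data.items():
        # order-preserving de-duplication of the loaded list
        entry["loaded"] = list(dict.fromkeys(entry.get("loaded", [])))
        # forget hashes of removed files, so a same-hash return isn't a "hit"
        for removed in removed_data.get(key, []):
            entry.get("hashes", {}).pop(removed, None)
    return cache_data

data = {"configs": {"loaded": ["a.yaml", "a.yaml", "b.yaml"],
                    "hashes": {"a.yaml": "123", "b.yaml": "456"}}}
result = sync_cache_data(data, {"configs": ["b.yaml"]})
assert result["configs"]["loaded"] == ["a.yaml", "b.yaml"]
assert "b.yaml" not in result["configs"]["hashes"]
```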

datazen.classes.file_info_cache.time_str(time_s: float) str[source]#

Convert a timestamp to a string.

datazen.classes.target_resolver module#

datazen - Orchestrates the “parameterized target” capability.

class datazen.classes.target_resolver.TargetResolver(logger: ~logging.Logger = <Logger datazen.classes.target_resolver (WARNING)>)[source]#

Bases: object

A class for managing resolution of literal and templated target definitions.

clear() None[source]#

Re-initialize the target dataset, in case a new manifest is loaded.

get_target(group: str, name: str) Dict[str, Any] | None[source]#

Attempt to get a literal target from what has been loaded so far.

register_group(name: str, targets: List[Dict[str, Any]]) None[source]#

Initialize a named group of targets by providing its target dataset.
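For the literal (non-templated) case, the resolver's register/look-up flow can be sketched as a nested index keyed by group and target name. `SimpleResolver` and the `"name"` key on each target are hypothetical, not TargetResolver's actual schema:

```python
from typing import Any, Dict, List, Optional

class SimpleResolver:
    """Hypothetical analogue of TargetResolver for literal targets only."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def clear(self) -> None:
        """Re-initialize target data, e.g. when a new manifest is loaded."""
        self.data = {}

    def register_group(self, name: str, targets: List[Dict[str, Any]]) -> None:
        """Index a group's targets by their 'name' key."""
        self.data[name] = {target["name"]: target for target in targets}

    def get_target(self, group: str, name: str) -> Optional[Dict[str, Any]]:
        """Attempt to get a literal target from what has been loaded so far."""
        return self.data.get(group, {}).get(name)

resolver = SimpleResolver()
resolver.register_group("compiles", [{"name": "app", "sources": ["a.c"]}])
assert resolver.get_target("compiles", "app") == {"name": "app",
                                                  "sources": ["a.c"]}
assert resolver.get_target("compiles", "missing") is None
```

A templated resolver would additionally pattern-match target names and substitute parameters, which this sketch omits.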

datazen.classes.task_data_cache module#

datazen - A class for storing data from completed operations to disk.

class datazen.classes.task_data_cache.TaskDataCache(cache_dir: str)[source]#

Bases: object

Provides storage for data produced by task targets to facilitate better (and more correct) short-circuiting.

clean(purge_data: bool = True) None[source]#

Clean this cache’s data on disk.

load(load_dir: str) None[source]#

Read new data from the cache directory and update state.

save(out_type: str = 'json') None[source]#

Write cache data to disk.

datazen.classes.valid_dict module#

datazen - A dict wrapper that enables simpler schema validation.

class datazen.classes.valid_dict.ValidDict(name: str, data: ~typing.Dict[str, ~typing.Any], schema: ~vcorelib.schemas.base.Schema, logger: ~logging.Logger = <Logger datazen.classes.valid_dict (WARNING)>)[source]#

Bases: UserDict

An object that behaves like a dictionary but can have a provided schema enforced.
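The dict-with-enforced-schema idea can be sketched with `collections.UserDict` and a validator callable; `ValidatingDict` and the callable-based validation below are illustrative assumptions, since the real ValidDict takes a `vcorelib` Schema object:

```python
from collections import UserDict
from typing import Any, Callable, Dict

class ValidatingDict(UserDict):
    """Behaves like a dict, but a provided validator is enforced up front."""

    def __init__(self, name: str, data: Dict[str, Any],
                 validate: Callable[[Dict[str, Any]], bool]) -> None:
        if not validate(data):
            raise ValueError(f"data for '{name}' failed schema validation")
        super().__init__(data)

# example schema: require an integer 'port' key
schema = lambda d: isinstance(d.get("port"), int)
config = ValidatingDict("server", {"port": 8080}, schema)
assert config["port"] == 8080  # regular dict access still works

try:
    ValidatingDict("server", {"port": "oops"}, schema)
    raised = False
except ValueError:
    raised = True
assert raised
```

Validating at construction keeps downstream code free of defensive key and type checks.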

Module contents#