experiment

This module contains the functions for initializing pipelines and training them.

class PipeLine(pplid=None)

Bases: object

get_path(of, pplid=None, epoch=None)

Generate a standardized file path for various experiment artifacts.

Constructs and returns a file path based on the type of file (of), the experiment ID, and, where applicable, the epoch number. Automatically creates any missing directories.

Parameters:
  • of (str) – The type of file to retrieve the path for. Supported values:
      - “config”: Configuration file path.
      - “weight”: Model weights file path.
      - “gradient”: Saved gradients file path.
      - “history”: Training history file path.
      - “quick”: Quick config file path.

  • pplid (str, optional) – Experiment ID. If not provided, uses the currently set self.pplid.

  • epoch (int, optional) – Epoch number. Required for gradient file paths. For weight file paths, the best epoch from the config is used if no epoch is given.

Returns:

Full path to the specified artifact as a string with forward slashes.

Return type:

str

Raises:

ValueError – If pplid is not set or is invalid, if epoch is missing for a gradient path, or if of is not one of the supported values.
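
The path rules above can be sketched as follows. This is a minimal illustration, not the module's implementation: get_path_sketch, the root argument, and the folder and file names in _LAYOUT are all hypothetical, and best_epoch stands in for the best epoch read from the configuration.

```python
from pathlib import Path

# Hypothetical layout: the module's real folder and file names are not documented here.
_LAYOUT = {
    "config": "Configs/{pplid}.json",
    "weight": "Weights/{pplid}/{pplid}_e{epoch}.pth",
    "gradient": "Gradients/{pplid}/{pplid}_e{epoch}.pt",
    "history": "Histories/{pplid}.csv",
    "quick": "Quicks/{pplid}.json",
}


def get_path_sketch(root, of, pplid, epoch=None, best_epoch=None):
    """Build an artifact path under `root`, creating parent folders as needed."""
    if not pplid:
        raise ValueError("pplid is not set")
    if of not in _LAYOUT:
        raise ValueError(f"unsupported artifact type: {of!r}")
    if of == "weight" and epoch is None:
        epoch = best_epoch  # weights fall back to the best epoch from the config
    if of in ("weight", "gradient") and epoch is None:
        raise ValueError(f"an epoch is required for {of!r} paths")
    path = Path(root) / _LAYOUT[of].format(pplid=pplid, epoch=epoch)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.as_posix()  # forward slashes on every platform
```
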

is_running()

Check whether the process identified by self.pplid is currently running.

Queries the runnings table for an entry with the matching pplid.

Returns:

The logid of the running process if found, otherwise False.

Return type:

int or bool

load(pplid, prepare=False)

Load the experiment configuration and optionally prepare the pipeline.

Retrieves the configuration file associated with the given experiment ID and sets it as the active configuration. Optionally prepares the pipeline from the loaded settings (model, data loaders, and so on).

Parameters:
  • pplid (str) – The experiment ID whose configuration is to be loaded.

  • prepare (bool, optional) – Whether to immediately prepare the pipeline using the loaded configuration. Defaults to False.

Raises:
  • ValueError – If the provided experiment ID does not exist in the experiment database.

Side Effects

  • Sets self.cnfg with the loaded configuration dictionary.

  • Updates self.pplid to the provided experiment ID.

  • Calls self.prepare() if prepare is True.

Return type:

None

load_model(epoch=None)

Load model weights from disk into the model component for a specified epoch.

If epoch is set to ‘last’ or ‘best’, the corresponding epoch value from the experiment configuration is used. If the epoch is 0 and no weights exist yet, the model’s current state is saved before loading. Weights are loaded with strict=False to allow partial loading of the model.

Parameters:

epoch (int or str, optional) – The epoch number or keyword (‘last’ or ‘best’) indicating which weights to load:
  - ‘last’: Loads the most recent training checkpoint.
  - ‘best’: Loads the checkpoint with the best validation performance.
  - int: Loads the checkpoint from the specified epoch.
  - None: Defaults to using the current epoch from config if available.

Returns:

The model component with the loaded weights.

Return type:

torch.nn.Module

Raises:

ValueError – If the experiment configuration or weight file path is invalid or missing.

Notes

  • Uses torch.load(…, weights_only=True) for loading weights.

  • Uses strict=False in load_state_dict to allow for minor mismatches.

  • Automatically saves the model if the requested epoch is 0 and no checkpoint exists.
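
A hedged sketch of how the epoch argument might be resolved before loading. resolve_epoch and the config keys last_epoch and best_epoch are hypothetical; the module's real configuration schema is not documented here.

```python
def resolve_epoch(epoch, cnfg):
    """Turn 'last', 'best', an int, or None into a concrete epoch number."""
    # Hypothetical config keys ("last_epoch", "best_epoch"); the real schema may differ.
    if epoch in ("last", "best"):
        return cnfg[f"{epoch}_epoch"]
    if epoch is None:
        return cnfg.get("last_epoch", 0)  # assumed meaning of "current epoch"
    if isinstance(epoch, int) and not isinstance(epoch, bool):
        return epoch
    raise ValueError(f"invalid epoch: {epoch!r}")
```

The resolved number would then select the weight file, which is read with torch.load(path, weights_only=True) and applied with model.load_state_dict(state, strict=False), as the notes above describe.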

log()

Not implemented yet; planned for a future version.

new(pplid=None, args=None, prepare=False)

Create a new experiment configuration and initialize its tracking files.

Parameters:
  • pplid (str, optional) – Unique experiment identifier. Raises ValueError if it already exists.

  • args (dict, optional) – Configuration arguments for the experiment.

  • prepare (bool, optional) – If True, calls self.prepare() after creation. Defaults to False.

Raises:
  • ValueError – If the experiment ID already exists or if monitor mode is invalid.

  • KeyError – If ‘metrics’ key is missing from settings.

Behavior

  • Checks if the experiment ID already exists; raises an error if so.

  • Checks if the same configuration already exists using verify.

  • Initializes configuration dictionary with metadata.

  • Saves the configuration.

  • Creates an empty history CSV with columns for training and validation metrics and loss.

  • Initializes quick checkpoint file with default best and last epoch metrics.

  • Appends experiment metadata to the main experiments CSV.

  • Optionally calls self.prepare() if prepare=True.

Return type:

None
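
The file-initialization steps of new() can be sketched with the standard library. This is an assumption-laden illustration: init_tracking_files, the file names, the train_*/val_* column pattern, and the quick-file keys are hypothetical, and the real method also writes the config and the experiments CSV.

```python
import csv
import json
from pathlib import Path


def init_tracking_files(root, pplid, metrics, mode="max"):
    """Create an empty history CSV and a quick-checkpoint JSON for a new pipeline."""
    if mode not in ("min", "max"):
        raise ValueError("monitor mode must be 'min' or 'max'")
    root = Path(root)
    # Hypothetical column layout: one train/val column per metric, plus losses.
    columns = ["epoch"]
    columns += [f"train_{m}" for m in metrics] + ["train_loss"]
    columns += [f"val_{m}" for m in metrics] + ["val_loss"]
    with (root / f"{pplid}_history.csv").open("w", newline="") as f:
        csv.writer(f).writerow(columns)
    # Default "best" and "last" entries; the real key names may differ.
    worst = float("-inf") if mode == "max" else float("inf")
    quick = {"last": {"epoch": 0}, "best": {"epoch": 0, "score": worst}}
    (root / f"{pplid}_quick.json").write_text(json.dumps(quick))
    return columns
```
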

prepare()

Prepare the experiment by loading model, optimizer, metrics, loss, and data loaders.

Loads components according to current configuration, initializes data loaders, and sets the best metric value based on the stored history and strategy.

Raises:
  • ValueError – If strategy monitor mode is not ‘min’ or ‘max’.

Behavior

  • Loads model and moves it to device.

  • Loads optimizer with model parameters.

  • Loads metrics and loss functions to device.

  • Creates training and validation data loaders.

  • Loads last saved model weights.

  • Initializes the best metric value from saved checkpoints or sets default.

  • Sets internal flag _prepared to True on success.

Return type:

None
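
One plausible way prepare() could derive the starting best value from the stored history, sketched under assumptions: initial_best is a hypothetical helper, the history is a CSV with one column per metric, and an empty history falls back to the worst possible value for the monitor mode.

```python
import csv
import math


def initial_best(history_csv, column, mode):
    """Return the best value of `column` recorded so far, or the worst possible value."""
    if mode not in ("min", "max"):
        raise ValueError("strategy monitor mode must be 'min' or 'max'")
    with open(history_csv, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    if not values:
        # Nothing trained yet: any first result should count as an improvement.
        return math.inf if mode == "min" else -math.inf
    return min(values) if mode == "min" else max(values)
```
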

reset()

property should_running

Determine whether the process should continue running.

This checks the parity value for the current pplid in the runnings table. If the value is ‘stop’, the process should no longer continue.

Returns:

True if the process should keep running, False if it should stop.

Return type:

bool

stop_running()

Mark the current running process to be stopped.

If the process is currently running (i.e., has an associated logid in the runnings table), this updates the parity field to ‘stop’, signaling it to stop after the current iteration. Otherwise, it prints a message indicating that the process is not running.

Returns:

None
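
The runnings-table protocol shared by is_running(), should_running, and stop_running() can be sketched with an in-memory sqlite3 database. The table schema (logid, pplid, parity) and the module-level helpers below are assumptions based on the descriptions above, not the module's actual storage layer.

```python
import sqlite3

# Hypothetical schema standing in for the runnings table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runnings (logid INTEGER PRIMARY KEY, pplid TEXT, parity TEXT)")


def is_running(pplid):
    """Return the logid of the running entry for pplid, or False."""
    row = conn.execute("SELECT logid FROM runnings WHERE pplid = ?", (pplid,)).fetchone()
    return row[0] if row else False


def should_running(pplid):
    """Keep running unless the parity field has been set to 'stop'."""
    row = conn.execute("SELECT parity FROM runnings WHERE pplid = ?", (pplid,)).fetchone()
    return not (row and row[0] == "stop")


def stop_running(pplid):
    """Flag a running process to stop after its current iteration."""
    logid = is_running(pplid)
    if logid is False:
        print(f"{pplid} is not running")
        return
    conn.execute("UPDATE runnings SET parity = 'stop' WHERE logid = ?", (logid,))
```

A training loop would check should_running(pplid) once per epoch and exit cleanly when it returns False, which is how stop_running() takes effect after the current iteration.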

sync()

Synchronize and update the experiment configuration with the latest quick settings.

Loads the quick configuration file associated with the current experiment ID, updates the main configuration (self.cnfg) with its contents, and then saves the updated configuration to disk.

Side Effects

  • Modifies the self.cnfg attribute by merging it with the quick configuration.

  • Writes the updated configuration to the config file.

  • Prints a success message indicating the experiment has been synced.

Raises:

ValueError – If the current arguments do not match the original configuration; in that case the configuration is not saved.

Return type:

None
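
A minimal sketch of the merge step in sync(), assuming both files are JSON and that a shallow dict.update is an acceptable stand-in for the real merge; sync_config is hypothetical, and the argument-consistency check that can raise ValueError is omitted.

```python
import json
from pathlib import Path


def sync_config(config_path, quick_path):
    """Merge the quick file into the main config and write the result back."""
    cnfg = json.loads(Path(config_path).read_text())
    quick = json.loads(Path(quick_path).read_text())
    cnfg.update(quick)  # shallow merge: quick values overwrite stale top-level keys
    Path(config_path).write_text(json.dumps(cnfg, indent=2))
    return cnfg
```
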

train(num_epochs=5, self_patience=None, verbose=None)

Train the model for a specified number of epochs with optional early stopping.

Parameters:
  • num_epochs (int, optional) – Number of epochs to train. Default is 5.

  • self_patience (int, optional) – Number of epochs to wait for improvement before early stopping. If None, equals num_epochs.

  • verbose (list of str or str, optional) – Metrics to display live during training. Must be from the set of defined metrics.

Return type:

None

Notes

  • Uses early stopping based on the configured strategy and patience.

  • Automatically resumes from last epoch.

  • Saves best model weights and updates training history.

  • Does not start a second training run if this pipeline is already training.
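
The early-stopping behaviour described in the notes can be sketched as a small loop. train_with_patience and run_epoch are hypothetical; the sketch assumes num_epochs counts additional epochs after the resume point and that best may be seeded from earlier history.

```python
def train_with_patience(run_epoch, start_epoch, num_epochs, patience=None, mode="max", best=None):
    """Run epochs after `start_epoch`; stop after `patience` epochs without improvement."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    patience = num_epochs if patience is None else patience  # None means "never stop early"
    if best is None:
        best = float("-inf") if mode == "max" else float("inf")
    best_epoch, waited, ran = start_epoch, 0, []
    for epoch in range(start_epoch + 1, start_epoch + num_epochs + 1):
        score = run_epoch(epoch)  # stands in for one training + validation pass
        ran.append(epoch)
        improved = score > best if mode == "max" else score < best
        if improved:
            best, best_epoch, waited = score, epoch, 0  # real code would save best weights here
        else:
            waited += 1
            if waited >= patience:
                break  # early stop
    return ran, best_epoch
```
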

update(data)

Update the pipeline configuration and save state after an epoch.

Parameters:

data (dict) – Dictionary containing keys such as ‘epoch’, ‘train_accuracy’, ‘train_loss’, ‘val_accuracy’, ‘val_loss’, and potentially other metrics and durations.

Returns:

Returns True if the current epoch’s validation metric improves over the best recorded, triggering a best model save; otherwise, False.

Return type:

bool

Notes

  • Saves model weights after every epoch.

  • Appends training and validation metrics to the history CSV.

  • Updates the quick checkpoint file with last and best metrics.
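
A sketch of the bookkeeping update() performs after each epoch, under assumptions: record_epoch is hypothetical, the quick checkpoint is modelled as a plain dict with last and best entries, and val_accuracy with mode "max" is assumed as the monitored metric. Weight saving is left out.

```python
import csv
from pathlib import Path


def record_epoch(history_csv, quick, data, monitor="val_accuracy", mode="max"):
    """Append one epoch to the history CSV, update `quick`, and report improvement."""
    path = Path(history_csv)
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(data))
        if new_file:
            writer.writeheader()
        writer.writerow(data)
    quick["last"] = dict(data)
    best = quick.get("best")
    if best is None:
        improved = True  # first recorded epoch is always the best so far
    elif mode == "max":
        improved = data[monitor] > best[monitor]
    else:
        improved = data[monitor] < best[monitor]
    if improved:
        quick["best"] = dict(data)  # the real method would also save the best weights
    return improved
```
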

verify(*, pplid=None, args=None)

Check whether a given experiment ID exists in the experiment database.

Queries the experiments table to verify whether the specified experiment ID is recorded.

Parameters:
  • pplid (str) – The experiment ID to check.

  • args (Dict | None)

Returns:

Returns the pplid if it exists in the database, otherwise returns False.

Return type:

Union[str, bool]

Examples

>>> pipeline.verify(pplid="exp_001")
'exp_001'
>>> pipeline.verify(pplid="nonexistent_exp")
False
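
Because verify() declares its parameters after a bare *, they are keyword-only, so a positional call raises TypeError. The sketch below shows the lookup against a hypothetical experiments table in an in-memory sqlite3 database; verify_sketch and the schema are illustrative assumptions, and the args parameter is not modelled.

```python
import sqlite3

# Hypothetical schema standing in for the experiments table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (pplid TEXT PRIMARY KEY)")
conn.execute("INSERT INTO experiments (pplid) VALUES ('exp_001')")


def verify_sketch(*, pplid=None):
    """Return pplid if it is recorded in the experiments table, else False."""
    row = conn.execute(
        "SELECT pplid FROM experiments WHERE pplid = ?", (pplid,)
    ).fetchone()
    return row[0] if row else False
```
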

archive_ppl(ppls, reverse=False)

Archive or unarchive pipelines by moving their related files between active and archived folders.

Parameters:
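  • ppls (List[str]) – Pipeline IDs to archive or unarchive.

  • reverse (bool) – If True, unarchive the pipelines (move them from the archived folder back to the active one); if False (the default), archive them.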

Return type:

None

delete_ppl(ppls)

Permanently delete archived pipelines, including config files, logging files, and database records.

Parameters:

ppls (list[str]) – List of pipeline IDs to delete from archive.

Return type:

None

filter_ppls(query, ppls=None, params=False)

Filters pipelines based on a query string applied to their configurations.

Parameters:
  • query (str) – A query string used to filter pipeline configurations.

  • ppls (list or None, optional) – List of pipeline IDs to filter. If None, all pipelines are considered.

  • params (bool, optional) – Whether to return parameters of matching pipelines along with their IDs.

Returns:

Filtered list of pipeline IDs or tuples of (pplid, params) if params is True.

Return type:

list

get_histories(ppls=None)

Retrieve training and validation histories for specified pipelines.

Parameters:
  • ppls (List[str] | None, optional) – Pipeline IDs whose histories to retrieve. Defaults to all pipelines.

  • metrics (list[str], optional) – Metric names to include from the histories. Defaults to all available train and val metrics plus losses.

Raises:

ValueError – If any pipeline ID or metric name is invalid.

Returns:

Dictionary mapping pipeline IDs to DataFrames containing the epoch and selected metrics history.

Return type:

dict[str, pd.DataFrame]

get_matching_ppls(base_pplid, query=None, include=False)

Retrieve pipelines matching a base pipeline ID and optional query.

Parameters:
  • base_pplid (str) – The base pipeline ID to compare against.

  • query (str or None, optional) – Optional query string to filter matching pipelines.

  • include (bool, optional) – Defaults to False.

Returns:

A list of pipeline IDs matching the criteria.

Return type:

list

get_ppl_details(ppls=None)

Retrieve detailed information for a list of pipelines.

Parameters:

ppls (list or None, optional) – List of pipeline IDs to fetch details for. If None, fetches details for all pipelines.

Returns:

A DataFrame containing details for each pipeline, including model, dataset, metrics, loss, and optimizer locations.

Return type:

pd.DataFrame

get_ppl_status(ppls=None)

Retrieve the best and latest status metrics for specified pipelines or all if none specified.

Parameters:

ppls (list or None, optional) – List of pipeline IDs to fetch status for. If None, status for all pipelines is returned.

Returns:

DataFrame containing pipeline IDs, best epoch, train/validation metrics, losses, and last epoch.

Return type:

pd.DataFrame

get_ppls()

Retrieves a list of all pipeline IDs from the database.

Returns:

A list containing all pipeline IDs.

Return type:

list of str

group_by_common_columns(records)

Group pipeline records by their common set of DataFrame columns.

Parameters:
  • records (Dict[str, DataFrame]) – Mapping of pipeline IDs to DataFrames (e.g., training histories with various metrics).

Returns:

A dictionary mapping each unique set of column names (as a frozenset) to a list of pipeline IDs sharing that column structure.

Return type:

dict

Example

>>> records = {
...     "exp1": pd.DataFrame(columns=["epoch", "train_loss", "val_loss"]),
...     "exp2": pd.DataFrame(columns=["epoch", "train_loss", "val_loss"]),
...     "exp3": pd.DataFrame(columns=["epoch", "accuracy", "val_accuracy"])
... }
>>> group_by_common_columns(records)
{
    frozenset({'epoch', 'train_loss', 'val_loss'}): ['exp1', 'exp2'],
    frozenset({'epoch', 'accuracy', 'val_accuracy'}): ['exp3']
}
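
The grouping rule can be expressed with a frozenset key per record. group_by_columns is a hypothetical stand-in that accepts either DataFrames (via their .columns attribute) or plain lists of column names, so it runs without pandas; within each group it keeps IDs in insertion order.

```python
from collections import defaultdict


def group_by_columns(records):
    """Map each distinct frozenset of column names to the IDs that share it."""
    groups = defaultdict(list)
    for pplid, frame in records.items():
        # A DataFrame exposes .columns; plain iterables of names are used as-is.
        columns = getattr(frame, "columns", frame)
        groups[frozenset(columns)].append(pplid)
    return dict(groups)
```

Using a frozenset makes column order irrelevant, so two histories with the same metrics in a different order land in the same group.
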

multi_train(ppls, last_epoch=10, patience=5)

Train multiple pipelines up to a maximum number of epochs with optional patience.

Parameters:
  • ppls (dict[str, int]) – Dictionary of pipeline IDs to some integer values (usage unclear).

  • last_epoch (int, optional) – Maximum number of epochs to train each pipeline, by default 10.

  • patience (int, optional) – Number of epochs to wait for improvement before stopping (currently unused), by default 5.

Raises:

ValueError – If any pipeline ID in ppls is not found in the existing pipelines.

Return type:

None

plot_metrics(ppls=None, metrics=None, args=None)

Plot specified metrics for one or more pipelines over training epochs.

Parameters:
  • ppls (list of str or None, optional) – List of pipeline IDs to plot. If None, plots all pipelines.

  • metrics (list of str or None, optional) – List of metrics to plot. If None, plots all available metrics in the histories.

  • args (Dict | None)

Returns:

Dictionary mapping metric names to their corresponding matplotlib Axes objects.

Return type:

dict

Raises:

ValueError – If any pipeline ID in ppls is invalid. If any metric in metrics is invalid.

Notes

This function groups pipelines by common metric columns and plots each metric across all pipelines sharing that metric. The returned Axes can be further customized.

transfer_ppl(ppls, transfer_type='export', mode='copy', env=True)

Transfers pipeline data between main storage and transfer folder.

Parameters:
  • ppls (List[str]) – Pipeline IDs to transfer.

  • transfer_type (str, optional) – ‘export’ moves data from main storage to the transfer folder; ‘import’ moves data from the transfer folder back to main storage. Defaults to ‘export’.

  • mode (str, optional) – Transfer mode: ‘copy’ (default) duplicates files, ‘move’ relocates them.

  • env (bool, optional) – Defaults to True.

Raises:

ValueError – If transfer_type or mode is invalid, or if any pipeline ID is not found in the source records.

Return type:

None
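
A sketch of the export/import and copy/move semantics of transfer_ppl(), assuming each pipeline's data lives in one folder named after its ID in both locations. transfer_pipelines and that layout are hypothetical, and the env parameter is not modelled.

```python
import shutil
from pathlib import Path


def transfer_pipelines(ppls, main_dir, transfer_dir, transfer_type="export", mode="copy"):
    """Copy or move per-pipeline folders between main storage and a transfer folder."""
    if transfer_type not in ("export", "import"):
        raise ValueError(f"invalid transfer_type: {transfer_type!r}")
    if mode not in ("copy", "move"):
        raise ValueError(f"invalid mode: {mode!r}")
    src, dst = Path(main_dir), Path(transfer_dir)
    if transfer_type == "import":
        src, dst = dst, src  # import: transfer folder -> main storage
    # Validate every ID before touching any files.
    missing = [p for p in ppls if not (src / p).is_dir()]
    if missing:
        raise ValueError(f"not found in {src}: {missing}")
    dst.mkdir(parents=True, exist_ok=True)
    for pplid in ppls:
        if mode == "copy":
            shutil.copytree(src / pplid, dst / pplid, dirs_exist_ok=True)
        else:
            shutil.move(str(src / pplid), str(dst / pplid))
```

Validating all IDs before copying or moving anything avoids a half-finished transfer when one ID is wrong. The sketch does not resolve name clashes: moving into a destination that already holds a folder with the same ID nests the source folder inside it.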