User Workflow
This tutorial explains how to initialize a machine learning project using the
PyTorchLabFlow package by providing a configuration dictionary and calling a single function.
Prerequisites
Make sure the PyTorchLabFlow package is installed and accessible in your Python environment:
pip install PyTorchLabFlow # Or use your own installation method
Step 1: Define Your Project Configuration
Create a dictionary with the keys shown below and your own values for each.
settings = {
    "project_name": "AdultIncomePrediction",
    "project_dir": "path/to/your/data/folder",
    "component_dir": "path/to/your/component/folder",
    "setting_path": "path/to/the/project/config/file.json",
    "metrics": ["accuracy", "auroc", "f1score", "auprc"],
    "strategy": {"monitor": "val_loss", "mode": "min"},
}
The keys and their values are explained below.
Explanation of Settings
Here’s a breakdown of the keys in the settings dictionary:
project_name: A name for your project; used for display/logging purposes.
project_dir: Directory where input data and processed outputs will be stored.
component_dir: Folder to store your model components like network, loss function, optimizer, etc.
setting_path: Full path where the settings configuration (as JSON) will be saved.
metrics: A list of metric names you plan to evaluate (e.g., accuracy, f1score, etc.). These should match implemented metric components or be defined later.
strategy: Defines your model selection strategy (e.g., monitor val_loss with min to minimize validation loss).
defaults: A dictionary of default values or placeholders that will be filled later:
    metrics: Initially set to None for each; these can later point to custom implementations.
    loss: The loss function component (e.g., CrossEntropyLoss).
    optimizer: Your optimizer name or object (e.g., Adam).
    train_data_src: Path to training data file.
    valid_data_src: Path to validation data file.
    train_batch_size, valid_batch_size: Batch sizes for training and validation phases.
You can leave most values as None initially and fill them in after the project structure is generated.
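Before passing the dictionary to create_project, it can help to catch missing keys early. The following is a minimal sketch and not part of PyTorchLabFlow: check_settings and REQUIRED_KEYS are hypothetical names, the set of required keys is assumed from the example above, and "max" is assumed to be the only other valid mode besides "min".

```python
# Hypothetical pre-flight check; NOT part of the PyTorchLabFlow API.
# Assumption: the six keys from the example dictionary are all required.
REQUIRED_KEYS = {
    "project_name", "project_dir", "component_dir",
    "setting_path", "metrics", "strategy",
}


def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    missing = sorted(REQUIRED_KEYS - settings.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    strategy = settings.get("strategy") or {}
    # Assumption: "mode" is either "min" or "max".
    if strategy.get("mode") not in ("min", "max"):
        problems.append("strategy.mode should be 'min' or 'max'")
    if not strategy.get("monitor"):
        problems.append("strategy.monitor is not set")
    return problems


settings = {
    "project_name": "AdultIncomePrediction",
    "project_dir": "path/to/your/data/folder",
    "component_dir": "path/to/your/component/folder",
    "setting_path": "path/to/the/project/config/file.json",
    "metrics": ["accuracy", "auroc", "f1score", "auprc"],
    "strategy": {"monitor": "val_loss", "mode": "min"},
}
print(check_settings(settings))  # [] means no problems were detected
```

An empty list means the dictionary has the shape used in this tutorial; it does not guarantee that the library will accept it.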
Step 2: Run the Project Creation Function
Use the create_project function from PTLF.lab to initialize the project:
from PTLF.lab import create_project
create_project(settings=settings)
What Happens Next?
Running the script performs the following:
Creates the specified project_dir and component_dir folders if they don’t exist.
Saves the settings dictionary as a JSON file at setting_path.
Initializes a basic folder structure for your ML project.
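The first two steps can be pictured with plain standard-library code. This is only an illustration of the idea (create folders, then serialize the dictionary); it is not how create_project is implemented, and init_project_sketch is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def init_project_sketch(settings: dict) -> Path:
    """Create the two folders and write the settings as JSON (illustration only)."""
    for key in ("project_dir", "component_dir"):
        Path(settings[key]).mkdir(parents=True, exist_ok=True)
    setting_path = Path(settings["setting_path"])
    setting_path.parent.mkdir(parents=True, exist_ok=True)
    setting_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return setting_path


# Demo in a throwaway directory so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    demo = {
        "project_name": "AdultIncomePrediction",
        "project_dir": f"{tmp}/data",
        "component_dir": f"{tmp}/components",
        "setting_path": f"{tmp}/data/settings.json",
    }
    saved = init_project_sketch(demo)
    reloaded = json.loads(saved.read_text(encoding="utf-8"))
    print(reloaded["project_name"])  # AdultIncomePrediction
```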
Resulting Directory Structure
path/to/your/data/folder/
│
├── Configs/ # Stores experiment configuration files
├── Weights/ # Stores trained model weights
├── Quicks/ # For quick experiments or debugging artifacts
├── Histories/ # Training history logs (loss/accuracy curves etc.)
│
├── Archived/ # Stores older/archived experiment artifacts
│ ├── Configs/
│ ├── Weights/
│ └── Histories/
│
├── Transfer/ # For storing artifacts ready for deployment or sharing
│ ├── Configs/
│ ├── Weights/
│ └── Histories/
│
├── ppls.db # SQLite DB for tracking experiment metadata
└── settings.json # Serialized configuration file used for setup
path/to/your/component/folder/
└── CompBase/ # Base implementations of components
    ├── __init__.py
    ├── models.py # Model architectures (e.g., Meso4)
    ├── datasets.py # Dataset handling and transformations
    ├── metrics.py # Evaluation metrics (e.g., Accuracy, AUROC)
    ├── losses.py # Loss functions
    └── optimizers.py # Optimizers (e.g., Adam)
Using the Project Later
Once your project has been set up, you can load the full configuration and prepare the environment anytime using:
from PTLF.lab import lab_setup
lab_setup(settings_path="path/to/the/project/config/file.json")
This sets up the internal context, links components, and restores all paths, making it easy to continue working in Jupyter notebooks, scripts, or any Python environment.
Important
In any Jupyter notebook or Python script, simply call lab_setup at the top and you’re ready to start working with the full project structure.
You’re now ready to start building models, managing experiments, and scaling your ML workflow using the PyTorchLabFlow environment.
Building and Running Deep Learning Pipelines
Step 1: Design your components
Design your Dataset class, inheriting from PTLF.utils.DataSet.
Design your model, inheriting from PTLF.utils.Model.
Create the other components (loss, optimizer, and one or more metrics) to match what you chose when initializing the project.
Tip
See the full design notebook: Design
Step 2: Define Experiment Configuration
Create a nested dictionary specifying all pipeline components such as model, dataset, optimizer, loss, metrics, and data sources.
expargs = {
    "dataset": {
        "loc": "CompBase.datasets.DS01",
        "args": {}
    },
    "model": {
        "loc": "CompBase.models.SimpleNN",
        "args": {
            "h1_dim": 120,
            "h2_dim": 1000,
            "drop": 0.3
        }
    },
    "loss": {
        "loc": "CompBase.losses.BCElogit",
        "args": {}
    },
    "optimizer": {
        "loc": "CompBase.optimizers.OptAdam",
        "args": {}
    },
    "metrics": {
        "accuracy": {
            "loc": "CompBase.metrics.BinAcc",
            "args": {}
        },
        "auroc": {
            "loc": "CompBase.metrics.AUROC",
            "args": {}
        }
    },
    "train_data_src": "path/to/train.csv",
    "val_data_src": "path/to/valid.csv",
    "train_batch_size": 36,
    "val_batch_size": 36
}
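Each component entry pairs a dotted "loc" string with an "args" dictionary. How PyTorchLabFlow resolves these internally is not described here; the sketch below assumes that "loc" is an importable path of the form module.ClassName and that "args" are passed as keyword arguments. resolve_component is a hypothetical helper, and a standard-library class stands in for a real component so the example runs anywhere.

```python
import importlib


def resolve_component(spec: dict):
    """Import the class named by spec["loc"] and call it with spec["args"]."""
    # Split "package.module.ClassName" at the last dot.
    module_path, _, class_name = spec["loc"].rpartition(".")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**spec.get("args", {}))


# Stand-in component: datetime.timedelta plays the role of a model class.
spec = {"loc": "datetime.timedelta", "args": {"hours": 2}}
component = resolve_component(spec)
print(component.total_seconds())  # 7200.0
```

With this convention, "CompBase.models.SimpleNN" would point at the SimpleNN class in models.py of the CompBase folder, provided that folder is importable.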
Step 3: Create a New Experiment
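# PipeLine must already be imported from the PTLF package.
P = PipeLine()  # Initialize the pipeline
# (Optional) Check whether an experiment with this configuration already exists.
# Returns the existing experiment ID, or False if the configuration is new.
P.match_args(expargs)
P.new(args=expargs.copy(), expid="exp2")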
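The idea behind matching a configuration against existing experiments can be sketched without the library. This is an assumption about the general technique, not a description of what match_args does internally: the nested dictionary is serialized in a canonical form and hashed, so two configs compare equal regardless of key order. config_fingerprint, find_match and registry are hypothetical names.

```python
import hashlib
import json


def config_fingerprint(args: dict) -> str:
    """Hash a nested config so that key order does not affect the result."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_match(args: dict, registry: dict):
    """Return the ID of a stored config equal to `args`, or False if none."""
    target = config_fingerprint(args)
    for expid, stored in registry.items():
        if config_fingerprint(stored) == target:
            return expid
    return False


# Hypothetical store of previously created experiments.
registry = {
    "exp1": {"model": {"loc": "CompBase.models.SimpleNN", "args": {"drop": 0.3}}},
}
same = {"model": {"args": {"drop": 0.3}, "loc": "CompBase.models.SimpleNN"}}
different = {"model": {"loc": "CompBase.models.SimpleNN", "args": {"drop": 0.5}}}

print(find_match(same, registry))       # exp1
print(find_match(different, registry))  # False
```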
Step 4: Start Training
P.train(num_epochs=10)
Supports features like early stopping, verbose logging, and hooks.
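To make the link between early stopping and the strategy setting ({"monitor": "val_loss", "mode": "min"}) concrete, here is a small stand-alone sketch. It is not the library's implementation; best_epoch, should_stop and the patience parameter are hypothetical and only illustrate how a monitored metric and a mode can drive the decision.

```python
def best_epoch(history: list, mode: str = "min") -> int:
    """Index of the best monitored value; the earliest epoch wins ties."""
    pick = min if mode == "min" else max
    return pick(range(len(history)), key=history.__getitem__)


def should_stop(history: list, mode: str = "min", patience: int = 3) -> bool:
    """True when the best value is at least `patience` epochs old."""
    if not history:
        return False
    epochs_since_best = (len(history) - 1) - best_epoch(history, mode)
    return epochs_since_best >= patience


# Monitored val_loss per epoch (made-up numbers for illustration).
val_loss = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67]
print(best_epoch(val_loss))               # 2
print(should_stop(val_loss))              # True: no improvement for 3 epochs
print(should_stop(val_loss, patience=4))  # False
```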
Extra Utilities
| Function | Description |
|---|---|
| | Load existing experiment configuration. |
| | Prepare model, data, and metrics manually. |
| | Load model weights from a specific or best epoch. |
| | Log metrics after an epoch (usually called automatically). |
| | Create a new experiment based on an existing one. |
Step 5: Plot Comparative Performances
from PTLF.experiment import plot_metrics
Vs = plot_metrics(ppls=[...], metrics=['train_loss', "train_accuracy"])
Vs["train_accuracy"]
You can access a previously created pipeline by its pipeline ID (pplid):
P = PipeLine(pplid='exp2')
You can then access its artifacts and more (see PipeLine.get_path). Make sure you have connected to the correct lab configuration by calling lab_setup at the top of the Jupyter notebook. This way you can organize all your trials and hypotheses in a fixed number of dedicated Jupyter notebooks.
Config Matching
Because all pipeline configs are nested dictionaries, we can easily search for ppls that share components or parameters using filter_ppls and get_matching_ppls.
query: A string that helps find configs sharing a component (given in dictionary format). Every ppl-config has an args key, and that args dictionary holds the different components.
query='args>model=my_project.models.TransformerClassifier>backbone=my_project.models.TransformerEncoder'
This query filters all the ppl-configs whose model is my_project.models.TransformerClassifier with a backbone of my_project.models.TransformerEncoder, where > means nesting.
Example: Nesting Components
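The query syntax can be illustrated with a small stand-alone matcher. The exact semantics used by filter_ppls and get_matching_ppls are not spelled out here, so this sketch makes two assumptions: a key=value step requires the component's "loc" to equal the value, and keys nested under a component are looked up in that component's "args". parse_query and matches are hypothetical names.

```python
def parse_query(query: str) -> list:
    """Split 'a>b=X>c=Y' into [('a', None), ('b', 'X'), ('c', 'Y')]."""
    steps = []
    for segment in query.split(">"):
        key, sep, value = segment.partition("=")
        steps.append((key.strip(), value.strip() if sep else None))
    return steps


def matches(config: dict, query: str) -> bool:
    """Walk `config` along the query; every key=value step must match a loc."""
    node = config
    for key, expected_loc in parse_query(query):
        # Assumed rule: inside a component ({"loc": ..., "args": {...}}),
        # nested keys live in its "args" dictionary.
        if isinstance(node, dict) and "loc" in node:
            node = node.get("args", {})
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
        if expected_loc is not None:
            if not isinstance(node, dict) or node.get("loc") != expected_loc:
                return False
    return True


# Hypothetical ppl-config with a nested backbone component.
config = {
    "args": {
        "model": {
            "loc": "my_project.models.TransformerClassifier",
            "args": {
                "backbone": {"loc": "my_project.models.TransformerEncoder", "args": {}},
            },
        },
    },
}
query = ("args>model=my_project.models.TransformerClassifier"
         ">backbone=my_project.models.TransformerEncoder")
print(matches(config, query))                                  # True
print(matches(config, "args>model=my_project.models.Other"))   # False
```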