🤔 A General Introduction to ML/DL Project Management and Knowledge Management - Project Management

# Management # Deep Learning # Coding
Published On: July 28, 2023 (Last updated on: April 15, 2024)

Multiple articles have illustrated that project (code) management is important.

In this guide, I mainly show how I manage my project's code and link new knowledge to my knowledge base.

In General

The lifecycle of a project may include these five stages¹:

1. Planning and project setup
   - Define the task and scope out requirements
   - Determine project feasibility
   - Discuss general model tradeoffs (accuracy vs. speed)
   - Set up the project codebase
2. Data collection and labelling
   - Define ground truth (create labeling documentation)
   - Build the data ingestion pipeline
   - Validate the quality of the data
   - Label data and ensure ground truth is well-defined
   - Revisit Step 1 and ensure the data is sufficient for the task
3. Model exploration
   - Establish baselines for model performance
   - Start with a simple model using the initial data pipeline
   - Overfit the simple model to the training data
   - Stay nimble and try many parallel (isolated) ideas during the early stages
   - Find the SoTA model for your problem domain (if available) and reproduce its results, then apply it to your dataset as a second baseline
   - Revisit Step 1 and ensure feasibility
   - Revisit Step 2 and ensure data quality is sufficient
4. Model refinement
   - Perform model-specific optimizations (e.g., hyperparameter tuning)
   - Iteratively debug the model as complexity is added
   - Perform error analysis to uncover common failure modes
   - Revisit Step 2 for targeted data collection and labeling of observed failure modes
5. Testing and evaluation
   - Evaluate the model on the test distribution; understand the differences between the train and test set distributions (how is “data in the wild” different from what you trained on?)
   - Revisit the model evaluation metric; ensure that this metric drives desirable downstream user behavior
   - Write tests for:
     - Input data pipeline
     - Model inference functionality
     - Model inference performance on validation data
     - Explicit scenarios expected in production (the model is evaluated on a curated set of observations)
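The four kinds of tests listed under “Write tests for” can start life as plain pytest-style functions. The sketch below is only an illustration: `load_manifest`, the two-column `<sample_path> <label>` manifest format, the threshold-based `predict` stand-in, and the 0.9 accuracy threshold are all hypothetical placeholders for a real data pipeline and model.

```python
import tempfile
from pathlib import Path


def load_manifest(path):
    """Hypothetical pipeline step: parse '<sample_path> <label>' lines (format is an assumption)."""
    samples = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the manifest
        sample_path, label = line.rsplit(maxsplit=1)
        samples.append((sample_path, int(label)))
    return samples


def predict(features):
    """Hypothetical stand-in for model inference: a fixed-threshold binary classifier."""
    return [1 if x > 0.5 else 0 for x in features]


def test_input_pipeline():
    # Every non-empty manifest line should become a (path, int label) pair
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "train.txt"
        manifest.write_text("img/001.png 0\nimg/002.png 1\n\n")
        samples = load_manifest(manifest)
    assert samples == [("img/001.png", 0), ("img/002.png", 1)]


def test_inference_functionality():
    # One prediction per input, each a valid class id
    preds = predict([0.1, 0.9, 0.4])
    assert len(preds) == 3
    assert set(preds) <= {0, 1}


def test_validation_performance():
    # Accuracy on a tiny, made-up validation set must clear a chosen threshold
    features, labels = [0.2, 0.8, 0.7, 0.1], [0, 1, 1, 0]
    preds = predict(features)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    assert accuracy >= 0.9


def test_curated_scenarios():
    # Curated observations expected in production, with known correct outputs
    curated = {0.0: 0, 1.0: 1, 0.51: 1}
    for x, expected in curated.items():
        assert predict([x]) == [expected]


if __name__ == "__main__":
    test_input_pipeline()
    test_inference_functionality()
    test_validation_performance()
    test_curated_scenarios()
    print("all tests passed")
```

Running `pytest` on this file collects the `test_*` functions automatically; in a real project, `load_manifest` and `predict` would be imported from the project's own data and model modules instead.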

For each stage of the project, there are various tools we may use:

- Planning and project setup: this stage is essentially project tracking, so any tracking tool can be used, e.g., Confluence, YouTrack, Jira, or Trello.
- Data collection and labelling: if we use a public dataset, we can usually skip the labelling problem; if we collect and label our own dataset, we may consider these awesome-labelling-tools. The next concern for data storage and collection is versioning, for which we may consider these tools:
  1. Neptune
  2. WandB
  3. DVC
  4. LakeFS
  5. Git LFS
- Log tracking:
  1. We can use logging or TensorBoard to store logs in a local directory.
  2. We can also use WandB, Neptune, or similar tools to store logs locally, in the cloud, or both.
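For the local option, a minimal sketch with Python's standard `logging` module could look like the following. It assumes logs should land in the `experiment/{exp-name}/log/` folder from the project files specification; the helper name `get_experiment_logger` and the file name `train.log` are hypothetical choices.

```python
import logging
from pathlib import Path


def get_experiment_logger(exp_name, root="experiment"):
    """Return a logger writing to the console and to <root>/<exp_name>/log/train.log."""
    log_dir = Path(root) / exp_name / "log"
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(exp_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.StreamHandler(), logging.FileHandler(log_dir / "train.log")):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Usage: `logger = get_experiment_logger("baseline")` followed by `logger.info("epoch=%d loss=%.4f", epoch, loss)` inside the training loop. If TensorBoard is preferred, `torch.utils.tensorboard.SummaryWriter(log_dir=...)` can point at the same `log/` folder so that all logs of one experiment stay together.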

Project Specification

The project specification covers two aspects:

- Project files specification
- Git branch specification

Project Files Specification

Here is the file structure of a project²:

| File/Folder | Description | Required |
|---|---|---|
| README.md | Project instructions and folder structure | YES |
| train.py | Model training and validation | YES |
| test.py | Model testing | YES |
| src/{model-name}.py | Definition and source code of the model (single model) | YES |
| src/{model-name}.py | Definition and source code of the models (multiple models) | YES |
| src/{modules-name}/{model-name}.py | Definition and source code of the model (multiple submodules & single model) | YES |
| src/{modules-name}/{model-name}.py | Definition and source code of the models (multiple submodules & multiple models) | YES |
| utils/loss.py | Implementation of loss functions | NO |
| utils/base.py | Utility functions | NO |
| data/data.py | Dataset and DataLoader implementation | NO |
| data/train.txt | Manifest file of the training set | NO |
| data/validate.txt | Manifest file of the validation set | NO |
| data/test.txt | Manifest file of the test set | NO |
| experiment/{exp-name}/params.yaml | Configuration of the experiment | YES |
| experiment/{exp-name}/log/ | Folder for the log files produced by the experiment | YES |
| experiment/{exp-name}/model/ | Folder for the binary files of the trained models | YES |
| experiment/{exp-name}/result/ | Folder for the model outputs (images, CSV, etc.) | YES |
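Creating the required parts of this layout by hand for every experiment is error-prone, so a small scaffolding script can help. The sketch below is a hypothetical helper, not part of the referenced specification: `scaffold_experiment` and the keys in `DEFAULT_PARAMS` are illustrative assumptions, and the YAML is written as plain text so no YAML library is needed.

```python
from pathlib import Path

# Illustrative default configuration; these keys are assumptions, not a fixed schema
DEFAULT_PARAMS = """\
model: pix2pix
dataset: DRIVE
augmentation: false
batch_size: 8
learning_rate: 0.0002
epochs: 100
"""


def scaffold_experiment(root, exp_name, params=DEFAULT_PARAMS):
    """Create the required files/folders for one experiment under <root>/experiment/<exp_name>/."""
    root = Path(root)
    exp_dir = root / "experiment" / exp_name
    for sub in ("log", "model", "result"):
        (exp_dir / sub).mkdir(parents=True, exist_ok=True)

    params_file = exp_dir / "params.yaml"
    if not params_file.exists():  # never overwrite an existing configuration
        params_file.write_text(params)

    # Required top-level entries from the table, created empty if missing
    (root / "src").mkdir(exist_ok=True)
    for name in ("README.md", "train.py", "test.py"):
        (root / name).touch(exist_ok=True)
    return exp_dir
```

Usage: `scaffold_experiment(".", "baseline")` creates `experiment/baseline/` with `log/`, `model/`, `result/` and a starter `params.yaml`. Skipping the write when `params.yaml` already exists keeps a tuned configuration safe if the script is run twice.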

Git Branch Specification

Every project faces the same problem: how do we run multiple experiments in parallel? We solve it with Git branches:

- Each experiment has a corresponding Git branch.
- There are three kinds of branches:
  1. Temporary branches: names start with TEMP_
  2. Long-term branches: names start with MAIN_
  3. Project introduction/demo branch: main, which contains the basic information about the project
- Branch names should include:
  1. The main purpose of the branch, e.g., the model's name
  2. The dataset used, e.g., the dataset's name
  3. Other details, e.g., whether the experiment includes augmentation or another configuration

Here are two examples:

- TEMP_PIX2PIX_DRIVE_NOAUG represents a temporary branch for the pix2pix model on the DRIVE dataset, with no augmentation in data preprocessing
- MAIN_PIX2PIX_MESSIDOR_AUG represents a long-term branch for the pix2pix model on the MESSIDOR dataset, with augmentation in data preprocessing
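The naming convention is regular enough to build and check automatically, for example in a small helper or a pre-push hook. The following is a minimal sketch under the assumption that every segment is upper-case alphanumeric (as in the two examples); `make_branch_name`, `parse_branch_name`, and `BRANCH_RE` are hypothetical names, and the `main` demo branch is deliberately left out of the pattern.

```python
import re

# PREFIX_MODEL_DATASET[_EXTRA...]; upper-case alphanumeric segments are an assumption
BRANCH_RE = re.compile(r"^(TEMP|MAIN)_([A-Z0-9]+)_([A-Z0-9]+)((?:_[A-Z0-9]+)*)$")


def make_branch_name(long_term, model, dataset, *extras):
    """Build an experiment branch name such as TEMP_PIX2PIX_DRIVE_NOAUG."""
    prefix = "MAIN" if long_term else "TEMP"
    name = "_".join(part.upper() for part in (prefix, model, dataset, *extras))
    parse_branch_name(name)  # raises ValueError if the result breaks the convention
    return name


def parse_branch_name(name):
    """Split an experiment branch name into its parts, or raise ValueError."""
    match = BRANCH_RE.match(name)
    if match is None:
        raise ValueError(f"branch name does not follow the convention: {name!r}")
    kind, model, dataset, extras = match.groups()
    return {
        "long_term": kind == "MAIN",
        "model": model,
        "dataset": dataset,
        "extras": [e for e in extras.split("_") if e],
    }
```

For instance, `make_branch_name(False, "pix2pix", "drive", "noaug")` yields the first example name above, which can then be passed to `git checkout -b` to start the experiment branch.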

¹ There are some extra steps; see Jeremy Jordan's blog.
² For more detail, see CKLAU's GitLab - Specification.

