Environment Setup
Before running any of the pipeline stages, make sure the local environment matches the assumptions in the repository documentation and Python scripts.
Required Tooling
The workflow assumes the following tools are available:
- Conda or Miniconda for environment management
- Python with the dependencies defined by `environment.yml`
- A Hugging Face account with access to the base model
- A CUDA-capable GPU if you want to run the training and inference stages as written
The fine-tuning and inference scripts are optimized for modern NVIDIA hardware, especially when BF16 and Flash Attention are available.
Expected Setup Flow
The documented setup sequence is:
```shell
conda env update --file environment.yml --prune
conda activate Mistral-FineTuning-Lab
huggingface-cli login
```
The first command creates or updates the Python environment. The second activates it. The third authenticates the machine against Hugging Face so the base model can be downloaded.
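As a quick sanity check after activation, you can confirm that the expected environment is actually active. This is an illustrative sketch, not part of the repository: it relies on `CONDA_DEFAULT_ENV`, the variable conda sets when an environment is activated, and the environment name from the commands above.

```python
import os


def active_conda_env(expected: str = "Mistral-FineTuning-Lab") -> bool:
    """Return True if the expected conda environment is currently active.

    CONDA_DEFAULT_ENV is set by `conda activate`; if it is missing or
    names a different environment, the pipeline scripts may import the
    wrong dependencies.
    """
    return os.environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    if not active_conda_env():
        raise SystemExit(
            "Activate the environment first: conda activate Mistral-FineTuning-Lab"
        )
```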
Configuration Files The Pipeline Expects
Two repository-level files are referenced throughout the workflow:
- `environment.yml`
- `config.ini`
In this documentation workspace snapshot, neither file is present locally. In the public project repository, both files are available.
That matters because:
- `environment.yml` defines the Python environment needed by the scripts
- `config.ini` provides dataset paths, tokenizer settings, model selection, and fine-tuning outputs
Without them in this workspace, the documentation remains useful as a code walk-through, but a full end-to-end run should start from the upstream repository where those configuration files already exist.
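To show how the scripts might consume `config.ini`, here is a hedged sketch using Python's standard `configparser`. The section and key names below are illustrative assumptions chosen to match the categories listed above (dataset paths, model selection, outputs); they are not the repository's actual schema.

```python
import configparser

# Illustrative config.ini contents. The section and key names here are
# hypothetical placeholders, NOT copied from the project's real file.
EXAMPLE_CONFIG = """
[data]
dataset_path = data/train.jsonl

[model]
base_model = mistralai/Mistral-7B-v0.1

[training]
output_dir = outputs/lora-adapter
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse an INI-style configuration string into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


cfg = load_config(EXAMPLE_CONFIG)
print(cfg["model"]["base_model"])  # -> mistralai/Mistral-7B-v0.1
```

In the real pipeline the scripts would call `parser.read("config.ini")` against the file from the upstream repository instead of an inline string.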
Start From The Public Repository
If you want to reproduce the pipeline rather than just read it, use the public repository as your working copy. That keeps the docs focused on explanation while GitHub remains the place where you fetch the runnable project files.
- Repository root: clone this repository before running the lab.
- README: high-level setup and stage order.
- `environment.yml`: Conda environment definition.
- `config.ini`: project-specific paths and training settings.
Hugging Face Authentication
The base model download requires authentication. The expected flow is:
- Create a Hugging Face account if you do not already have one.
- Generate an access token.
- Run `huggingface-cli login` and paste the token when prompted.
If the login step is skipped, the tokenizer, fine-tuning, or inference stages may fail during model loading.
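A lightweight pre-flight check can catch a skipped login before a long stage fails mid-run. This sketch assumes the common credential locations: the `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` environment variables, and the token file that `huggingface-cli login` typically writes under `~/.cache/huggingface/` (exact paths can vary by `huggingface_hub` version, so treat this as a heuristic).

```python
import os
from pathlib import Path


def hf_token_present() -> bool:
    """Heuristic check for Hugging Face credentials.

    Checks common locations only; the token file path is an assumption
    and may differ across huggingface_hub versions and HF_HOME settings.
    """
    if os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN"):
        return True
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""


if __name__ == "__main__":
    if not hf_token_present():
        print("No Hugging Face credentials found; run: huggingface-cli login")
```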
Practical Hardware Expectations
The codebase is designed around a resource-constrained but capable single-machine setup:
- 4-bit loading through `bitsandbytes`
- BF16 computation when supported
- LoRA adapters instead of full fine-tuning
- Flash Attention 2 when the package is installed and the GPU supports it
In other words, this is not a distributed training workflow. It is a focused local pipeline aimed at making a 7B model trainable on prosumer hardware.
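The expectations above can be probed before launching training. The sketch below only checks whether the relevant packages are importable, using their commonly published names (`torch`, `bitsandbytes`, `peft`, `flash_attn`); verifying actual GPU and BF16 support would additionally require querying `torch.cuda`, which is omitted here so the check runs on any machine.

```python
from importlib.util import find_spec


def probe_stack() -> dict:
    """Report which optional pieces of the local fine-tuning stack are installed.

    Only tests importability; it does not confirm GPU, BF16, or Flash
    Attention support at runtime.
    """
    packages = {
        "torch": "torch",                 # core framework
        "bitsandbytes": "bitsandbytes",   # 4-bit loading
        "peft": "peft",                   # LoRA adapters
        "flash_attn": "flash_attn",       # Flash Attention 2
    }
    return {name: find_spec(module) is not None for name, module in packages.items()}


if __name__ == "__main__":
    for name, present in probe_stack().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```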
Recommended Reading After Setup
Once the environment assumptions are clear, continue in this order: