Environment Setup

Before running anything, make sure your local machine matches what the repo and scripts expect.

Required Tooling

The workflow assumes you have:

  • Conda or Miniconda for environment management
  • Python with the dependencies defined by environment.yml
  • A Hugging Face account with access to the base model
  • A CUDA-capable GPU if you want to run the training and inference stages as written

The training and inference stages are written for modern NVIDIA hardware and run best when BF16 and Flash Attention 2 are available. In practice that means an Ampere-generation GPU (such as the RTX 30-series or A100) or newer.
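Before a long run, it can help to confirm those hardware assumptions from Python. The sketch below is not part of the repository: the function name check_gpu_environment is hypothetical. It relies only on standard PyTorch calls (torch.cuda.is_available, torch.cuda.is_bf16_supported, torch.cuda.get_device_name) plus a package lookup for flash_attn and bitsandbytes, and it reports False instead of crashing when PyTorch is missing.

```python
import importlib.util


def check_gpu_environment() -> dict:
    """Hypothetical helper: report whether the GPU features this lab relies on look available.

    Every check degrades gracefully; if PyTorch itself is not installed,
    the CUDA and BF16 entries simply stay False.
    """
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "bitsandbytes_installed": importlib.util.find_spec("bitsandbytes") is not None,
        "cuda_available": False,
        "bf16_supported": False,
        "gpu_name": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script still runs without PyTorch

        if torch.cuda.is_available():
            report["cuda_available"] = True
            report["bf16_supported"] = bool(torch.cuda.is_bf16_supported())
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_gpu_environment().items():
        print(f"{key:>24}: {value}")
```

A package being importable does not guarantee it was built for your GPU, so treat flash_attn_installed as a necessary condition rather than a sufficient one.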

Expected Setup Flow

The documented setup sequence is:

conda env update --file environment.yml --prune
conda activate Mistral-FineTuning-Lab
huggingface-cli login

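The first command creates the environment from environment.yml, or updates it if it already exists; --prune removes packages that are no longer listed in the file. The second activates it. The third stores a Hugging Face access token so the machine can download the base model.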
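If a script fails with missing imports, the usual cause is running it outside the activated environment. A minimal guard might look like the following sketch. It assumes the environment is activated by name, so that conda sets CONDA_DEFAULT_ENV to Mistral-FineTuning-Lab; the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

# Name used in the activation command above (assumed to match environment.yml).
EXPECTED_ENV = "Mistral-FineTuning-Lab"


def active_conda_env(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return the active conda environment name, or None."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def assert_lab_env_active(environ: Optional[Mapping[str, str]] = None) -> None:
    """Hypothetical guard: raise a readable error if the lab environment is not active."""
    name = active_conda_env(environ)
    if name != EXPECTED_ENV:
        raise RuntimeError(
            f"Expected conda env {EXPECTED_ENV!r} to be active, found {name!r}. "
            f"Run: conda activate {EXPECTED_ENV}"
        )
```

If the environment is activated by path instead of by name, CONDA_DEFAULT_ENV holds the path, so this name comparison would need adjusting.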

Configuration Files The Pipeline Expects

Two repository-level files are referenced throughout the workflow:

  • environment.yml
  • config.ini

They are not present in this docs workspace snapshot, but they do exist in the public repository. That matters because:

  • environment.yml defines the Python environment needed by the scripts
  • config.ini provides dataset paths, tokenizer settings, model selection, and fine-tuning outputs

So the docs here are still useful as a walkthrough, but if you want to run the lab for real, start from the upstream repo instead of trying to reconstruct those files by hand.
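An INI file like config.ini is typically read with Python's standard configparser module. The sketch below makes an explicit assumption: the section names in REQUIRED_SECTIONS are hypothetical placeholders, not the upstream file's actual layout, so check them against the real config.ini before relying on the validation.

```python
import configparser

# Hypothetical section names; the upstream config.ini may organize its
# dataset, tokenizer, model, and output settings differently.
REQUIRED_SECTIONS = ("dataset", "tokenizer", "model", "finetune")


def load_lab_config(text: str) -> configparser.ConfigParser:
    """Parse config.ini contents and fail early if an expected section is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [name for name in REQUIRED_SECTIONS if not parser.has_section(name)]
    if missing:
        raise KeyError(f"config.ini is missing sections: {', '.join(missing)}")
    return parser
```

To use it on the real file, pass Path("config.ini").read_text(encoding="utf-8") from pathlib. Failing at startup is cheaper than discovering a missing path halfway through tokenization.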

Start From The Public Repository

If you want to reproduce the pipeline rather than just read it, clone the public repository and use it as your working copy.

That keeps the docs focused on explanation while GitHub remains the place where you fetch the runnable project files.

Hugging Face Authentication

The base model download requires authentication. The expected flow is:

  1. Create a Hugging Face account if you do not already have one.
  2. Generate an access token in your account settings; a read-scoped token is enough to download models.
  3. Run huggingface-cli login and paste the token when prompted.

If you skip the login, model loading is the part most likely to fail first.
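To catch a missing login before a model download fails, you can look for a token in the usual default locations: the HF_TOKEN environment variable first, then the token file that huggingface-cli login writes under HF_HOME (default ~/.cache/huggingface). This is a stdlib-only heuristic with a hypothetical function name. It ignores less common overrides such as HF_TOKEN_PATH and XDG_CACHE_HOME, and huggingface-cli whoami remains the authoritative check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: best-effort lookup of a Hugging Face token.

    Checks HF_TOKEN first, then the token file under $HF_HOME
    (falling back to ~/.cache/huggingface when HF_HOME is unset).
    """
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    hf_home = env.get("HF_HOME")
    base = Path(hf_home).expanduser() if hf_home else Path.home() / ".cache" / "huggingface"
    token_file = base / "token"
    if token_file.is_file():
        return token_file.read_text(encoding="utf-8").strip() or None
    return None
```

If the base model is gated, a valid token is not enough on its own: the account must also have been granted access on the model page.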

Practical Hardware Expectations

The whole project is built around a capable single-machine setup, not a distributed training stack:

  • 4-bit loading through bitsandbytes
  • BF16 computation when supported
  • LoRA adapters instead of full fine-tuning
  • Flash Attention 2 when the package is installed and the GPU supports it

In other words, this is meant to make a 7B model trainable on prosumer hardware without turning the setup into a cluster project.
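The bullets above can be sketched as a small settings selector with fallbacks, plus a back-of-envelope estimate of weight memory. This is an illustration under assumptions, not the repository's code: the function names are hypothetical, the NF4 and LoRA values are common QLoRA-style choices rather than values taken from the project, and BF16 support stands in as a rough proxy for an Ampere-or-newer GPU, which Flash Attention 2 also requires.

```python
from typing import Any, Dict


def build_loading_options(bf16_supported: bool, flash_attn_installed: bool) -> Dict[str, Any]:
    """Hypothetical selector for quantization, precision, and attention settings.

    Values are plain strings so this runs without torch or transformers installed.
    """
    # BF16 compute when the GPU supports it, otherwise fall back to FP16.
    compute_dtype = "bfloat16" if bf16_supported else "float16"
    # Flash Attention 2 needs the package and an Ampere-or-newer GPU;
    # BF16 support is used here as a rough proxy for the latter.
    use_fa2 = flash_attn_installed and bf16_supported
    return {
        "quantization": {  # assumed 4-bit NF4 settings for bitsandbytes
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_compute_dtype": compute_dtype,
        },
        "attn_implementation": "flash_attention_2" if use_fa2 else "sdpa",
        "lora": {  # hypothetical adapter settings, not taken from the repository
            "r": 16,
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
            "task_type": "CAUSAL_LM",
        },
    }


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (no activations, KV cache, or optimizer state)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(build_loading_options(bf16_supported=True, flash_attn_installed=False))
    for label, bits in (("BF16", 16), ("4-bit", 4)):
        print(f"7B weights at {label}: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

For a nominal 7B model, the weights alone drop from about 14 GB in BF16 to about 3.5 GB in 4-bit, which is what leaves room for activations and the small LoRA adapters on a single consumer GPU. On a real run, these settings would feed transformers' BitsAndBytesConfig (with the dtype string converted via getattr(torch, ...)), the attn_implementation argument of AutoModelForCausalLM.from_pretrained, and peft's LoraConfig.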

Once the environment assumptions are clear, continue in this order:

  1. Dataset Preparation
  2. Tokenization & ChatML
  3. Fine-Tuning with QLoRA
  4. Testing & Inference