Visual

script that prompt runs big screen capture

Script

Early 2025 - best practices for LLMs and image generation

ML performance can mean “quality” or how fast the model runs. Metrics: quality, throughput, and latency, specifically time to first token (TTFT).
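A rough sketch of how the two speed metrics could be measured from any streaming token source. The names `measure_stream` and `fake_stream` are hypothetical, and `fake_stream` is only a stand-in for a real model's streaming output; the timings it produces are not benchmark results.

```python
import time
from typing import Iterable, Iterator


def measure_stream(tokens: Iterable[str], clock=time.perf_counter) -> dict:
    """Consume a token stream and report TTFT and decode throughput."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    return {
        "tokens": count,
        "ttft_s": first - start,
        "total_s": last - start,
        # Decode throughput excludes the wait for the first token.
        "decode_tok_per_s": (count - 1) / decode_time if decode_time > 0 else float("nan"),
    }


def fake_stream(n: int, prefill_s: float = 0.05, per_token_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model: one prefill delay, then steady decoding."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20)))
```

Keeping TTFT separate from decode throughput matters because prompt processing and token generation tend to be limited by different resources.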

Home vs Production

Compute, Memory Bandwidth, Memory

bs=1
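One way to connect memory bandwidth to bs=1 speed is a back-of-the-envelope ceiling. This is a rule of thumb under stated assumptions, not something from these notes: it assumes every weight is read once per generated token and ignores KV cache, activations, and any overhead. The function name and the example hardware numbers are hypothetical.

```python
def decode_tok_per_s_ceiling(n_params: float, bytes_per_param: float,
                             mem_bandwidth_gb_s: float) -> float:
    """Rough upper bound on bs=1 decode speed if weight reads dominate."""
    model_bytes = n_params * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / model_bytes


if __name__ == "__main__":
    # Hypothetical example: a 7B model on a device with 1000 GB/s of bandwidth.
    for label, bpp in [("16-bit", 2.0), ("4-bit", 0.5)]:
        print(label, round(decode_tok_per_s_ceiling(7e9, bpp, 1000.0), 1))
```

Real numbers will land below this ceiling, but the estimate shows why smaller quants usually decode faster on the same hardware.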

Biggest model

Which Models to use?

  • Llama 2 7B
  • Llama 3
  • Qwen

llama.cpp llama-bench

Go further: ShareGPT, sglang

Coding Model

Quants

TorchTune

Training

Quality: MixEval, lighteval

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CPATH=$CONDA_PREFIX/include:$CPATH