Recommended ranking
1. Qwen3 8B Q4: coding.
2. Llama 3.1 8B Q4: general chat.
3. Gemma small variants: low-power laptops.
4. DeepSeek coder variants: when code completion is the primary task.
Ollama Hardware
With 16GB RAM, 4-bit (Q4) 7B-9B models are the sweet spot for daily work. If you have a discrete GPU, you can try 14B models with a shorter context; otherwise prioritize smaller, faster models.
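A back-of-the-envelope estimate shows why 7B-9B at Q4 fits comfortably in 16GB while 14B gets tight. The 0.55 bytes-per-parameter figure is an assumption approximating Q4_K_M (about 4.5 bits per weight plus overhead); KV cache, the OS, and other apps come on top of the weights.

```shell
# Rough weight-memory estimate for Q4-quantized models.
# Assumption: ~0.55 bytes/param for Q4_K_M-style quantization.
# KV cache and system overhead are extra, which is why 14B is
# tight on a 16GB machine without a discrete GPU.
awk 'BEGIN {
  split("7 9 14", sizes)
  for (i = 1; i <= 3; i++)
    printf "%2dB Q4 ~= %.1f GiB weights\n", sizes[i], sizes[i] * 1e9 * 0.55 / 2^30
}'
```

On this estimate, 7B lands around 3.6 GiB and 14B around 7.2 GiB of weights alone, before context memory.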
Keep the context window moderate, close memory-heavy apps, offload layers to the GPU when one is available, and prefer Q4_K_M or similarly balanced quantizations.
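These settings can be baked into a model variant with an Ollama Modelfile: `num_ctx` and `num_gpu` are Ollama's parameter names for context length and GPU-offloaded layers. The base tag and the specific values below are assumptions to adjust for your machine.

```shell
# Sketch: a Modelfile capping context and offloading layers to the GPU.
# Values are illustrative: 8192-token context, 35 layers on GPU.
cat > Modelfile <<'EOF'
FROM qwen3:8b
PARAMETER num_ctx 8192
PARAMETER num_gpu 35
EOF

# Build the tuned variant only if the ollama CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen3-tuned -f Modelfile
fi
```

A lower `num_ctx` shrinks the KV cache, which is usually the easiest win on a 16GB machine.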
If you want smooth 14B-32B local models, 32GB RAM and 12GB+ VRAM make a visible difference.