Homelab
Three Linux nodes connected over a private Tailscale mesh. Each service runs as a Docker container with its own Tailscale IP—no exposed ports. Public apps route through Cloudflare Tunnels. Database is a MongoDB replica set with automatic failover.
A local NVIDIA Orchestrator-8B model runs alongside two transformer-based models (llama-nemotron rerank and embed 1B V3) at 54 output tokens/sec. It uses intent classification for certain tool calls and acts as a sysadmin, with access to Ansible playbook tools that run updates, health checks, and backups across all nodes. I'm still centralizing all of the data. After that, the plan is to expose this model to a main LLM through a Cursor or Claude Code session, where it acts as the main model's assistant, context manager, and long-term memory.
Currently 25 devices are on the Tailscale network: 3 nodes, 2 laptops, and 20 services (2 of which are not running).
Below: hardware specs, database setup, monitoring/logging stack, local SLM with RAG, Ansible automation, Docker architecture, security layers, and running applications.
Tech Stack
- Network: Tailscale mesh VPN, UFW firewall, Cloudflare Tunnels
- Containers: Docker, Docker Compose
- Orchestration: Ansible playbooks for deployment and health checks
- Database: MongoDB 8.0 replica set (primary, secondary, arbiter)
- Vector DB: Qdrant for RAG embeddings
- Monitoring: Prometheus, Grafana (7 dashboards), Node Exporter, cAdvisor, DCGM
- Logging: Loki, Promtail (centralized log aggregation)
- SLM: Nemotron-8B via llama.cpp on RTX 3060 (43 tokens/sec)
- ML: Sentence Transformers, Cross-Encoder reranking
- Security: Fail2Ban, HashiCorp Vault, SSH hardening
- Notifications: Slack webhooks, Cloudflare Email Workers
- Web Frameworks: Next.js, FastAPI, Node.js, nginx
- Mobile: React Native, Expo, iOS App Store, Google Play Store
Nodes
Primary Node (Prometheus)
- CPU: AMD Ryzen 9 5950X (16 cores / 32 threads)
- RAM: 32GB DDR4
- GPU 0: NVIDIA RTX 5060 Ti (16GB VRAM, 448GB/s bandwidth)
- GPU 1: NVIDIA RTX 3060 (12GB VRAM, 360GB/s bandwidth)
- Storage: 500GB NVMe SSD
- OS: Ubuntu 24.04.3
- Network: Ethernet — 668.91 Mbps down / 728.07 Mbps up
- Location: Southern Florida
Beelink (Mini PC)
- CPU: Intel N100 (4 cores / 4 threads)
- RAM: 16GB DDR4
- Storage: 500GB SSD
- OS: Ubuntu 24.04.3
- Network: Ethernet — 92.1 Mbps down / 17.6 Mbps up
- Location: Argentina
Raspberry Pi 5
- CPU: ARM Cortex-A76 (4 cores / 4 threads)
- RAM: 8GB
- Storage: 1TB Samsung T7 SSD (USB 3.2 Gen 2, rated up to 1,050MB/s)
- OS: Debian GNU/Linux 12 (Bookworm)
- Boot: External SSD, no SD card
- Network: Ethernet — 105.75 Mbps down / 19.91 Mbps up
- Location: Argentina
Database
MongoDB replica set across all three nodes. The primary node handles writes, the Beelink is a secondary with a full copy of the data, and the Pi runs as an arbiter that only votes in elections and stores no data.
- Replica Set: rs0
- Version: MongoDB 8.0.16
- Failover: Automatic. If the primary goes down, the Beelink is elected primary within seconds.
- Auth: keyFile authentication, 3-tier users (admin, app, monitor)
- Latency: 0.35ms average replication across Tailscale mesh
- Backups: Automated mongodump to Pi, compressed .gz archives, 7-day retention
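The 7-day retention could be enforced on the Pi with a small pruning step like the one below. This is a minimal sketch, assuming all archives sit in one directory, end in `.gz`, and are aged by file modification time; `prune_backups` is a hypothetical helper, not taken from the actual backup playbook.

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 7  # matches the 7-day retention described above


def prune_backups(backup_dir: Path, retention_days: int = RETENTION_DAYS,
                  now: Optional[float] = None) -> List[Path]:
    """Delete .gz archives older than retention_days and return the deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # 86400 seconds per day
    deleted = []
    for archive in sorted(backup_dir.glob("*.gz")):
        # Archive age is taken from its modification time
        if archive.stat().st_mtime < cutoff:
            archive.unlink()
            deleted.append(archive)
    return deleted
```

Calling `prune_backups(Path("/path/to/backups"))` after each mongodump run keeps only the last week of archives; the `now` parameter exists only to make the function easy to test.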
Performance tuning I did: kernel swappiness set to 1, Transparent Huge Pages enabled with defrag set to defer+madvise, and the tcmalloc-google allocator. This eliminated all MongoDB startup warnings.
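To confirm the kernel settings survive a reboot, they can be read back from procfs and sysfs. A minimal sketch: the paths are the standard Linux locations, and the expected `enabled` value of `always` is an assumption (it is what MongoDB 8.0's production notes recommend), since the text above only says THP is enabled.

```python
import re
from pathlib import Path

# Expected values, mirroring the tuning described above (THP "always" is assumed)
EXPECTED = {
    "/proc/sys/vm/swappiness": "1",
    "/sys/kernel/mm/transparent_hugepage/enabled": "always",
    "/sys/kernel/mm/transparent_hugepage/defrag": "defer+madvise",
}


def active_choice(raw: str) -> str:
    """THP sysfs files list every option and bracket the active one, e.g. 'always [madvise] never'."""
    match = re.search(r"\[([^\]]+)\]", raw)
    return match.group(1) if match else raw.strip()


def check_settings(expected: dict = EXPECTED) -> dict:
    """Map each path to (current value, matches expectation)."""
    results = {}
    for path, want in expected.items():
        try:
            got = active_choice(Path(path).read_text())
        except OSError:
            got = "unreadable"
        results[path] = (got, got == want)
    return results
```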
Monitoring
Prometheus scrapes metrics every 15s from all nodes. 30-day retention. Grafana runs on the Pi for dashboards.
Grafana Data Sources:
- Prometheus (metrics)
- Loki (logs)
What I'm monitoring:
- Node Exporter on all nodes (CPU, RAM, disk, network)
- cAdvisor v0.51.0 on all nodes (container metrics)
- MongoDB Exporter on all nodes (replica set health, connections, oplog)
- NVIDIA DCGM Exporter (GPU temp, utilization, power draw, memory)
- Promtail shipping logs from all nodes to Loki
- Qdrant vector DB metrics (still need to build out more)
- Nemotron SLM server (inference latency, tokens/sec)
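With every exporter scraped by Prometheus, a quick "what's down?" check is one instant query against the HTTP API's `/api/v1/query` endpoint using the built-in `up` metric. A minimal sketch, assuming Prometheus is reachable at the hypothetical address below; `query` and `down_targets` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus:9090"  # hypothetical address; 9090 is Prometheus's default port


def query(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list:
    """Return 'job/instance' for every target whose `up` sample is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]  # instant vectors carry [unix_ts, "value-as-string"]
        if float(value) == 0:
            labels = sample["metric"]
            down.append(f"{labels.get('job')}/{labels.get('instance')}")
    return down
```

`down_targets(query("up"))` returns an empty list when every scrape target responds, and the `job/instance` of each failing target otherwise.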
cAdvisor optimization: The default config was eating CPU. I changed the housekeeping interval to 10s and disabled collectors I don't need (tcp, udp, sched, process, hugetlb), which cut cAdvisor's CPU usage by 93-95%.
Logging
Centralized logging with Loki and Promtail. Loki runs on the primary node, and Promtail agents on all nodes ship logs to it.
What gets collected:
- Docker container logs (auto-discovered)
- Systemd journal logs
- Fail2Ban events (security)
- Tailscale connectivity logs
- Qdrant and DCGM logs
Retention: 30 days with automatic compaction every 10 minutes. 100MB query cache.
All logs get labeled by host, node type, container name, and compose service, which makes filtering in Grafana easy.
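Those labels keep LogQL queries short. The sketch below builds a stream selector plus an optional line filter and turns it into a `query_range` URL for Loki's HTTP API. The label names `host` and `container`, the Loki address, and both helper names are assumptions for illustration.

```python
import urllib.parse
from typing import Optional

LOKI_URL = "http://loki:3100"  # hypothetical address; 3100 is Loki's default HTTP port


def _quote(value: str) -> str:
    # LogQL string literals are double-quoted; escape backslashes and quotes
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def logql_selector(labels: dict, contains: Optional[str] = None) -> str:
    """Build a selector such as {host="beelink", container="fastapi"} |= "error"."""
    matchers = ", ".join(f"{name}={_quote(value)}" for name, value in labels.items())
    selector = "{" + matchers + "}"
    if contains:
        selector += f" |= {_quote(contains)}"  # keep only lines containing the text
    return selector


def query_range_url(logql: str, limit: int = 100, base_url: str = LOKI_URL) -> str:
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"
```

When `start` and `end` are omitted, Loki's `query_range` endpoint defaults to the last hour.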
AI/ML
Local SLM running on the RTX 3060. No API costs, full privacy, all inference on-prem.
Nemotron Orchestrator-8B:
- Quantized to Q6_K_L (GGUF format)
- Inference via llama.cpp, 4096 token context
- ~43 tokens/sec average generation speed
- FastAPI server with SSE streaming
- Prometheus metrics for latency tracking
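On the client side, an SSE stream is just text lines: each `data:` line carries a payload and a blank line ends the event. A minimal parser sketch, assuming the server sends each generated chunk as a `data:` line; the actual FastAPI server's payload format isn't described here, so `sse_events` (a hypothetical helper) only extracts the raw data strings.

```python
from typing import Iterable, Iterator, List


def sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from decoded text lines."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event built so far
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "data":
            data.append(value)
        # event, id, and retry fields are ignored in this sketch
    # Per the SSE spec, an unterminated event at end of stream is discarded
```

Because `urllib.request.urlopen` responses iterate line by line, `sse_events(line.decode() for line in resp)` can consume tokens as they arrive.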
RAG Pipeline: Documents chunked → embedded with Nemotron-embed-1b (768 dims) → stored in Qdrant. At query time: semantic search → rerank with Nemotron-rerank-1b (cross-encoder) → SLM generates with context.
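The two-stage retrieval above (fast vector search, then a slower cross-encoder pass over the survivors) can be sketched in plain Python. This is a minimal illustration, not the actual pipeline: `embed` and `rerank_score` are hypothetical stand-ins for the Nemotron embed and rerank models, and a linear cosine scan replaces Qdrant's index.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(query: str, store: List[Tuple[str, Vector]],
                        embed: Callable[[str], Vector],
                        rerank_score: Callable[[str, str], float],
                        top_k: int = 10, top_n: int = 3) -> List[str]:
    q_vec = embed(query)
    # Stage 1: cheap similarity between the query vector and every stored chunk vector
    candidates = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:top_k]
    # Stage 2: the cross-encoder scores each (query, chunk) pair jointly; slower but more precise
    reranked = sorted(candidates, key=lambda item: rerank_score(query, item[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


def build_prompt(query: str, chunks: List[str]) -> str:
    """Prepend the retrieved chunks as context for the SLM."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The split exists because the cross-encoder is too slow to run over the whole collection, so it only sees the `top_k` candidates that vector search already shortlisted.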
Intent Routing: Queries are classified and routed. Infrastructure questions enable the sysadmin tools, RAG queries pull from the vector store, and general-knowledge questions go directly to the SLM.
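The text doesn't say how classification is done, so the sketch below swaps in a simple keyword match just to show the shape of the routing. The keyword lists, route names, and `classify` are hypothetical; a model-based classifier could replace `classify` without changing its callers.

```python
from enum import Enum


class Route(Enum):
    INFRA = "infrastructure"  # enable sysadmin tools
    RAG = "rag"               # pull context from the vector store
    GENERAL = "general"       # answer directly with the SLM


# Hypothetical keyword lists standing in for a real classifier
INFRA_TERMS = {"disk", "cpu", "ram", "container", "docker", "backup", "node", "uptime"}
RAG_PHRASES = ("docs", "documentation", "notes", "my setup")


def classify(query: str) -> Route:
    text = query.lower()
    words = set(text.replace("?", " ").replace(",", " ").split())
    if words & INFRA_TERMS:  # whole-word match, so "program" doesn't hit "ram"
        return Route.INFRA
    if any(phrase in text for phrase in RAG_PHRASES):
        return Route.RAG
    return Route.GENERAL
```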
Tool Use: SLM executes infrastructure tools—system diagnostics, file operations, RAG search. I can ask it questions about my own setup and it figures out what to run.
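Tool use generally comes down to an allowlisted registry: the model emits a tool name plus JSON arguments, and only registered functions can run. A minimal sketch, assuming a `{"name": ..., "arguments": {...}}` call format; `disk_usage` is a hypothetical example tool built on the standard library's `shutil.disk_usage`, not one of the actual tools.

```python
import json
import shutil
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function the model is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def disk_usage(path: str = "/") -> dict:
    total, used, free = shutil.disk_usage(path)
    return {"path": path, "used_pct": round(used / total * 100, 1), "free_gb": round(free / 1e9, 1)}


def dispatch(tool_call_json: str) -> str:
    """Run a model-emitted tool call and return a JSON string for the next model turn."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        return json.dumps({"result": fn(**call.get("arguments", {}))})
    except Exception as exc:  # surface tool errors to the model instead of crashing
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON instead of raising lets the model see what went wrong and retry with different arguments.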
Ansible Automation
All deployment and health checks run through Ansible. SSH key auth, no passwords.
Playbooks I use:

| Playbook | What it does |
|----------|--------------|
| playbook-monitoring.yml | Deploy Prometheus, exporters, Grafana |
| playbook-logging.yml | Deploy Loki and Promtail agents |
| 01-playbook-network.yml | Ping tests, speedtest across nodes |
| 02-playbook-health.yml | CPU, RAM, disk checks |
| 03-playbook-db-replica-set.yml | Replica set status, replication lag |
| 04-playbook-backups.yml | mongodump with compression and retention |
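The replication-lag check in `03-playbook-db-replica-set.yml` can be reduced to comparing each secondary's last applied operation time with the primary's. A minimal sketch over the document returned by MongoDB's `replSetGetStatus` command; how the playbook actually computes lag isn't shown here, and `replication_lag` is a hypothetical helper.

```python
def replication_lag(status: dict) -> dict:
    """Seconds each secondary trails the primary, computed from replSetGetStatus output."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise RuntimeError("no primary elected")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"  # arbiters hold no data, so they have no meaningful optime
    }
```

With PyMongo, the input would come from `client.admin.command("replSetGetStatus")`, which decodes `optimeDate` values as `datetime` objects.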
Quick commands I run often:
ansible all -m ping
ansible all -m shell -a "df -h /"
ansible all -m shell -a "docker ps"
Docker Architecture
Every service runs in its own network namespace via the Tailscale sidecar pattern. No shared Docker networks, no exposed ports on the host.
Sidecar Pattern:
┌─────────────────────────────────────┐
│ Tailscale Container (ts-fastapi) │ ← Gets its own Tailscale IP
│ network_mode: bridge │
└─────────────────────────────────────┘
▲
│ shares network namespace
▼
┌─────────────────────────────────────┐
│ App Container (fastapi) │ ← Uses sidecar's network
│ network_mode: service:ts-fastapi │
└─────────────────────────────────────┘
Each app appears as its own device on Tailscale. Jugamos alone has 3 Tailscale nodes: nextjs, fastapi, nginx.
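In Compose terms, each sidecar pair is two services, where the app joins the Tailscale container's network namespace via `network_mode: service:<name>`. The sketch below generates that pair as a Python dict (printed as JSON for readability; it maps one-to-one onto compose YAML). The `ts-` prefix follows the `ts-fastapi` naming above; `sidecar_services`, the image tag, the state path, and the env values are assumptions based on the official `tailscale/tailscale` image's documented settings.

```python
import json


def sidecar_services(app: str, image: str) -> dict:
    ts = f"ts-{app}"
    return {
        ts: {
            "image": "tailscale/tailscale:latest",
            "hostname": app,  # the device name this service gets on the tailnet
            "network_mode": "bridge",
            "environment": {
                "TS_AUTHKEY": "${TS_AUTHKEY}",  # substituted from .env, never baked into the image
                "TS_STATE_DIR": "/var/lib/tailscale",
            },
            "volumes": [f"./{ts}/state:/var/lib/tailscale"],  # persist node identity across restarts
            "devices": ["/dev/net/tun:/dev/net/tun"],
            "cap_add": ["NET_ADMIN"],
            "restart": "unless-stopped",
        },
        app: {
            "image": image,
            # Share the sidecar's network namespace: no IP or published ports of its own
            "network_mode": f"service:{ts}",
            "depends_on": [ts],
            "restart": "unless-stopped",
        },
    }


if __name__ == "__main__":
    print(json.dumps({"services": sidecar_services("fastapi", "example/fastapi:latest")}, indent=2))
```

Neither service declares `ports`; the app is reachable only at the sidecar's Tailscale IP.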
Container Security:
- Non-root users in all Dockerfiles (adduser, USER fastapi)
- Multi-stage builds: no build tools in production images
- Alpine base images for minimal attack surface
- No privileged mode (except cAdvisor for metrics)
- Secrets via env vars from .env, never baked into images
Build Example (FastAPI):
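# Final (production) stage of the multi-stage build
FROM python:3.12-alpine AS production
# Unprivileged group and user with fixed GID/UID 1001 (BusyBox addgroup/adduser syntax)
RUN addgroup -g 1001 fastapi \
 && adduser -u 1001 -G fastapi -s /bin/sh -D fastapi
# Everything after this line runs as the non-root user
USER fastapi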
Running Applications
Four web apps deployed, each with dedicated Tailscale sidecars.
YelpCamp - Camp review site. Node.js, Cloudinary for images, Mapbox for maps, MongoDB backend.
Portfolio - This site. Next.js with Google Analytics. Exposed via Cloudflare Tunnel.
Jugamos - Game platform. Next.js frontend, FastAPI backend, nginx reverse proxy. JWT auth, avatar uploads. Three separate Tailscale nodes for isolation.
Cleanfuture.io - Renewable energy comparison platform. Next.js frontend, FastAPI backend. Helps users compare solar systems and research energy implementation.
Security
Layered security across all nodes. Two main tools: Fail2Ban for intrusion detection, HashiCorp Vault for secrets management.
Fail2Ban:
- Running on all 3 nodes, monitors SSH
- 3 failed attempts = 1 hour IP ban
- Ban events shipped to Loki, viewable in Grafana
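Conceptually, a Fail2Ban jail counts failures per IP inside a sliding window and bans once the count reaches `maxretry`. The toy model below illustrates the 3-failures, 1-hour-ban rule above; `BanTracker` is hypothetical and not Fail2Ban's code, and the 10-minute window is an assumption (Fail2Ban's default `findtime`, since the text doesn't state one).

```python
from collections import defaultdict, deque

MAX_RETRY = 3    # 3 failed attempts...
BAN_TIME = 3600  # ...earn a 1-hour ban (seconds)
FIND_TIME = 600  # assumption: Fail2Ban's default 10-minute window


class BanTracker:
    """Toy model of a Fail2Ban jail: per-IP failure counts in a sliding window."""

    def __init__(self, max_retry=MAX_RETRY, find_time=FIND_TIME, ban_time=BAN_TIME):
        self.max_retry, self.find_time, self.ban_time = max_retry, find_time, ban_time
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned_until = {}              # ip -> unix time when the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Register a failed SSH login; return True if it triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget failures that fell outside the find-time window
        while window and window[0] <= now - self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False
```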
HashiCorp Vault:
- Centralized secrets management
- API tokens, database credentials, Tailscale keys
- Nothing hardcoded in repos
SSH Hardening:
- Key-only auth (ED25519), no passwords
- Root login disabled
- AllowUsers whitelist
- X11 forwarding off
- Idle timeout kicks inactive sessions
Firewall:
- UFW default deny on all nodes
- Only Tailscale subnet (100.64.0.0/10) allowed
- Zero public ports exposed
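A `/10` is larger than it looks: 100.64.0.0/10 spans 100.64.0.0 through 100.127.255.255 (2^22 addresses), the range Tailscale assigns device IPs from. A minimal sketch of the allow rule using Python's `ipaddress` module; `allowed_by_firewall` is a hypothetical helper that mirrors the UFW policy, not a UFW API.

```python
import ipaddress

TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")  # the only source range UFW allows


def allowed_by_firewall(source_ip: str) -> bool:
    """Mirror of the policy above: default deny, allow only tailnet source addresses."""
    # An IPv6 address is never "in" an IPv4 network, so it falls through to deny
    return ipaddress.ip_address(source_ip) in TAILSCALE_RANGE
```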
Database:
- keyFile auth between replica members
- Separate users for admin, app, monitoring
- Credentials pulled from Vault, set as env vars
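Pulling a credential from Vault over its HTTP API needs only the standard library. A minimal sketch, assuming a KV version 2 secrets engine mounted at `secret/` and a hypothetical secret path such as `homelab/mongodb`; the helper names and the env-var naming scheme are illustrative.

```python
import json
import os
import urllib.request

VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://vault:8200")  # fallback address is hypothetical


def kv2_url(path: str, mount: str = "secret", addr: str = VAULT_ADDR) -> str:
    # KV v2 inserts "data/" between the mount point and the secret path
    return f"{addr}/v1/{mount}/data/{path}"


def read_secret(path: str, token: str, mount: str = "secret") -> dict:
    req = urllib.request.Request(kv2_url(path, mount), headers={"X-Vault-Token": token})
    with urllib.request.urlopen(req, timeout=5) as resp:
        body = json.load(resp)
    return body["data"]["data"]  # KV v2 nests key/value pairs under data.data


def export_env(secret: dict, prefix: str = "") -> None:
    """Expose secret fields as environment variables for processes launched afterwards."""
    for key, value in secret.items():
        os.environ[f"{prefix}{key.upper()}"] = str(value)
```

Assuming the secret stores `username` and `password` keys, `export_env(read_secret("homelab/mongodb", token), prefix="MONGO_")` would set `MONGO_USERNAME` and `MONGO_PASSWORD`.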
Containers:
- Loki runs non-root with no-new-privileges
- Tailscale sidecars isolate each service
- Only cAdvisor runs privileged (required for metrics)
Notifications
Slack: Webhook integration for Grafana alerts, backup status, Fail2Ban triggers.
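Slack incoming webhooks accept a JSON body with a `text` field, so backup and Fail2Ban scripts can post without any SDK. A minimal sketch; the message format and helper names are hypothetical, and Grafana alerts would typically go through Grafana's own Slack contact point instead.

```python
import json
import urllib.request


def slack_payload(source: str, message: str, ok: bool = True) -> bytes:
    icon = ":white_check_mark:" if ok else ":rotating_light:"
    # *text* renders as bold in Slack's mrkdwn formatting
    return json.dumps({"text": f"{icon} *{source}*: {message}"}).encode()


def notify(webhook_url: str, source: str, message: str, ok: bool = True) -> int:
    """POST a message to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=slack_payload(source, message, ok),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```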
Email: Routed through Cloudflare Email Workers. No local SMTP server to maintain.
What's Next
- Keep building out infrastructure and Ansible playbooks for NVIDIA's Orchestrator-8B to run, plus more Grafana alerting with the model in the loop to handle issues immediately.
- Add one or two more Beelink-sized nodes at different locations.
- Start adding cloud nodes: run the nginx proxy service there and load balance from that node as traffic increases across apps.
- Add 8-12GB more VRAM locally to run larger models, and offload the smaller RAG retrieval models onto the new GPU.
- Keep improving and building apps to attract more traffic. I build what is required and try to skip technical implementations that aren't warranted.