K3s


ins

usage

sudo bash 01-install-k3s.sh      # install k3s (bundles Traefik) and set up kubectl
bash 02-install-kuboard.sh       # optional: start a Kuboard panel to manage the cluster
bash 03-test-deploy.sh           # deploy a test app and verify the Deployment -> Service -> Ingress path

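A few notes:

01 sets up ~/.kube/config automatically once the install finishes, so there is no manual permission fiddling.
02 runs Kuboard as a standalone Docker container (the officially recommended approach, and it uses no cluster resources); after it starts, import the kubeconfig to take over the cluster.
03 deploys a 2-replica nginx demo exposed through an Ingress. At the end the script prints ready-made curl commands for checking the request path and the load balancing. This mirrors how you would verify an ALB → Target Group → Pod chain at work, except that the entry point is the local Traefik.

Before running this on a laptop, confirm that Docker is installed (the Kuboard step needs it). k3s itself does not depend on Docker; it ships with containerd.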
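
A quick presence check can confirm the prerequisites mentioned above before any script runs. This is only a minimal sketch: `have_cmd` is a hypothetical helper name, and the tool list (curl for the k3s installer, docker for Kuboard) is an assumption drawn from the scripts on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report each assumed prerequisite without aborting on a missing one.
for tool in curl docker; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

A missing `docker` only matters for the Kuboard step; k3s installs without it.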




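## Current situation

- The root partition `/` is already 92% full, with only 6.1G left
- k3s itself is small (the binary plus containerd is a little over 1G), but by default all of its runtime data, images, and logs go to `/var/lib/rancher/k3s`, which is on the root partition
- You also plan to install Kuboard (Docker image plus data), and Docker stores images under `/var/lib/docker` by default, also on the root partition
- 6.1G is enough to get started, but pulling a few test images and letting logs accumulate can easily fill the root partition. Once it is full, the problems are not limited to k3s: the whole system starts misbehaving and stalling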
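
The usage figure above can be re-checked at any time. The sketch below assumes a POSIX `df`; `usage_pct` and `check_space` are hypothetical helper names, and the 90% threshold is an arbitrary choice rather than anything k3s requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the usage percentage (0-100) of the filesystem holding a path.
usage_pct() {
    # df -P keeps each filesystem on one line; column 5 is the capacity, e.g. "92%".
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Hypothetical helper: return non-zero when usage is at or above a limit (default 90).
check_space() {
    local path="$1" limit="${2:-90}" pct
    pct="$(usage_pct "$path")"
    if [ "$pct" -ge "$limit" ]; then
        echo "WARN: $path is ${pct}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
}

check_space / 90 || true
```

Running the same check against the k3s and Docker data directories shows which partition they actually live on.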

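## The good news

The `data` partition has **65G free**, which is plenty. So the idea is simple: **point the k3s and Docker data directories at this partition** instead of the default root partition.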

## What to change

**k3s**: pass `--data-dir` at install time, pointing at the large partition

```bash
curl -sfL https://get.k3s.io | sh -s - \
  --write-kubeconfig-mode 644 \
  --data-dir /home/evan/data/k3s
```
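
As an alternative to command-line flags, k3s also reads a config file at `/etc/rancher/k3s/config.yaml` whose keys mirror the flag names. The sketch below writes such a file. `write_k3s_config` is a hypothetical helper, and it targets a scratch directory here so it can be tried without root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a k3s config.yaml into the given directory.
# Keys mirror the CLI flags (--write-kubeconfig-mode, --data-dir).
write_k3s_config() {
    local conf_dir="$1" data_dir="$2"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/config.yaml" <<EOF
write-kubeconfig-mode: "0644"
data-dir: "$data_dir"
EOF
}

# Dry run into a scratch directory; a real install would target /etc/rancher/k3s as root.
scratch="$(mktemp -d)"
write_k3s_config "$scratch" /home/evan/data/k3s
cat "$scratch/config.yaml"
```

With the file in place before installation, the plain `curl -sfL https://get.k3s.io | sh -` invocation should pick up the same settings.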

**Docker** (used for Kuboard): edit `/etc/docker/daemon.json` and set `data-root`

```bash
sudo mkdir -p /home/evan/data/docker
sudo tee /etc/docker/daemon.json <<EOF
{
  "data-root": "/home/evan/data/docker"
}
EOF
sudo systemctl restart docker
```

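Want me to revise `01-install-k3s.sh` to add the `--data-dir` flag (as a variable at the top of the script, so the path is easy to change later), and add a small script for migrating the Docker data?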

00 migrate docker data

done Jul 02

#!/usr/bin/env bash
# ==========================================================
# Migrate the Docker data directory to a partition with enough space
# Usage: sudo bash 00-migrate-docker-data.sh
# Applies to: Docker already installed, or about to be installed (for Kuboard)
# ==========================================================
set -e

DOCKER_DATA_DIR="/home/evan/data/docker"

if ! command -v docker &> /dev/null; then
    echo "Docker not detected; configuring the data directory now so a later Docker install picks it up."
    mkdir -p "$DOCKER_DATA_DIR"
    sudo mkdir -p /etc/docker
    echo "{\"data-root\": \"$DOCKER_DATA_DIR\"}" | sudo tee /etc/docker/daemon.json
    echo "✅ Configured; a later Docker install will use: $DOCKER_DATA_DIR"
    exit 0
fi

echo ">>> Docker is installed; starting the data directory migration"
echo ">>> Current data directory usage:"
docker info 2>/dev/null | grep "Docker Root Dir" || true
du -sh /var/lib/docker 2>/dev/null || true

echo ">>> 1. Stop the Docker service"
# Stop the socket unit first so socket activation cannot restart Docker during the copy
sudo systemctl stop docker.socket 2>/dev/null || true
sudo systemctl stop docker

echo ">>> 2. Create the new data directory and copy the existing data"
mkdir -p "$DOCKER_DATA_DIR"
sudo rsync -aP /var/lib/docker/ "$DOCKER_DATA_DIR/"

echo ">>> 3. Point Docker at the new data directory"
sudo mkdir -p /etc/docker
echo "{\"data-root\": \"$DOCKER_DATA_DIR\"}" | sudo tee /etc/docker/daemon.json

echo ">>> 4. Start Docker again"
sudo systemctl start docker

echo ">>> 5. Verify"
docker info 2>/dev/null | grep "Docker Root Dir"

echo ""
echo "=========================================="
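echo "✅ Migration finished, new data directory: $DOCKER_DATA_DIR"
echo "=========================================="
echo ""
echo "After confirming the containers and images are still there (docker ps -a / docker images),"
echo "you can delete the old data by hand to free space on the root partition."
echo "This script copies with rsync, so the original /var/lib/docker is still in place:"
echo "  sudo rm -rf /var/lib/docker"
echo "If you also made a manual backup:"
echo "  sudo rm -rf /var/lib/docker.bak"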
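
Before copying, it may be worth confirming that the target partition can hold the existing data. The sketch below compares `du` and `df` figures. `has_room_for` is a hypothetical helper that is not part of the migration script, and the demo runs against a scratch directory rather than `/var/lib/docker`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding $2 has more free
# space than the contents of $1 occupy.
has_room_for() {
    local src="$1" dst="$2" need avail
    # du -sk: total size of src in KiB; df -Pk: column 4 is the available KiB.
    need="$(du -sk "$src" | awk '{ print $1 }')"
    avail="$(df -Pk "$dst" | awk 'NR==2 { print $4 }')"
    echo "need ${need} KiB, available ${avail} KiB"
    [ "$avail" -gt "$need" ]
}

# Demo on a scratch directory standing in for the Docker data.
src="$(mktemp -d)"
echo hello > "$src/file"
has_room_for "$src" "$(dirname "$src")" && echo "enough room"
```

For the real migration the call would be along the lines of `sudo bash -c '...; has_room_for /var/lib/docker /home/evan/data'`, since reading `/var/lib/docker` needs root.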

01 install k3s


#!/usr/bin/env bash
# ==========================================================
# Single-node k3s install script (for Kali Linux / Debian-family x86_64)
# Usage: sudo bash 01-install-k3s.sh
# ==========================================================
set -e

echo ">>> 1. Check whether k3s is already installed"
if command -v k3s &> /dev/null; then
    echo "k3s is already installed, version:"
    k3s --version
    read -r -p "Reinstall? (y/N): " confirm
    if [[ "$confirm" != "y" && "$confirm" != "Y" ]]; then
        echo "Installation cancelled."
        exit 0
    fi
    echo ">>> Uninstalling the old version..."
    /usr/local/bin/k3s-uninstall.sh || true
fi

echo ">>> 2. Install k3s (bundles the Traefik Ingress Controller, containerd, and local-path-provisioner)"
# --write-kubeconfig-mode 644 lets the current user read the kubeconfig directly, which is convenient on a home machine
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644

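echo ">>> 3. Waiting for the node to become ready..."
sleep 5
# -w matches "Ready" as a whole word, so a "NotReady" status does not end the wait early
until sudo k3s kubectl get node | grep -qw "Ready"; do
    echo "Waiting for the node to start..."
    sleep 3
done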

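echo ">>> 4. Configure kubectl access (for the current user)"
# Note: when this script runs under sudo, $HOME and id may resolve to root rather than your login user
mkdir -p "$HOME/.kube"
sudo cp /etc/rancher/k3s/k3s.yaml "$HOME/.kube/config"
sudo chown "$(id -u)":"$(id -g)" "$HOME/.kube/config"
chmod 600 "$HOME/.kube/config"

# If there is no standalone kubectl binary, symlink kubectl to the one built into k3s so the kubectl command works directly
if ! command -v kubectl &> /dev/null; then
    echo ">>> No standalone kubectl found; creating a symlink to the k3s built-in kubectl"
    sudo ln -sf /usr/local/bin/k3s /usr/local/bin/kubectl
fi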

echo ""
echo "=========================================="
echo "✅ k3s installation complete!"
echo "=========================================="
echo ""
echo "Node status:"
kubectl get nodes -o wide
echo ""
echo "System Pod status (traefik, coredns, and local-path-provisioner should be listed):"
kubectl get pods -A
echo ""
echo "Next: run 02-install-kuboard.sh to deploy the management panel"
echo "or run 03-test-deploy.sh directly to deploy a test app and verify the network path"
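
The wait loop in step 3 of the script keeps polling indefinitely if the node never comes up. A bounded variant could look like the sketch below. `wait_until` and `node_ready` are hypothetical helper names, and the 120-second budget in the commented call is an arbitrary assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
    local timeout="$1" interval="$2" start
    shift 2
    start="$(date +%s)"
    until "$@"; do
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}

# Hypothetical check: some line of the node list carries "Ready" as a whole word,
# which excludes "NotReady".
node_ready() {
    k3s kubectl get node 2>/dev/null | grep -qw "Ready"
}

# Left commented out because it needs a running k3s:
# wait_until 120 3 node_ready || { echo "node did not become ready"; exit 1; }
```

Returning a non-zero status on timeout lets a script with `set -e` stop with a clear message instead of hanging.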

02 install k8s dashboard


#!/usr/bin/env bash
# ==========================================================
# Kubernetes Dashboard install script (the official upstream option, replacing Kuboard)
# Usage: bash 02-install-k8s-dashboard.sh
# ==========================================================
set -e

DASHBOARD_VERSION="v2.7.0"

echo ">>> 1. Deploy Kubernetes Dashboard"
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/${DASHBOARD_VERSION}/aio/deploy/recommended.yaml

echo ">>> 2. Waiting for the Dashboard Pod to become ready..."
kubectl wait --for=condition=ready pod -l k8s-app=kubernetes-dashboard \
  -n kubernetes-dashboard --timeout=90s

echo ">>> 3. Create an admin ServiceAccount (local learning only; never grant cluster-admin in production)"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
EOF

echo ">>> 4. Generate a login token"
TOKEN=$(kubectl -n kubernetes-dashboard create token admin-user --duration=8760h)

echo ">>> 5. Expose access through kubectl proxy (the simplest option at home, no Ingress or certificates needed)"
echo ""
echo "=========================================="
echo "✅ Dashboard installation complete"
echo "=========================================="
echo ""
echo "Step 1: start the proxy in another terminal (keep it running in the foreground):"
echo "  kubectl proxy"
echo ""
echo "Step 2: open in a browser:"
echo "  http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"
echo ""
echo "Step 3: log in with the token saved to dashboard-token.txt (requested validity: 1 year):"
echo "$TOKEN" > "$(dirname "$0")/dashboard-token.txt"
echo "  Token saved to: $(dirname "$0")/dashboard-token.txt"
echo ""
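# Note: over plain HTTP the Dashboard sign-in generally works only from localhost, so remote logins through this proxy may not succeed
echo "To reach it from other devices on the LAN (not only this machine), replace kubectl proxy with:"
echo "  kubectl proxy --address='0.0.0.0' --accept-hosts='^.*\$'"
echo "  (home-network learning use only; do not expose this to the public internet)"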
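
Because the token is bound to cluster-admin, it may be worth writing the token file with owner-only permissions. The sketch below shows one way to do that. `save_secret` is a hypothetical helper, and the demo value is a placeholder, not a real token.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a secret to a file that only the owner can read or write.
save_secret() {
    local file="$1" value="$2"
    # The subshell keeps the umask change local; 077 gives new files mode 600.
    ( umask 077; printf '%s\n' "$value" > "$file" )
    # Also tighten the mode if the file already existed with looser permissions.
    chmod 600 "$file"
}

demo_file="$(mktemp -d)/dashboard-token.txt"
save_secret "$demo_file" "example-token-not-real"
ls -l "$demo_file"
```

In the script above, the equivalent call would take `"$(dirname "$0")/dashboard-token.txt"` and `"$TOKEN"` as its two arguments.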

02 install kuboard

#!/usr/bin/env bash
# ==========================================================
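# Kuboard deployment script (runs standalone in Docker, not deployed inside the cluster)
# This is the simplest approach recommended by Kuboard: run Kuboard itself in Docker
# and take over your k3s cluster by mounting/configuring the kubeconfig
# Usage: bash 02-install-kuboard.sh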
# ==========================================================
set -e

if ! command -v docker &> /dev/null; then
    echo "❌ Docker not detected; install Docker first:"
    echo "   curl -fsSL https://get.docker.com | sh"
    exit 1
fi

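# Note: k3s's bundled Traefik is normally published on host ports 80/443 as well; if that collides
# with the mapping below, use another host port (for example -p 8080:80) and adjust KUBOARD_ENDPOINT
echo ">>> Starting the Kuboard container..."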
docker run -d \
  --restart=unless-stopped \
  --name=kuboard \
  -p 80:80/tcp \
  -p 10081:10081/tcp \
  -e KUBOARD_ENDPOINT="http://$(hostname -I | awk '{print $1}'):80" \
  -e KUBOARD_AGENT_SERVER_TCP_PORT="10081" \
  -v ~/kuboard-data:/data \
  eipwork/kuboard:v3

echo ""
echo "=========================================="
echo "✅ Kuboard is running"
echo "=========================================="
echo ""
echo "Open in a browser: http://$(hostname -I | awk '{print $1}')"
echo "Default user: admin  Default password: Kuboard123"
echo ""
echo "After logging in, choose 'Import cluster' -> 'Import via kubeconfig' in the Kuboard UI"
echo "and paste the contents of ~/.kube/config to take over your k3s cluster"

03 test deploy

#!/usr/bin/env bash
# ==========================================================
# Deploy a test app and verify the Deployment -> Service -> Ingress path
# Usage: bash 03-test-deploy.sh
# ==========================================================
set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo ">>> 1. Deploy the test app (2-replica nginx demo + Service + Ingress)"
kubectl apply -f "$SCRIPT_DIR/test-app.yaml"

echo ">>> 2. Waiting for the Pods to become ready..."
kubectl wait --for=condition=ready pod -l app=hello-web --timeout=60s

echo ">>> 3. Show resource status"
echo "--- Pods ---"
kubectl get pods -l app=hello-web -o wide
echo "--- Service ---"
kubectl get svc hello-web-svc
echo "--- Ingress ---"
kubectl get ingress hello-web-ingress

NODE_IP=$(hostname -I | awk '{print $1}')
echo ""
echo "=========================================="
echo "✅ Deployment complete; verify the request path:"
echo "=========================================="
echo ""
echo "Option 1 (recommended: curl with a Host header, no /etc/hosts change needed):"
echo "  curl -H 'Host: hello.local.test' http://${NODE_IP}"
echo ""
echo "Option 2 (edit /etc/hosts, then use a browser):"
echo "  echo '${NODE_IP} hello.local.test' | sudo tee -a /etc/hosts"
echo "  then open http://hello.local.test in a browser"
echo ""
echo "Send several requests and watch whether the returned hostname switches between the two Pods (to verify load balancing):"
echo "  for i in {1..5}; do curl -s -H 'Host: hello.local.test' http://${NODE_IP} | grep -i 'Server name'; done"
echo ""
echo "Clean up the test app: kubectl delete -f $SCRIPT_DIR/test-app.yaml"
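
To judge the load-balancing spread at a glance, the per-request output can be piped through a small counter instead of being read line by line. The sketch below is only an illustration: `tally` is a hypothetical helper, and the sample lines stand in for what repeated requests would return.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how often each distinct line appears on stdin,
# most frequent first.
tally() {
    sort | uniq -c | sort -rn
}

# Sample input standing in for the names returned by five requests.
printf '%s\n' pod-a pod-b pod-a pod-a pod-b | tally
```

Against the real cluster, the loop printed by the script can be piped into `tally` after the closing `done`; two lines with roughly similar counts suggest both Pods are receiving traffic.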

test app

# ==========================================================
# Minimal test app: Deployment -> Service -> Ingress
# Used to verify that the request path through k3s's bundled Traefik works
# Deploy: kubectl apply -f test-app.yaml
# Delete: kubectl delete -f test-app.yaml
# ==========================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
  labels:
    app: hello-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: hello-web
          image: nginxdemos/hello:latest   # returns the container hostname, which makes load balancing easy to see
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-web-svc
spec:
  selector:
    app: hello-web
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-web-ingress
spec:
  # k3s ships Traefik, whose IngressClass is normally named traefik
  ingressClassName: traefik
  rules:
    - host: hello.local.test   # replace with a domain of your choice, or test with the node IP plus a Host header
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-web-svc
                port:
                  number: 80

troubleshooting

E0702 23:21:05.844610   68952 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://cls-5qz05rv8.ccs.tencent-cloud.com/api?timeout=32s\": dial tcp: lookup cls-5qz05rv8.ccs.tencent-cloud.com on 192.168.10.1:53: no such host"
Unable to connect to the server: dial tcp: lookup cls-5qz05rv8.ccs.tencent-cloud.com on 192.168.10.1:53: no such host


sudo cat /root/.kube/config 2>&1 | head -20
If this shows server: https://cls-5qz05rv8.ccs.tencent-cloud.com, that confirms it: root's kubeconfig is stale and points at a different cluster.
Fix
As before, generate a clean copy for root as well:
sudo mkdir -p /root/.kube
sudo cp /etc/rancher/k3s/k3s.yaml /root/.kube/config
sudo chmod 600 /root/.kube/config
Verify:
sudo kubectl get nodes
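
The same check can be scripted so that a stale kubeconfig is spotted before kubectl fails on DNS. The sketch below prints the API server URLs in a kubeconfig file. `kubeconfig_servers` is a hypothetical helper, and the sample file is a stand-in that assumes the usual `https://127.0.0.1:6443` address written by k3s.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every API server URL found in a kubeconfig file.
kubeconfig_servers() {
    awk '$1 == "server:" { print $2 }' "$1"
}

# Sample file standing in for a real kubeconfig.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
apiVersion: v1
clusters:
- cluster:
    server: https://127.0.0.1:6443
  name: default
EOF
kubeconfig_servers "$sample"
```

Running it as root against `/root/.kube/config` and as the login user against `~/.kube/config` shows quickly whether the two point at the same cluster.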