Background

This guide deploys a Promtail + Loki + Grafana logging stack on Kubernetes. Promtail collects logs from the cluster and ships them to Loki, attaching the following labels:

- namespace: the Pod's namespace
- pod_name: the Pod name
- deployment_name: the Deployment name (derived from the Pod's app label or from the Pod name)
- container: the container name
Deploying Loki

- Deployment: the Loki server
- Service: in-cluster access
- ConfigMap: the Loki configuration file
- PVC: persistent data storage (10Gi)
- PV: the backing volume (hostPath)
Creating the YAML files

First create the namespace:

```shell
kubectl create namespace logging
```
Create a loki directory and add pv.yaml:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki-data-pv
spec:
  capacity:
    storage: 10Gi # volume size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: host-loki
  hostPath:
    path: /data/loki # storage path on the node
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1 # change to the node that should hold the data
```
Create pvc.yaml:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-data
  namespace: logging
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: host-loki
```
Create configmap.yaml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: logging
data:
  loki.yaml: |
    auth_enabled: false

    server:
      http_listen_port: 3100
      grpc_listen_port: 9096

    common:
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        instance_addr: 127.0.0.1
        kvstore:
          store: inmemory

    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    ruler:
      alertmanager_url: http://localhost:9093

    analytics:
      reporting_enabled: false

    limits_config:
      ingestion_rate_mb: 16
      ingestion_burst_size_mb: 32
      max_query_length: 721h
      max_query_parallelism: 32
      max_streams_per_user: 10000
      max_line_size: 0
      max_query_series: 500
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: true
      retention_period: 720h

    compactor:
      working_directory: /loki/compactor
      shared_store: filesystem
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
```
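The retention-related settings in this config are expressed in hours; converted to days they come out to:

```shell
# retention_period and reject_old_samples_max_age from the config above, in days
echo "retention_period: $((720 / 24)) days"
echo "reject_old_samples_max_age: $((168 / 24)) days"
```

So chunks are kept for 30 days, and samples older than 7 days are rejected at ingestion time.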
Create deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      securityContext:
        fsGroup: 10001
      initContainers:
        - name: init-storage
          image: busybox:1.36
          securityContext:
            runAsUser: 0
          command:
            - sh
            - -c
            - |
              mkdir -p /loki/chunks /loki/rules /loki/compactor
              chown -R 10001:10001 /loki
              chmod -R 755 /loki
          volumeMounts:
            - name: storage
              mountPath: /loki
      containers:
        - name: loki
          image: grafana/loki:2.9.2
          securityContext:
            runAsUser: 10001
            runAsNonRoot: true
            readOnlyRootFilesystem: false
          ports:
            - containerPort: 3100
              name: http
            - containerPort: 9096
              name: grpc
          args:
            - -config.file=/etc/loki/loki.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: storage
              mountPath: /loki
          livenessProbe:
            httpGet:
              path: /ready
              port: 3100
            initialDelaySeconds: 45
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3100
            initialDelaySeconds: 15
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
      volumes:
        - name: config
          configMap:
            name: loki-config
        - name: storage
          persistentVolumeClaim:
            claimName: loki-data
```
Create service.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: logging
spec:
  type: ClusterIP
  selector:
    app: loki
  ports:
    - port: 3100
      targetPort: 3100
      protocol: TCP
      name: http
    - port: 9096
      targetPort: 9096
      protocol: TCP
      name: grpc
```
Deploying

- Create the PV and PVC:

```shell
kubectl apply -f pv.yaml
kubectl apply -f pvc.yaml
```

- Create the ConfigMap:

```shell
kubectl apply -f configmap.yaml
```

- Deploy the Deployment:

```shell
kubectl apply -f deployment.yaml
```

- Create the Service:

```shell
kubectl apply -f service.yaml
```

- Verify the deployment:

```shell
# Check the Loki Pod status
kubectl get pods -n logging -l app=loki

# Tail the Loki logs
kubectl logs -n logging -l app=loki --tail=50
```
Deploying Promtail

Promtail is Grafana Loki's log-collection agent: it gathers container logs from the Kubernetes cluster and ships them to Loki.

- DaemonSet: runs one Promtail Pod per node to collect that node's container logs
- ConfigMap: the Promtail configuration, defining scrape rules and the Loki endpoint
- ServiceAccount & RBAC: grants read access to the Kubernetes API for Pod metadata
Creating the YAML files

Create a promtail directory and add configmap.yaml. Note the ordering of the two deployment_name rules: the Pod-name-derived value is written first, then the app label overwrites it when present, so the app label takes priority as intended:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  promtail.yaml: |
    server:
      http_listen_port: 3101
      grpc_listen_port: 9096

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push

    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - docker: {}
        relabel_configs:
          # Build the log file path
          - source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_pod_name
              - __meta_kubernetes_pod_uid
            separator: _
            target_label: __tmp_pod_path
          - source_labels:
              - __tmp_pod_path
              - __meta_kubernetes_pod_container_name
            separator: /
            target_label: __path__
            replacement: /var/log/pods/$1/$2/*.log
          # Extract the namespace label
          - source_labels:
              - __meta_kubernetes_namespace
            target_label: namespace
          # Extract the pod_name label
          - source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod_name
          # Derive deployment_name from the Pod name
          # (format: deployment-name-replicaset-hash-pod-hash)
          - source_labels:
              - __meta_kubernetes_pod_name
            target_label: deployment_name
            regex: '^(.+?)-[0-9a-z]+-[0-9a-z]+$'
            replacement: '${1}'
            action: replace
          # Let the Pod's app label take priority when it exists
          - source_labels:
              - __meta_kubernetes_pod_label_app
            target_label: deployment_name
            regex: (.+)
          # Extract the container label
          - source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container
          # Keep only entries with valid Pod metadata
          - action: keep
            source_labels:
              - __meta_kubernetes_pod_name
              - __meta_kubernetes_pod_node_name
              - __meta_kubernetes_namespace
          # Drop all __meta_kubernetes-prefixed labels
          - action: labeldrop
            regex: '__meta_kubernetes.*'
```
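The first two relabel rules join Pod metadata into the kubelet's on-disk log path, /var/log/pods/&lt;namespace&gt;_&lt;pod&gt;_&lt;uid&gt;/&lt;container&gt;/*.log. The same string assembly can be sketched in shell (all metadata values below are made up):

```shell
# Hypothetical Pod metadata
ns=default
pod=nginx-7c9b8f5d4b-x2k9q
uid=0f1e2d3c-aaaa-bbbb-cccc-000000000000
container=nginx

# Rule 1: join namespace, pod name and UID with "_" into __tmp_pod_path
tmp="${ns}_${pod}_${uid}"
# Rule 2: join with "/" and append the container directory plus file glob
path="/var/log/pods/${tmp}/${container}/*.log"
echo "$path"
# → /var/log/pods/default_nginx-7c9b8f5d4b-x2k9q_0f1e2d3c-aaaa-bbbb-cccc-000000000000/nginx/*.log
```

If no files match this glob on a node, Promtail has nothing to tail; checking the path format is a quick first debugging step.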
Create daemonset.yaml:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: logging
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      serviceAccountName: promtail
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
      containers:
        - name: promtail
          image: grafana/promtail:2.9.2
          args:
            - -config.file=/etc/promtail/promtail.yaml
          ports:
            - name: http-metrics
              containerPort: 3101
          env:
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: positions
              mountPath: /tmp
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
          securityContext:
            runAsUser: 0
            runAsGroup: 0
            runAsNonRoot: false
      volumes:
        - name: config
          configMap:
            name: promtail-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: positions
          emptyDir: {}
```
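One caveat with this manifest: positions.yaml lives in an emptyDir, so a restarted Promtail Pod loses its read offsets and may re-ship old log lines. A hedged alternative is to persist positions on the node; the path below is an assumption, adjust to taste:

```yaml
# Replace the emptyDir "positions" volume with a hostPath to survive restarts
- name: positions
  hostPath:
    path: /var/lib/promtail  # hypothetical directory on each node
    type: DirectoryOrCreate
```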
Create serviceaccount.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promtail
  namespace: logging

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promtail
subjects:
  - kind: ServiceAccount
    name: promtail
    namespace: logging
```
Deploying

- Create the ConfigMap:

```shell
kubectl apply -f configmap.yaml
```

- Create the ServiceAccount and RBAC:

```shell
kubectl apply -f serviceaccount.yaml
```

- Deploy the DaemonSet:

```shell
kubectl apply -f daemonset.yaml
```

- Verify the deployment:

```shell
# Check that a Promtail Pod runs on every node
kubectl get pods -n logging -l app=promtail

# Tail the Promtail logs
kubectl logs -n logging -l app=promtail --tail=50
```
Deploying Grafana

Grafana visualizes the log data stored in Loki and provides powerful query and dashboard features.

- Deployment: the Grafana server
- Service: in-cluster access (a NodePort also works for direct access)
- Ingress: external access (via Traefik)
- ConfigMap: data source provisioning (auto-configures the Loki data source)
- PVC: persistent storage (10Gi, for dashboards and settings)

Creating the YAML files

Create a grafana directory and add configmap.yaml:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: logging
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.logging.svc.cluster.local:3100
        isDefault: false
        editable: true
        jsonData:
          maxLines: 1000
          derivedFields:
            - datasourceUid: loki
              matcherRegex: "traceID=(\\w+)"
              name: TraceID
              url: '$${__value.raw}'
```
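The derivedFields entry scans each log line with matcherRegex and turns the first capture group into a clickable value. What the regex extracts can be checked locally; the log line below is hypothetical, and the POSIX character class approximates the regex's \w:

```shell
# Extract the trace ID the same way the derived-field regex would
echo 'level=info traceID=abc123 msg="request done"' \
  | sed -E 's/.*traceID=([[:alnum:]_]+).*/\1/'
# → abc123
```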
Create deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      initContainers:
        - name: init-storage
          image: busybox:1.36
          securityContext:
            runAsUser: 0
          command:
            - sh
            - -c
            - |
              mkdir -p /var/lib/grafana/plugins /var/lib/grafana/data /var/lib/grafana/logs
              chown -R root:root /var/lib/grafana
              chmod -R 755 /var/lib/grafana
          volumeMounts:
            - name: storage
              mountPath: /var/lib/grafana
      containers:
        - name: grafana
          image: grafana/grafana:10.2.2
          securityContext:
            runAsUser: 0
            runAsNonRoot: false
            readOnlyRootFilesystem: false
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin # prefer managing this via a Secret
            - name: GF_INSTALL_PLUGINS
              value: ""
            - name: GF_SERVER_ROOT_URL
              value: "%(protocol)s://%(domain)s:%(http_port)s/"
            - name: GF_SERVER_SERVE_FROM_SUB_PATH
              value: "false"
          volumeMounts:
            - name: storage
              mountPath: /var/lib/grafana
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: grafana-data
        - name: datasources
          configMap:
            name: grafana-datasources
```
Create pv.yaml:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-data-pv
spec:
  capacity:
    storage: 10Gi # volume size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: host-grafana
  hostPath:
    path: /data/grafana # storage path on the node
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1 # change to your own node
```
Create pvc.yaml:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: logging
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: host-grafana
```
Create service.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: logging
spec:
  type: ClusterIP
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
      protocol: TCP
      name: http
```
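If you prefer direct NodePort access over an Ingress, a minimal variant of the Service looks like this (the port 32000 is an arbitrary choice; any value in the cluster's NodePort range works):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana-nodeport
  namespace: logging
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
      nodePort: 32000
```

Grafana is then reachable at http://&lt;any-node-ip&gt;:32000 without further routing.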
Create ingress.yaml:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: logging
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    - hosts:
        - grafana-dev.jobcher.com # change to your own domain
      secretName: jobcher-com-tls # change to your own TLS certificate Secret
  rules:
    - host: grafana-dev.jobcher.com # change to your own domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```
Deploying

- Create the PV and PVC:

```shell
kubectl apply -f pv.yaml
kubectl apply -f pvc.yaml
```

- Create the ConfigMap (data source provisioning):

```shell
kubectl apply -f configmap.yaml
```

- Deploy the Deployment:

```shell
kubectl apply -f deployment.yaml
```

- Create the Service:

```shell
kubectl apply -f service.yaml
```

- Create the Ingress (optional, for external access):

```shell
kubectl apply -f ingress.yaml
```

- Verify the deployment:

```shell
# Check all components
kubectl get pods -n logging

# Check the Services
kubectl get svc -n logging

# Tail the Loki logs
kubectl logs -n logging -l app=loki --tail=50

# Tail the Grafana logs
kubectl logs -n logging -l app=grafana --tail=50
```
Verification and testing

Verify that Loki is working

```shell
# Check Loki's readiness endpoint
kubectl exec -n logging -it deployment/loki -- wget -q -O - http://localhost:3100/ready

# List the namespace label values Loki has ingested
kubectl exec -n logging -it deployment/loki -- wget -q -O - "http://localhost:3100/loki/api/v1/label/namespace/values"
```
Viewing logs in Grafana

- Open Grafana (via the Ingress, or by port-forwarding):

```shell
# Port-forward (when using a ClusterIP Service)
kubectl port-forward -n logging svc/grafana 3000:3000
```

- Log in with the default credentials (admin/admin), then open the Explore page and select the Loki data source
- Query logs with LogQL, for example:
  - {namespace="default"} for logs from the default namespace
  - {pod_name="your-pod-name"} for logs from a specific Pod
  - {deployment_name="your-deployment"} for logs from a specific Deployment
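Label matchers can also be combined and chained with line filters; for example, the following (with hypothetical label values) narrows to one Deployment and keeps only lines containing "error":

```logql
{namespace="default", deployment_name="my-app"} |= "error"
```

The |= operator is a substring filter; != , |~ and !~ work analogously for negation and regex matching.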
Notes

Storage

- Make sure /var/log/pods exists on each node (the standard log path since Kubernetes 1.14)
- Adjust the PV size and node affinity to your actual needs
- With containerd instead of Docker the log path layout is the same, but the pipeline configuration may need adjusting
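For the containerd case mentioned above, the usual adjustment is swapping the docker pipeline stage in the Promtail ConfigMap for the CRI parser:

```yaml
# CRI log format (containerd, CRI-O) instead of the Docker JSON format
pipeline_stages:
  - cri: {}
```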
Security

- Promtail must run as root (uid 0) to read the log files on each node
- Manage the Grafana admin password through a Secret rather than hard-coding it in the Deployment
- Enable Loki's authentication (auth_enabled) in production
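A sketch of the Secret-based password setup (the Secret name and key are assumptions, pick your own):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: logging
type: Opaque
stringData:
  admin-password: change-me # replace before applying
```

In the Grafana container, the env entry then becomes:

```yaml
- name: GF_SECURITY_ADMIN_PASSWORD
  valueFrom:
    secretKeyRef:
      name: grafana-admin
      key: admin-password
```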
Performance

- Scale the resource requests and limits (CPU, memory) to your cluster size
- Tune Loki's ingestion_rate_mb and ingestion_burst_size_mb to your log volume
- Old logs are deleted automatically according to retention_period
Label extraction

- The deployment_name label is derived as follows:
  - the Pod's app label takes priority
  - if there is no app label, the name is parsed from the Pod name (format: deployment-name-replicaset-hash)
- Make sure Pods carry correct labels so Promtail can extract their metadata
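The Pod-name fallback can be checked locally with made-up pod names. POSIX ERE has no lazy +? quantifier, but the greedy form below picks the same split as Promtail's regex, because the two trailing hash segments cannot contain a dash:

```shell
# Strip the ReplicaSet hash and Pod suffix, as the relabel rule does
strip() { echo "$1" | sed -E 's/^(.+)-[0-9a-z]+-[0-9a-z]+$/\1/'; }

strip "nginx-deployment-7c9b8f5d4b-x2k9q"  # → nginx-deployment
strip "my-api-6d4cf56db6-9zxq2"            # → my-api
```

Note that pods not created by a Deployment (bare Pods, StatefulSet pods) will not match this pattern, which is why the app label remains the more reliable source.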
Troubleshooting

- If Promtail collects no logs, check that it can read /var/log/pods
- If Loki receives no logs, check network connectivity and the Service configuration
- Inspect each component with kubectl logs