Note: Even though this part is optional, you should not ignore it in a dev environment, and you should really, really, REALLY not skip it in production. Audit logs contain useful debugging information and security traces that show what is going on in your Kubernetes cluster, and even on your whole server(s).
This tutorial guides you through setting up an audit log policy, collecting the logs with Fluentd, shipping them to Elasticsearch & displaying them in Kibana.
First, choose an audit log directory on the host, {{audit.sourceLogDir}}. This is the directory where Kubernetes will write its audit logs, and it should be under /var/log. Then, choose an audit log file name, {{audit.sourceLogFile}}, in {{audit.sourceLogDir}}. The final audit log path is then {{audit.sourceLogDir}}/{{audit.sourceLogFile}}.
Fluentd will parse those audit logs and split them by tag (one tag per namespace) for easier sorting. It will then write the split logs to {{audit.destLogDir}}.
To pipe audit log messages to Elasticsearch, we need to install Fluentd on the Kubernetes master host.
Install Fluentd (on the Kubernetes master host)
Install Chrony
Start by installing Chrony for accurate timestamps
|
dnf install chrony
systemctl enable --now chronyd
|
You should be good to go.
Check the file descriptor limits (soft, then hard) for the root user (use sudo):
|
ulimit -n
# » 1024
ulimit -Hn
# » 262144
|
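The check above can be scripted. This is a minimal sketch: the `nofile_status` helper name is hypothetical, and the 65536 threshold is the value this guide sets in limits.conf further down.

```shell
#!/bin/sh
# Hypothetical helper: prints "ok" when a limit is at least the wanted value.
# A limit reported as "unlimited" is treated as high enough.
nofile_status() {
    limit="$1"
    wanted="$2"
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$wanted" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# Compare the current soft and hard limits with the value used in this guide.
echo "soft nofile limit ($(ulimit -n)): $(nofile_status "$(ulimit -n)" 65536)"
echo "hard nofile limit ($(ulimit -Hn)): $(nofile_status "$(ulimit -Hn)" 65536)"
```

Run it as root (or through sudo) so that it reports the limits of the user td-agent is started by.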
If the soft limit is low (like 1024), you need to raise it in your system's limits configuration. Open your limits.conf file:
|
vim /etc/security/limits.conf
|
Add the following settings:
|
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536
|
Then reboot & check again as the root user (use sudo):
|
ulimit -n # should be 65536
ulimit -Hn # should be at least 65536
|
If you run this as your normal user, the value reported by ulimit -n might not have changed.
If the environment is expected to have a high load, follow this section of the guide.
Install Fluentd & plugins
Add the td-agent repository & install it:
|
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh
systemctl enable --now td-agent.service
|
Check that it works by posting a sample log:
|
curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
cat /var/log/td-agent/td-agent.log # should end with our test message above
|
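Rather than reading the log by eye, the check can be automated. The sketch below assumes that the default td-agent configuration is still in place and prints `debug.**` events to the log as `debug.test: {"json":"message"}`. The `log_has_test_event` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "found" when the last lines of the given log file
# contain the sample event posted to debug.test, "missing" otherwise.
log_has_test_event() {
    logfile="$1"
    if tail -n 20 "$logfile" | grep -qF 'debug.test: {"json":"message"}'; then
        echo "found"
    else
        echo "missing"
    fi
}

log=/var/log/td-agent/td-agent.log
if [ -r "$log" ]; then
    echo "test event: $(log_has_test_event "$log")"
else
    echo "cannot read $log (try with sudo)"
fi
```

If the event is missing, it may simply not have been flushed yet, so post the sample again and retry after a few seconds.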
Install required plugins with the following command:
|
td-agent-gem install fluent-plugin-forest fluent-plugin-rewrite-tag-filter
|
If this command fails, see the Troubleshoot section at the end.
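To confirm that both plugins were installed, you can look for them in the gem list. This is a sketch only: it assumes `td-agent-gem list` is available on the PATH, and the `missing_plugins` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the required plugins that do not appear
# in the given gem list (one per line, nothing when all are present).
missing_plugins() {
    installed="$1"
    for plugin in fluent-plugin-forest fluent-plugin-rewrite-tag-filter; do
        case "$installed" in
            *"$plugin"*) ;;
            *) echo "$plugin" ;;
        esac
    done
}

# If td-agent-gem is not available, the list is empty and both plugins are reported.
missing="$(missing_plugins "$(td-agent-gem list 2>/dev/null)")"
if [ -z "$missing" ]; then
    echo "all required plugins are installed"
else
    echo "missing plugins: $missing"
fi
```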
Install the td-agent/kube.conf template into /etc/td-agent/, include it in your main configuration, and create the log directories.
|
# From https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#log-collector-examples
# fluentd conf runs in the same host with kube-apiserver
<source>
  @type tail
  # audit log path of kube-apiserver
  path {{audit.sourceLogDir}}/{{audit.sourceLogFile}}
  pos_file {{audit.sourceLogDir}}/{{audit.sourceLogFile}}.pos
  format json
  time_key time
  time_format %Y-%m-%dT%H:%M:%S.%N%z
  tag audit
</source>
<filter audit>
  #https://github.com/fluent/fluent-plugin-rewrite-tag-filter/issues/13
  @type record_transformer
  enable_ruby
  <record>
    namespace ${record["objectRef"].nil? ? "none":(record["objectRef"]["namespace"].nil? ? "none":record["objectRef"]["namespace"])}
  </record>
</filter>
<match audit>
  # route audit according to namespace element in context
  @type rewrite_tag_filter
  <rule>
    key namespace
    pattern /^(.+)/
    tag ${tag}.$1
  </rule>
</match>
<filter audit.**>
  @type record_transformer
  remove_keys namespace
</filter>
<match audit.**>
  @type forest
  subtype file
  remove_prefix audit
  <template>
    time_slice_format %Y%m%d%H
    compress gz
    path {{audit.destLogDir}}/audit-${tag}.*.log
    format json
    include_time_key true
  </template>
</match>
|
|
# Install the template next to the main configuration (do not overwrite td-agent.conf)
mv ./td-agent/kube.conf /etc/td-agent/kube.conf
# Include the Kubernetes configuration in the main configuration
echo "@include './kube.conf'" >> /etc/td-agent/td-agent.conf
# Create the log dir that will be mounted into the API server
mkdir -p {{audit.destLogDir}}
# If required, allow td-agent to read/write in it
chown -R root:td-agent {{audit.destLogDir}}
chmod -R g+w {{audit.destLogDir}}
# Restart the agent
systemctl restart td-agent.service
|
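Once the agent has restarted and audit events are flowing, each namespace should get its own files in the destination directory. Events without a namespace are tagged `none` by the filter above. The sketch below builds the matching glob from the `audit-${tag}.` prefix of the `path` template. The `audit_log_glob` helper name and the `/var/log/kube-audit` directory are hypothetical, and the exact file suffix depends on the td-agent version, hence the wildcard.

```shell
#!/bin/sh
# Hypothetical helper: prints the glob matching the split audit files of one
# namespace. Without a namespace argument it falls back to "none".
audit_log_glob() {
    dest_dir="$1"
    namespace="${2:-none}"
    echo "${dest_dir}/audit-${namespace}.*"
}

# /var/log/kube-audit is a stand-in for your {{audit.destLogDir}} value.
# The command substitution is left unquoted on purpose so the glob expands.
ls $(audit_log_glob /var/log/kube-audit kube-system) 2>/dev/null \
    || echo "no audit files yet for kube-system"
```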
Set up the audit log
See the example audit log policy & the template audit log file.
|
# From https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/audit/audit-policy.yaml
# See https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#audit-policy for more info
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]
  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]
  # Don't log watch requests by the "system:kube-proxy" on endpoints or services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
    - group: "" # core API group
      resources: ["endpoints", "services"]
  # Don't log authenticated requests to certain non-resource URL paths.
  - level: None
    userGroups: ["system:authenticated"]
    nonResourceURLs:
    - "/api*" # Wildcard matching.
    - "/version"
  # Log the request body of configmap changes in kube-system.
  - level: Request
    resources:
    - group: "" # core API group
      resources: ["configmaps"]
    # This rule only applies to resources in the "kube-system" namespace.
    # The empty string "" can be used to select non-namespaced resources.
    namespaces: ["kube-system"]
  # Log configmap and secret changes in all other namespaces at the Metadata level.
  - level: Metadata
    resources:
    - group: "" # core API group
      resources: ["secrets", "configmaps"]
  # Log all other resources in core and extensions at the Request level.
  - level: Request
    resources:
    - group: "" # core API group
    - group: "extensions" # Version of group should NOT be included.
  # A catch-all rule to log all other requests at the Metadata level.
  - level: Metadata
    # Long-running requests like watches that fall under this rule will not
    # generate an audit event in RequestReceived.
    omitStages:
      - "RequestReceived"
|
Move it into the /etc/kubernetes folder (because this is a Kubernetes configuration file):
|
mv ./kubernetes/audit-log-policy.yaml /etc/kubernetes/audit-log-policy.yaml
chown root:root /etc/kubernetes/audit-log-policy.yaml
|
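The policy file only takes effect once kube-apiserver is started with `--audit-policy-file` pointing at it and `--audit-log-path` pointing at the audit log file. The sketch below checks whether a manifest already carries both flags. It assumes a kubeadm-style static pod manifest at `/etc/kubernetes/manifests/kube-apiserver.yaml`, which may not match your setup, and the `manifest_has_audit_flags` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the given kube-apiserver manifest
# mentions both the audit policy flag and the audit log path flag.
manifest_has_audit_flags() {
    manifest="$1"
    if grep -q -- '--audit-policy-file' "$manifest" \
        && grep -q -- '--audit-log-path' "$manifest"; then
        echo "audit flags present"
    else
        echo "audit flags missing"
    fi
}

# Assumed kubeadm location; adjust it to wherever your API server is configured.
manifest=/etc/kubernetes/manifests/kube-apiserver.yaml
if [ -r "$manifest" ]; then
    manifest_has_audit_flags "$manifest"
else
    echo "cannot read $manifest"
fi
```

When the API server runs as a pod, the policy file and the audit log directory also need to be mounted into it for the flags to work.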
Troubleshoot
Unable to download data from https://rubygems.org/ - timed out (https://api.rubygems.org/specs.4.8.gz)
The RubyGems repository seems to have issues with IPv6. Check with the commands below:
|
curl -v --head https://api.rubygems.org
curl -6 -v --head https://api.rubygems.org
|
If the first command works and the second hangs (times out), you are having trouble with IPv6 and need to disable it temporarily:
|
sysctl -w net.ipv6.conf.default.disable_ipv6=1
sysctl -w net.ipv6.conf.all.disable_ipv6=1
|
After installing your plugins, re-enable IPv6:
|
sysctl -w net.ipv6.conf.default.disable_ipv6=0
sysctl -w net.ipv6.conf.all.disable_ipv6=0
|
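To make sure IPv6 is not left disabled by accident, you can read the setting back. This is a small sketch in which the `ipv6_state` helper name is hypothetical; it only interprets the value of `net.ipv6.conf.all.disable_ipv6`.

```shell
#!/bin/sh
# Hypothetical helper: translates the disable_ipv6 sysctl value into a word.
ipv6_state() {
    case "$1" in
        0) echo "enabled" ;;
        1) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Read the current value; an empty result (no sysctl, no IPv6 support) maps to "unknown".
value="$(sysctl -n net.ipv6.conf.all.disable_ipv6 2>/dev/null || echo "")"
echo "IPv6 is $(ipv6_state "$value")"
```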