Problem

This technote describes an issue that you may face with ICP logging.
You encounter anomalies with Logstash: after a few minutes, one of your three pods reports a liveness error and continually restarts.

Symptom

Events: Warning Unhealthy 13s (x536 over 23h) kubelet, ... Liveness probe failed: HTTP probe failed with statuscode: 504

Diagnosing The Problem

When you delete and re-create the Helm release on the cluster, it works fine again for a few minutes.
Resolving The Problem

To resolve this issue you will need to edit the logstash deployment to:
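The list of fields to change was not preserved in this copy of the technote. A common mitigation for recurring 504 liveness failures is to relax the probe's timing on the deployment. A minimal sketch, assuming the deployment is named logstash in the kube-system namespace and that raising timeoutSeconds and failureThreshold is the intended change (both values are illustrative, not the documented fix):

```shell
# Open the logstash deployment for interactive editing (name and namespace assumed)
kubectl edit deployment logstash -n kube-system

# Or patch the liveness probe timing in place (illustrative values)
kubectl patch deployment logstash -n kube-system --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds", "value": 30},
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/failureThreshold", "value": 5}
]'
```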
Document Location

Worldwide

Product Synonym

ICP, IBM Cloud Private, Cloud Private, Common services, ICP Logging

This page shows how to configure liveness, readiness and startup probes for containers.

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow-starting containers, avoiding them getting killed by the kubelet before they are up and running.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Define a liveness command

Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.

In this exercise, you create a Pod that runs a container based on the registry.k8s.io/busybox image.
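The configuration file itself was not preserved in this copy. A sketch consistent with the description that follows, using the Pod name and probe values from the upstream Kubernetes example (exec-liveness.yaml), which may differ in your version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:        # probe succeeds while /tmp/healthy exists
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```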
In the configuration file, you can see that the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds, and the initialDelaySeconds field tells the kubelet to wait 5 seconds before performing the first probe. To perform a probe, the kubelet executes the command cat /tmp/healthy in the target container. If the command succeeds, it returns 0, and the kubelet considers the container to be alive and healthy. If the command returns a non-zero value, the kubelet kills the container and restarts it. When the container starts, it executes this command:
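Reconstructed from the args in the sketch above:

```shell
/bin/sh -c "touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600"
```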
For the first 30 seconds of the container's life, there is a /tmp/healthy file, so the command cat /tmp/healthy returns a success code. After 30 seconds, cat /tmp/healthy returns a failure code.

Create the Pod:
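Assuming the manifest above, which the upstream docs publish at https://k8s.io/examples/pods/probe/exec-liveness.yaml:

```shell
kubectl apply -f https://k8s.io/examples/pods/probe/exec-liveness.yaml
```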
Within 30 seconds, view the Pod events:
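Assuming the Pod name liveness-exec from the sketch above:

```shell
kubectl describe pod liveness-exec
```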
The output indicates that no liveness probes have failed yet:
After 35 seconds, view the Pod events again:
At the bottom of the output, there are messages indicating that the liveness probes have failed, and the failed containers have been killed and recreated.
Wait another 30 seconds, and verify that the container has been restarted:
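Again assuming the Pod name from the sketch:

```shell
kubectl get pod liveness-exec
```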
The output shows that RESTARTS has been incremented. Note that the RESTARTS counter increments as soon as a failed container comes back to the running state.
Define a liveness HTTP request

Another kind of liveness probe uses an HTTP GET request. Here is the configuration file for a Pod that runs a container serving an HTTP health endpoint.
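The manifest is not preserved here; a sketch consistent with the description that follows, with the Pod name, image, and probe values taken from the upstream Kubernetes example (http-liveness.yaml):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    args:
    - liveness
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
```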
In the configuration file, you can see that the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds, and the initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe. To perform a probe, the kubelet sends an HTTP GET request to the server that is running in the container and listening on port 8080. If the handler for the server's /healthz path returns a success code, the kubelet considers the container to be alive and healthy. If the handler returns a failure code, the kubelet kills the container and restarts it.

Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.

You can see the source code for the server in server.go. For the first 10 seconds that the container is alive, the /healthz handler returns a status of 200. After that, the handler returns a status of 500.
The kubelet starts performing health checks 3 seconds after the container starts. So the first couple of health checks will succeed. But after 10 seconds, the health checks will fail, and the kubelet will kill and restart the container.

To try the HTTP liveness check, create a Pod:
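Using the upstream example manifest:

```shell
kubectl apply -f https://k8s.io/examples/pods/probe/http-liveness.yaml
```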
After 10 seconds, view Pod events to verify that liveness probes have failed and the container has been restarted:
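Assuming the Pod name liveness-http from the sketch above:

```shell
kubectl describe pod liveness-http
```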
In releases prior to v1.13 (including v1.13), if the environment variable http_proxy (or HTTP_PROXY) is set on the node where a Pod is running, the HTTP liveness probe uses that proxy. In releases after v1.13, local HTTP proxy environment variable settings do not affect the HTTP liveness probe.

Define a TCP liveness probe

A third type of liveness probe uses a TCP socket. With this configuration, the kubelet will attempt to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy; if it can't, it is considered a failure.
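The manifest is not preserved here; a sketch matching the description below, with names and values from the upstream example (tcp-liveness-readiness.yaml):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:       # marks the Pod ready while the TCP connect succeeds
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:        # restarts the container when the TCP connect fails
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
```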
As you can see, configuration for a TCP check is quite similar to an HTTP check. This example uses both readiness and liveness probes. The kubelet will send the first readiness probe 5 seconds after the container starts. This will attempt to connect to the goproxy container on port 8080. If the probe succeeds, the Pod will be marked as ready, and the kubelet will continue to run this check every 10 seconds. In addition to the readiness probe, this configuration includes a liveness
probe. The kubelet will run the first liveness probe 15 seconds after the container starts. Similar to the readiness probe, this will attempt to connect to the goproxy container on port 8080. If the liveness probe fails, the container will be restarted.

To try the TCP liveness check, create a Pod:
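Using the upstream example manifest:

```shell
kubectl apply -f https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml
```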
After 15 seconds, view Pod events to verify the liveness check:
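Assuming the Pod name goproxy from the sketch above:

```shell
kubectl describe pod goproxy
```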
Define a gRPC liveness probe

FEATURE STATE:

If your application implements the gRPC Health Checking Protocol, the kubelet can be configured to use it for application liveness checks. You must enable the GRPCContainerProbe feature gate in order to configure checks that rely on gRPC.

Here is an example manifest:
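The manifest is not preserved here; a sketch matching the upstream example (grpc-liveness.yaml), with the Pod name etcd-with-grpc assumed from it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-with-grpc
spec:
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.1-0
    command:
    - /usr/local/bin/etcd
    - --data-dir
    - /var/lib/etcd
    - --listen-client-urls
    - http://0.0.0.0:2379
    - --advertise-client-urls
    - http://127.0.0.1:2379
    - --log-level
    - debug
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379      # etcd's client port also serves the gRPC health service
      initialDelaySeconds: 10
```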
To use a gRPC probe, port must be configured. If the health endpoint is configured on a non-default service, you must also specify the service.

Configuration problems (for example: incorrect port and service, unimplemented health checking protocol) are considered a probe failure, similar to HTTP and TCP probes.

To try the gRPC liveness check, create a Pod using the command below. In the example below, the etcd pod is configured to use a gRPC liveness probe.
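Using the upstream example manifest:

```shell
kubectl apply -f https://k8s.io/examples/pods/probe/grpc-liveness.yaml
```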
After 15 seconds, view Pod events to verify that the liveness check has not failed:
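Assuming the Pod name etcd-with-grpc from the sketch above:

```shell
kubectl describe pod etcd-with-grpc
```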
Before Kubernetes 1.23, gRPC health probes were often implemented using grpc-health-probe, as described in the blog post Health checking gRPC servers on Kubernetes. The behavior of the built-in gRPC probes is similar to that implemented by grpc-health-probe. When migrating from grpc-health-probe to built-in probes, remember the following differences:
Use a named port

You can use a named port for HTTP and TCP probes. gRPC probes do not support named ports.

For example:
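A sketch of the upstream snippet (port name and path as in the earlier HTTP example):

```yaml
ports:
- name: liveness-port
  containerPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port   # refers to the named containerPort above
```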
Protect slow starting containers with startup probes

Sometimes, you have to deal with legacy applications that might require additional startup time on their first initialization. In such cases, it can be tricky to set up liveness probe parameters without compromising the fast response to deadlocks that motivated such a probe. The trick is to set up a startup probe with the same command, HTTP or TCP check, with a failureThreshold * periodSeconds long enough to cover the worst-case startup time.

So, the previous example would become:
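A sketch based on the upstream snippet (port name and path carried over from the named-port example above):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 1
  periodSeconds: 10

startupProbe:             # gives the app failureThreshold * periodSeconds = 300s to start
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
```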
Thanks to the startup probe, the application will have a maximum of 5 minutes (30 * 10 = 300s) to finish its startup. Once the startup probe has succeeded once, the liveness probe takes over to provide a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after 300s and subject to the pod's restartPolicy.

Define readiness probes

Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

Readiness probes are configured similarly to liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.
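A sketch of the upstream snippet, reusing the exec check from the liveness-command exercise above:

```yaml
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
```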
Configuration for HTTP and TCP readiness probes also remains identical to liveness probes. Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.

Configure Probes

Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks: initialDelaySeconds (seconds to wait after the container starts before probes are initiated; default 0), periodSeconds (how often to perform the probe; default 10), timeoutSeconds (seconds after which the probe times out; default 1), successThreshold (minimum consecutive successes for the probe to be considered successful after having failed; default 1), and failureThreshold (number of consecutive failures after which Kubernetes gives up; default 3).
HTTP probes

HTTP probes have additional fields that can be set on httpGet: host, scheme, path, httpHeaders, and port.
For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod's IP address, unless the address is overridden by the optional host field in httpGet.

For an HTTP probe, the kubelet sends two request headers in addition to the mandatory Host header: User-Agent and Accept. The default values are kube-probe/ followed by the kubelet's version, and */*, respectively. You can override the default headers by defining httpHeaders for the probe. You can also remove these two headers by defining them with an empty value.
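A sketch of overriding a default probe header (header value illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Accept
      value: application/json   # overrides the default */* Accept header
```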
TCP probes

For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter since the kubelet is unable to resolve it.

Probe-level terminationGracePeriodSeconds

FEATURE STATE:

Prior to release 1.21, the pod-level terminationGracePeriodSeconds was used for terminating a container that failed its liveness or startup probe. This coupling was unintended and may have resulted in failed containers taking an unusually long time to restart when a pod-level terminationGracePeriodSeconds was set.

In 1.25 and beyond, users can specify a probe-level terminationGracePeriodSeconds as part of the probe specification. When both a pod- and probe-level terminationGracePeriodSeconds are set, the kubelet will use the probe-level value.

For example,
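A sketch based on the upstream snippet (image and values illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 3600   # pod-level
  containers:
  - name: test
    image: ...
    ports:
    - name: liveness-port
      containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 1
      periodSeconds: 60
      # probe-level value overrides the pod-level setting for this probe
      terminationGracePeriodSeconds: 60
```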
Probe-level terminationGracePeriodSeconds cannot be set for readiness probes. It will be rejected by the API server.

What's next
You can also read the API references for:
How do I fix a failed readiness probe in Kubernetes?

To increase the readiness probe failure threshold, configure the managed controller item and update the value of "Readiness Failure Threshold". By default, it is set to 100 (100 times). You may increase it to, for example, 300.

What happens if a readiness probe fails?

failureThreshold: when a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in the case of a liveness probe means restarting the container. In the case of a readiness probe, the Pod is marked Unready.

What is a readiness probe?

A readiness probe indicates whether applications running in a container are ready to receive traffic. If so, Services in Kubernetes can send traffic to the pod; if not, the endpoint controller removes the pod from all services.

Why do liveness probes fail?

The liveness probe is marked as failed when the container returns an unhealthy response. The probe is also considered failed if the service doesn't implement the gRPC health checking protocol.