Introduction to K8sGPT and AI-Driven Kubernetes Engineering

Introducing K8sGPT and AI Agents

K8sGPT Features and Capabilities

Welcome to this lesson on K8sGPT, a CNCF Sandbox project that brings AI-driven diagnostics and remediation to your Kubernetes clusters. Developed by Alex Jones and his team, K8sGPT is fully open source and available as either a Kubernetes operator or a standalone binary. It connects to your live cluster, gathers resource state across Deployments, Services, ReplicaSets, and more, then leverages AI to pinpoint issues and recommend fixes.
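
If you want to follow along, the CLI can be installed from the project's release artifacts or a package manager, and the operator can be deployed with Helm. The commands below are a minimal sketch; package names, chart names, and versions may differ over time, so check the K8sGPT documentation for your platform.

# Install the standalone CLI (Homebrew shown; binaries are also published on GitHub releases)
brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

# Or deploy the operator into the cluster with Helm
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace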

Key Features

Cluster Scanning: Crawls all namespaces and collects the real-time status of Pods, Services, and other resources.
Issue Diagnosis: Identifies errors, misconfigurations, and failed probes across resources.
Plain-English Explanation: Translates technical findings into clear, human-readable summaries.
Actionable Advice: Provides step-by-step remediation guidance for detected issues.
Data Anonymization: Strips sensitive details before sending telemetry to the AI backend.
Plugin Architecture: Supports custom analyzers and integrations via an extensible plugin system.

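Several of these features map directly onto CLI flags. The commands below are a rough sketch; confirm the exact flag names against k8sgpt analyze --help for your installed version.

# Scan only specific resource kinds in a single namespace
k8sgpt analyze --filter Service,Deployment --namespace k8sgpt

# Mask sensitive identifiers before anything is sent to the AI backend
k8sgpt analyze --explain --anonymize

# List the analyzers (filters) available to the scan
k8sgpt filters list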

Supported AI Backends

K8sGPT integrates seamlessly with major AI providers. You can choose from:

OpenAI: Access to GPT-4 and ChatGPT APIs
Azure OpenAI: Enterprise-grade LLMs hosted on Azure
Google Gemini: Google Cloud's advanced generative AI models
Amazon Bedrock: AWS-managed large language models
Cohere: High-quality embeddings and language models
LocalAI: On-prem or local model hosting for stricter compliance

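Backends are registered through the k8sgpt auth subcommand. The snippet below is a minimal sketch for OpenAI; the model name and the OPENAI_API_KEY variable are illustrative, and other providers take equivalent flags.

# Register OpenAI as a backend (you are prompted for the key if --password is omitted)
k8sgpt auth add --backend openai --model gpt-4 --password "$OPENAI_API_KEY"

# Confirm which backends are configured and which one is the default
k8sgpt auth list

# Switch the default backend, e.g. to a local model for stricter compliance
k8sgpt auth default --provider localai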

Warning

Before running the analysis, verify your kubeconfig and context by running kubectl get nodes.
Ensure you have network access to both your cluster and the configured AI backend.
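A quick pre-flight check might look like this, assuming the CLI is installed and a backend has already been added:

# Confirm kubectl is pointed at the intended cluster
kubectl config current-context
kubectl get nodes

# Confirm K8sGPT has at least one backend configured
k8sgpt auth list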

Quickstart: Cluster Analysis

After configuring your preferred AI provider, simply run:

olama@bakugo:~/demo$ k8sgpt analyse --explain
100%  AI Provider: openai

0: Deployment k8sgpt/k8sgpt-ollama
- Error: Deployment k8sgpt/k8sgpt-ollama has 1 replica but 0 are available.
  Explanation: The Deployment expects one Pod, but none are running.
  Solution:
    1. Check Pod status: kubectl get pods -n k8sgpt
    2. Describe the Pod: kubectl describe pod <pod-name> -n k8sgpt
    3. Inspect logs: kubectl logs <pod-name> -n k8sgpt
    4. Resolve errors in the Pod spec or logs.

1: Service hosting/web2
- Error: Service has no ready endpoints; expected 2 Pods but none are Ready.
  Explanation: Pods backing this Service are not passing readiness probes.
  Solution:
    1. kubectl get pods -n hosting
    2. kubectl logs <pod-name> -n hosting
    3. Verify readinessProbe in the Pod spec.
    4. Restart or fix the Pods until they become Ready.

2: Service k8sgpt/k8sgpt-ollama
- Error: No endpoints found for label app=k8sgpt-ollama.
  Explanation: The Service selector isn’t matching any Pods.
  Solution:
    1. kubectl get pods -l app=k8sgpt-ollama -n k8sgpt
    2. Label Pods if missing: kubectl label pod <pod-name> app=k8sgpt-ollama -n k8sgpt

3: ReplicaSet k8sgpt/k8sgpt-ollama-668396898
- Error: pods "k8sgpt-ollama-668396898-xxxxx" is forbidden: serviceaccount "k8sgpt" not found.
  Explanation: The ReplicaSet can’t create Pods without a valid ServiceAccount.
  Solution:
    1. Create the ServiceAccount: kubectl create serviceaccount k8sgpt -n k8sgpt
    2. Reapply or restart the Deployment/ReplicaSet.

This output demonstrates how K8sGPT scans your cluster, identifies common issues across Deployments, Services, and ReplicaSets, and provides clear, actionable remediation steps.
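The same results can also be emitted in machine-readable form, which makes the scan easy to wire into scripts or CI checks. A rough sketch, assuming the jq utility is installed:

# Emit the analysis as JSON (pipe through jq to pretty-print or filter)
k8sgpt analyze --explain --output json | jq .

# Re-run against a single namespace while iterating on a fix
k8sgpt analyze --explain --namespace hosting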

You will also see a live demo and hands-on lab with K8sGPT in the next module.
