Monitoring – CloudWatch

Monitoring – CloudWatch (Comprehensive Notes)

Introduction to Monitoring in AWS and the Role of CloudWatch

Monitoring is one of the foundational pillars of operating cloud infrastructure. In AWS, it provides the visibility needed for performance optimization, fault detection, and efficient troubleshooting. Amazon CloudWatch is the primary AWS monitoring and observability service: it collects, tracks, analyzes, and visualizes operational data so that developers, DevOps engineers, cloud administrators, and SRE teams can keep applications healthy.

CloudWatch automatically collects metrics from AWS services such as EC2, RDS, DynamoDB, Lambda, API Gateway, ECS, and EKS. With the CloudWatch Agent installed, it can also collect metrics and logs from on-premises servers. It supports logs, metrics, alarms, dashboards, synthetic monitoring, distributed tracing (through AWS X-Ray), event automation, and anomaly detection.

Key Features of Amazon CloudWatch

  • CloudWatch Metrics
  • CloudWatch Logs
  • CloudWatch Alarms
  • CloudWatch Dashboards
  • CloudWatch Events / EventBridge
  • CloudWatch Contributor Insights
  • CloudWatch Application Insights
  • CloudWatch Synthetics
  • CloudWatch ServiceLens
  • CloudWatch Anomaly Detection
  • CloudWatch Logs Insights
  • CloudWatch Embedded Metric Format (EMF)
  • CloudWatch Metric Streams

CloudWatch Metrics

CloudWatch Metrics provide time-series data used to track system and application performance. AWS automatically creates metrics for supported services. You can also publish custom metrics.

Types of Metrics

1. AWS Provided (Default) Metrics

  • EC2 metrics: CPUUtilization, NetworkIn, NetworkOut, DiskReadOps
  • Lambda metrics: Invocations, Duration, Errors, Throttles
  • RDS metrics: CPUUtilization, FreeStorageSpace
  • DynamoDB metrics: ReadThrottleEvents, ConsumedReadCapacityUnits
  • API Gateway metrics (REST APIs): Latency, 4XXError, 5XXError

2. Custom Metrics

Custom metrics are user-defined metrics created by applications or agents.


aws cloudwatch put-metric-data \
  --metric-name PageLoadTime \
  --namespace WebsiteMetrics \
  --value 350
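The same data point can also be described programmatically. The following is a minimal sketch that builds a request in the shape accepted by boto3's `put_metric_data` call (`Namespace` plus a `MetricData` list). The `build_metric_datum` helper, the `Milliseconds` unit, and the `Page` dimension are illustrative assumptions, and the boto3 call is left as a comment so the example runs without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="None", dimensions=None, high_resolution=False):
    """Build one MetricData entry (hypothetical helper, not part of any SDK)."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
        # StorageResolution 1 = high resolution, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return datum


request = {
    "Namespace": "WebsiteMetrics",
    "MetricData": [
        build_metric_datum("PageLoadTime", 350, unit="Milliseconds",
                           dimensions={"Page": "/checkout"}),
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["MetricData"][0]["MetricName"], request["MetricData"][0]["Value"])
```

Keeping the payload construction separate from the API call makes it easy to batch several data points into one request.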

Metric Granularity

  • Standard resolution: data stored at 1-minute granularity
  • High resolution: data stored at granularity down to 1 second (custom metrics only; higher cost)
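To illustrate what resolution means for the data you can read back, the sketch below groups raw samples into fixed periods and averages each one. It is a local approximation under assumed sample data, not a description of how CloudWatch stores or aggregates metrics internally; the `aggregate` function is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(datapoints, period_seconds):
    """Group (epoch_seconds, value) pairs into fixed periods and average each one."""
    buckets = defaultdict(list)
    for ts, value in datapoints:
        # Align each timestamp to the start of its period
        buckets[ts - ts % period_seconds].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical 1-second samples spanning two minutes
samples = [(0, 10.0), (1, 20.0), (59, 30.0), (60, 40.0), (61, 60.0)]

print(aggregate(samples, 60))  # one averaged value per 1-minute period
print(aggregate(samples, 1))   # every 1-second sample kept separately
```

At a 60-second period the short spike structure inside each minute is averaged away, which is the trade-off high-resolution metrics are meant to avoid.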

Namespaces and Dimensions

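A namespace is a container that groups related metrics, and dimensions are name/value pairs that identify a specific metric within it. Each unique combination of namespace, metric name, and dimensions is treated as a separate metric. Example: Namespace = AWS/EC2, Dimension = InstanceId=i-123456789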
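A small sketch can make metric identity concrete: the namespace, metric name, and full set of dimensions together decide whether two data points belong to the same metric. The `metric_key` helper and the second instance ID are hypothetical and exist only for illustration.

```python
def metric_key(namespace, name, dimensions):
    """Return a hashable identity for a metric: namespace + name + full dimension set."""
    return (namespace, name, frozenset(dimensions.items()))


a = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-123456789"})
b = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-987654321"})
c = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-123456789"})

print(a == c)  # same namespace, name, and dimensions
print(a == b)  # different InstanceId, so a different metric
```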

CloudWatch Logs

CloudWatch Logs collect, store, and analyze logs from AWS services, applications, and on-prem servers. CloudWatch Logs are commonly used for auditing, debugging, trend analysis, and security monitoring.

Log Sources

  • Lambda execution logs
  • VPC Flow Logs
  • CloudTrail events
  • API Gateway access logs
  • ECS/EKS container logs
  • EC2 logs via CloudWatch Agent

Log Groups and Streams

  • Log Group: A collection of log streams that share the same retention, monitoring, and access control settings.
  • Log Stream: A sequence of log events from a single source, such as one application instance or resource.

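CloudWatch Logs Insights

Logs Insights is an interactive query engine for searching and analyzing log data in CloudWatch Logs. Queries are written as a pipeline of commands separated by the pipe character, as in the following example, which returns error messages with the newest first.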


fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
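To show what each stage of this query does, here is a rough local equivalent over a handful of in-memory events. The sample events are made up, and the code only mimics the filter, sort, and field selection steps; it does not call the Logs Insights service.

```python
import re

# Hypothetical log events with the two fields the query selects
events = [
    {"@timestamp": "2024-01-01T10:00:00Z", "@message": "INFO request served"},
    {"@timestamp": "2024-01-01T10:00:05Z", "@message": "ERROR database timeout"},
    {"@timestamp": "2024-01-01T10:00:09Z", "@message": "ERROR upstream 502"},
]

pattern = re.compile(r"ERROR")

# filter @message like /ERROR/
matching = [e for e in events if pattern.search(e["@message"])]
# sort @timestamp desc (ISO 8601 strings in one timezone sort chronologically)
matching.sort(key=lambda e: e["@timestamp"], reverse=True)
# fields @timestamp, @message
rows = [(e["@timestamp"], e["@message"]) for e in matching]

for row in rows:
    print(row)
```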

CloudWatch Alarms

A CloudWatch alarm watches a metric (or a metric math expression) over a number of evaluation periods and performs actions when the value crosses a threshold. Typical actions are notifying users or starting automated workflows such as scaling.

Alarm States

  • OK
  • ALARM
  • INSUFFICIENT_DATA

Alarm Actions

  • Send SNS notifications
  • Trigger Auto Scaling
  • Stop, terminate, or reboot an EC2 instance
  • Recover an EC2 instance
  • Trigger Lambda functions

aws cloudwatch put-metric-alarm \
  --alarm-name HighCPU \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-123456789 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:NotifyAdmin
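The interaction between threshold, period, and evaluation periods in the command above can be sketched as a small state function. This is a deliberately simplified model: it assumes every one of the most recent periods must breach the threshold, and it ignores missing-data handling and "M out of N" configurations. The `evaluate_alarm` function is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return a simplified alarm state for the GreaterThanThreshold operator.

    datapoints: per-period statistic values, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # ALARM only when every one of the most recent periods breaches the threshold
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


print(evaluate_alarm([85.0], threshold=80, evaluation_periods=2))
print(evaluate_alarm([70.0, 85.0], threshold=80, evaluation_periods=2))
print(evaluate_alarm([85.0, 91.5], threshold=80, evaluation_periods=2))
```

With a 300-second period and two evaluation periods, a single five-minute CPU spike would not trigger the alarm in this model; the average has to stay above 80 for two consecutive periods.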

CloudWatch Dashboards

Dashboards display metrics, logs, and alarms in a single unified view. They can include resources from multiple accounts and Regions.

Dashboard Widgets

  • Line graphs
  • Bar/stacked graphs
  • Number widgets
  • Text widgets
  • Alarm widgets

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          [ "AWS/EC2", "CPUUtilization", "InstanceId", "i-123456789" ]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    }
  ]
}
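Because a dashboard body is plain JSON, it can be generated rather than written by hand. The sketch below builds the same structure for a list of instances; the `cpu_dashboard` helper, the widget title, and the second instance ID are assumptions made for illustration.

```python
import json


def cpu_dashboard(instance_ids, region="us-east-1", period=300):
    """Build a dashboard body with one CPUUtilization graph covering all given instances."""
    widget = {
        "type": "metric",
        "properties": {
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", i] for i in instance_ids],
            "period": period,
            "stat": "Average",
            "region": region,
            "title": "EC2 CPU utilization",
        },
    }
    return json.dumps({"widgets": [widget]}, indent=2)


body = cpu_dashboard(["i-123456789", "i-987654321"])
print(body)
```

The resulting string could then be supplied as the dashboard body when creating the dashboard, for example through the `aws cloudwatch put-dashboard` command.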

CloudWatch Events / EventBridge

EventBridge (formerly CloudWatch Events) provides event-driven automation. It responds to state changes across AWS services.

Common Event Sources

  • EC2 state changes
  • Auto Scaling events
  • S3 object creation
  • Cron-based scheduled jobs
  • Security Hub events

{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["stopped"]
  }
}
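The matching rule behind this pattern can be approximated in a few lines: each list means "any one of these values", and nested objects are matched recursively. This sketch covers only exact-value matching and is not the full EventBridge pattern language (it omits prefix, numeric, and other operators). The `matches` function and the trimmed-down events are hypothetical.

```python
def matches(pattern, event):
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # A list means "any one of these values"
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Hypothetical, trimmed-down events
stopped = {"source": "aws.ec2", "detail-type": "EC2 Instance State-change Notification",
           "detail": {"instance-id": "i-123456789", "state": "stopped"}}
running = {"source": "aws.ec2", "detail-type": "EC2 Instance State-change Notification",
           "detail": {"instance-id": "i-123456789", "state": "running"}}

print(matches(pattern, stopped))
print(matches(pattern, running))
```

Fields present in the event but absent from the pattern, such as `instance-id` here, do not affect the result.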

CloudWatch Anomaly Detection

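Anomaly Detection applies machine learning to a metric's history to model its expected range of values and flag data points that fall outside it. Alarms can be based on this expected band instead of a fixed threshold, which reduces manual threshold tuning and can cut down on false alerts for workloads with regular patterns.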
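The idea of an expected band can be illustrated with a much simpler statistical stand-in: a band of the mean plus or minus a multiple of the standard deviation. This is not the model CloudWatch uses, which also accounts for trends and recurring patterns; the function names, the width of 2.0, and the sample history are assumptions for this sketch.

```python
from statistics import mean, stdev


def expected_band(history, width=2.0):
    """Return (lower, upper) bounds as mean +/- width * standard deviation."""
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def is_anomalous(value, history, width=2.0):
    """Flag a value that falls outside the expected band."""
    lower, upper = expected_band(history, width)
    return value < lower or value > upper


# Hypothetical CPU utilization history, in percent
history = [40.0, 42.0, 38.0, 41.0, 39.0, 40.0]

print(is_anomalous(41.0, history))
print(is_anomalous(75.0, history))
```

A wider band tolerates more variation before flagging a value, which trades sensitivity for fewer alerts.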

CloudWatch Synthetics

Synthetics uses canaries (lightweight scripts that run on a schedule) to test application endpoints and APIs. It monitors user journeys, API health, latency, and availability.

Example Canary Script


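// Node.js canary sketch for the syn-nodejs-puppeteer runtime
const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');

exports.handler = async () => {
  // The Synthetics library supplies a Puppeteer page object
  const page = await synthetics.getPage();
  const response = await page.goto("https://example.com", { waitUntil: "domcontentloaded", timeout: 30000 });
  log.info("Status Code: " + response.status());
  // Throwing an error marks the canary run as failed
  if (response.status() !== 200) {
    throw new Error("Failed to load page");
  }
};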

CloudWatch ServiceLens

ServiceLens integrates CloudWatch with AWS X-Ray to provide distributed tracing across microservices. It helps identify bottlenecks and end-to-end request flows.

CloudWatch Contributor Insights

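Contributor Insights analyzes log data against rules you define and identifies the top contributors (for example, the heaviest callers or the most error-prone resources) behind performance issues, failures, or traffic spikes.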
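The underlying "top contributors" computation can be sketched locally as a filtered count grouped by a key. The log entries, field names, and `top_contributors` helper below are hypothetical; a real rule would be defined in CloudWatch against a log group rather than in application code.

```python
from collections import Counter

# Hypothetical parsed access-log entries
log_entries = [
    {"client_ip": "10.0.0.5", "status": 500},
    {"client_ip": "10.0.0.7", "status": 200},
    {"client_ip": "10.0.0.5", "status": 502},
    {"client_ip": "10.0.0.9", "status": 500},
    {"client_ip": "10.0.0.5", "status": 200},
]


def top_contributors(entries, key, predicate, n=3):
    """Count entries matching the predicate, grouped by the given key, and return the top n."""
    counts = Counter(e[key] for e in entries if predicate(e))
    return counts.most_common(n)


# Which clients account for the most 5xx responses?
print(top_contributors(log_entries, "client_ip", lambda e: e["status"] >= 500))
```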

CloudWatch Logs Retention, Export, and Archival

Retention Options

  • Fixed retention periods from 1 day to 10 years
  • Indefinite retention (the default: logs never expire unless a retention policy is set)

Export Options

  • Export to Amazon S3
  • Stream to OpenSearch
  • Stream to 3rd-party log providers

CloudWatch Agent

The CloudWatch Agent collects OS-level logs, metrics, and custom data from EC2 or on-prem servers.

Agent Configuration Example


{
  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "metrics_collected": {
      "cpu": { "measurement": ["usage_system", "usage_user"] },
      "mem": { "measurement": ["mem_used_percent"] }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "system-logs"
          }
        ]
      }
    }
  }
}
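When many log files need to be collected, the `logs` section of this configuration can be generated instead of edited by hand. The sketch below uses only the keys shown in the example above; the `agent_logs_section` helper, the second file path, and its log group name are hypothetical.

```python
import json


def agent_logs_section(files):
    """Build the "logs" section from a mapping of file path -> log group name."""
    collect_list = [
        {"file_path": path, "log_group_name": group}
        for path, group in files.items()
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


# The second path and group name are hypothetical examples
config = agent_logs_section({
    "/var/log/messages": "system-logs",
    "/var/log/nginx/error.log": "nginx-errors",
})
print(json.dumps(config, indent=2))
```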

CloudWatch Pricing Considerations

CloudWatch pricing depends on usage: log ingestion and storage, the number of custom metrics, alarms, dashboards, canary runs, and Logs Insights queries. Sensible log retention and careful metric selection can lower costs significantly.

Cost Optimization Tips

  • Use log retention policies
  • Export long-term logs to S3
  • Avoid unnecessary high-resolution metrics
  • Centralize logs using subscription filters

CloudWatch in DevOps and SRE

CloudWatch integrates deeply with DevOps tooling for monitoring CI/CD pipelines, deployments, scaling, and issue detection. SRE teams rely on CloudWatch for SLIs, SLO tracking, error budgeting, and incident response.

Best Practices for Using CloudWatch

  • Use structured JSON logging
  • Enable detailed monitoring when needed
  • Use Infrastructure-as-Code for alarm management
  • Enable cross-account monitoring
  • Use anomaly detection for dynamic workloads
  • Optimize log retention to reduce cost
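As an illustration of the structured JSON logging practice, the sketch below emits each log record as a single-line JSON object using Python's standard `logging` module. Logs written this way are easier to filter by field once they reach CloudWatch Logs. The `JsonFormatter` class, the logger name, and the `request_id` and `latency_ms` fields are illustrative assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy selected extra fields when present (names are illustrative)
        for field in ("request_id", "latency_ms"):
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"request_id": "abc-123", "latency_ms": 350})
```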

Common Interview Questions on CloudWatch

  1. What is the difference between CloudWatch Logs and CloudTrail?
  2. How does the CloudWatch Agent work?
  3. What is anomaly detection?
  4. How do Metric Streams work?
  5. What are CloudWatch Dashboards used for?

Amazon CloudWatch is a comprehensive monitoring and observability platform for AWS, hybrid, and on-premises environments. It provides metrics, logs, alarms, dashboards, traces, canaries, anomaly detection, and automated actions. With this feature set, CloudWatch helps maintain high availability, performance efficiency, cost optimization, and system reliability for cloud-native architectures.


Copyrights © 2024 letsupdateskills All rights reserved