Uptime Monitoring

Monitor your website's availability from global locations and get instant alerts when downtime occurs.

What is Uptime Monitoring?

Uptime monitoring continuously checks if your website is accessible:

Synthetic monitoring: Regular checks from multiple locations
Real-user monitoring: Detect issues from actual visitors
Instant alerts: Know immediately when problems occur
Historical tracking: Availability percentage over time

Setting Up Uptime Monitoring

Add a Monitor

1. Go to Performance → Uptime
2. Click Add Monitor
3. Configure the monitor:

Monitor Configuration

Name: [Production Website]
URL: [https://example.com]

Check Settings:
  Interval: [1 minute]
  Timeout: [30 seconds]
  Method: [GET]

Expected Response:
  Status: [200]
  Contains: [optional text]

Locations:
  [x] US East (Virginia)
  [x] US West (Oregon)
  [x] Europe (Frankfurt)
  [x] Asia (Singapore)
  [x] Australia (Sydney)

Check Types

Type	Description	Use Case
HTTP(S)	Web page check	Websites, APIs
TCP	Port connectivity	Databases, services
DNS	Domain resolution	DNS health
Ping	ICMP ping	Basic connectivity
SSL	Certificate check	SSL expiry monitoring

Dashboard Overview

Uptime Status

Current: ✓ All Systems Operational

Last 30 Days:
████████████████████████████████ 99.95%

Incidents: 2
Total Downtime: 23 minutes

Status by Location

Location Status

Location          Status    Latency    Last Check
US East           ✓ Up      45ms       30s ago
US West           ✓ Up      52ms       30s ago
Europe            ✓ Up      120ms      30s ago
Asia              ✓ Up      180ms      30s ago
Australia         ✓ Up      210ms      30s ago

Uptime Timeline

Last 24 Hours

00:00  ████████████████████████  ✓
04:00  ████████████████████████  ✓
08:00  ████████████████████░░░░  ⚠ Degraded (2 min)
12:00  ████████████████████████  ✓
16:00  ████████████████████████  ✓
20:00  ████████████████████████  ✓

Check Configuration

HTTP Check Options

HTTP Check Settings

Request:
  Method: GET / POST / HEAD
  URL: https://example.com/health
  Headers:
    Authorization: Bearer xxx
    User-Agent: Zenovay-Monitor/1.0

Body (for POST):
  {"ping": true}

Validation:
  Expected Status: 200-299
  Response Contains: "healthy"
  Response Time: < 5000ms

Follow Redirects: Yes
Verify SSL: Yes
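
The validation rules above can be read as three independent tests on each response. The following is a minimal sketch of that logic, not the product's implementation; `HttpExpectation` and `validate_http_result` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HttpExpectation:
    """Validation rules mirroring the sample settings (names are illustrative)."""
    min_status: int = 200
    max_status: int = 299
    contains: str = "healthy"
    max_response_ms: float = 5000.0


def validate_http_result(status: int, body: str, elapsed_ms: float,
                         rules: HttpExpectation) -> list[str]:
    """Return a list of failure reasons; an empty list means the check passed."""
    failures = []
    if not rules.min_status <= status <= rules.max_status:
        failures.append(f"status {status} outside {rules.min_status}-{rules.max_status}")
    if rules.contains and rules.contains not in body:
        failures.append(f"body does not contain {rules.contains!r}")
    if elapsed_ms >= rules.max_response_ms:
        failures.append(f"response took {elapsed_ms:.0f}ms, limit is {rules.max_response_ms:.0f}ms")
    return failures
```

Returning the reasons rather than a plain pass/fail makes it easier to populate the Error field of an alert.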

TCP Check

TCP Check Settings

Host: db.example.com
Port: 5432
Timeout: 10 seconds

Expected: Connection successful
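
To reproduce a TCP check by hand, for example when confirming a suspected false positive, a sketch along these lines may help. It uses only the Python standard library; `tcp_check` is a hypothetical helper, and the 10-second default simply mirrors the sample settings.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the sample above, this would be called as `tcp_check("db.example.com", 5432)`.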

DNS Check

DNS Check Settings

Domain: example.com
Record Type: A / AAAA / CNAME / MX
Expected Value: 93.184.216.34 (optional)
DNS Server: 8.8.8.8 (or default)

SSL Certificate Check

Domain: example.com
Alert Before Expiry: 30 days
Verify Chain: Yes
Check OCSP: Yes
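
The expiry part of this check comes down to comparing the certificate's notAfter date with the alert threshold. The sketch below shows one way to do that with the Python standard library. The function names are hypothetical, the default TLS context verifies the chain and hostname, and OCSP checking is not covered here.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the peer certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before the certificate expires (negative if already expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def expiring_soon(not_after: str, alert_days: int = 30,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within alert_days."""
    return days_until_expiry(not_after, now) <= alert_days
```

A manual spot check would be `expiring_soon(fetch_not_after("example.com"))`.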

Check Intervals

Plan	Minimum Interval	Locations
Pro	1 minute	3
Scale	30 seconds	5
Enterprise	10 seconds	10
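
If you generate monitor configurations from scripts, it can be useful to validate them against the plan table before submitting. The sketch below encodes the table above and assumes the Locations column is the maximum number of locations per monitor; `PLAN_LIMITS` and `plan_violations` are hypothetical names.

```python
# Limits copied from the plan table; the "max locations" reading is an assumption.
PLAN_LIMITS = {
    "Pro": {"min_interval_s": 60, "max_locations": 3},
    "Scale": {"min_interval_s": 30, "max_locations": 5},
    "Enterprise": {"min_interval_s": 10, "max_locations": 10},
}


def plan_violations(plan: str, interval_s: int, locations: list[str]) -> list[str]:
    """Return the ways a monitor exceeds its plan; an empty list means it fits."""
    limits = PLAN_LIMITS[plan]
    problems = []
    if interval_s < limits["min_interval_s"]:
        problems.append(
            f"interval {interval_s}s is below the {plan} minimum of {limits['min_interval_s']}s"
        )
    if len(locations) > limits["max_locations"]:
        problems.append(
            f"{len(locations)} locations exceed the {plan} limit of {limits['max_locations']}"
        )
    return problems
```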

Choosing Interval

Interval	Best For
10s	Critical production systems
30s	Important services
1m	Standard monitoring
5m	Less critical services

Alerting

Alert Configuration

Uptime Alert Settings

Trigger Alert When:
  Failure Count: [2] consecutive failures
  From Locations: [Any 2 of 5]

Notify:
  [x] Email: ops@example.com
  [x] Slack: #incidents
  [x] PagerDuty: On-call
  [x] SMS: +1-555-0123

Alert On:
  [x] Site Down
  [x] SSL Expiring (30 days)
  [x] Slow Response (> 5s)
  [x] Site Recovered
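
The trigger rule combines a failure count with a location count. The sketch below shows one plausible reading: alert when the most recent checks form an unbroken run of at least 2 failures reported by at least 2 distinct locations. The exact semantics are an assumption, and `should_alert` is a hypothetical name.

```python
def should_alert(checks: list[tuple[str, bool]],
                 failure_count: int = 2, min_locations: int = 2) -> bool:
    """Decide whether to alert.

    checks is ordered oldest first; each entry is (location, passed).
    """
    failing_locations = []
    # Walk backwards from the newest check until the first success.
    for location, passed in reversed(checks):
        if passed:
            break
        failing_locations.append(location)
    return (len(failing_locations) >= failure_count
            and len(set(failing_locations)) >= min_locations)
```

Requiring confirmation from a second location filters out failures caused by a single probe's network path, at the cost of a slightly later alert.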

Alert Message

🚨 DOWNTIME ALERT

Monitor: Production Website
URL: https://example.com
Status: DOWN

Details:
  Error: Connection timeout
  Duration: 30 seconds
  Locations Affected: US East, Europe

Timeline:
  10:15:00 - First failure (US East)
  10:15:30 - Confirmed (Europe)
  10:15:30 - Alert triggered

[View Incident →]

Recovery Alert

✓ RECOVERY

Monitor: Production Website
URL: https://example.com
Status: UP

Downtime Duration: 8 minutes
Affected Locations: US East, Europe
Recovery Time: 10:23:00

[View Incident Report →]

Incident Management

Incident Timeline

Incident #124 - Production Website

Timeline:
10:15:00  First failure detected (US East)
10:15:30  Confirmed by second location
10:15:30  Alert sent to on-call
10:16:00  Acknowledged by @john
10:20:00  Root cause identified: database failover
10:23:00  Service restored
10:23:00  Recovery alert sent

Duration: 8 minutes
Impact: 2,340 users
Root Cause: Database failover

Status Page Integration

Connect to your status page:

1. Go to Settings → Integrations
2. Select a status page provider:
- Statuspage.io
- Cachet
- Custom webhook
3. Configure auto-update rules

Maintenance Windows

Schedule Maintenance

Maintenance Window

Name: Database Upgrade
Start: 2025-01-20 02:00 UTC
End: 2025-01-20 04:00 UTC

Affected Monitors:
  [x] Production Website
  [x] API Endpoint

During Maintenance:
  [x] Pause monitoring
  [x] Suppress alerts
  [ ] Show on status page

Recurring Maintenance

Recurring Schedule

Name: Weekly Backup Window
Frequency: Every Sunday
Time: 03:00 - 04:00 UTC

Actions:
  [x] Pause monitoring
  [x] Suppress alerts
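
Both window types reduce to a time comparison in UTC. The sketch below illustrates how alert suppression could be decided for the two examples above; the function names are hypothetical, and treating the end time as exclusive is an assumption.

```python
from datetime import datetime, time, timezone


def in_one_off_window(now: datetime, start: datetime, end: datetime) -> bool:
    """True while now falls inside a scheduled window (end is exclusive)."""
    return start <= now < end


def in_weekly_window(now: datetime, weekday: int, start: time, end: time) -> bool:
    """True inside a weekly window; weekday follows datetime.weekday() (Sunday is 6)."""
    now_utc = now.astimezone(timezone.utc)
    return now_utc.weekday() == weekday and start <= now_utc.time() < end
```

For the weekly backup window above, the call would be `in_weekly_window(datetime.now(timezone.utc), 6, time(3, 0), time(4, 0))`.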

Global Monitoring Locations

Available Locations

Region	Locations
North America	Virginia, Oregon, Ohio, Montreal
Europe	Frankfurt, London, Paris, Amsterdam
Asia Pacific	Singapore, Tokyo, Sydney, Mumbai
South America	São Paulo

Location Strategy

Minimum: Use at least 3 locations for redundancy
Global sites: Use locations that match your user base
Regional sites: Use a nearby location for accurate latency readings

Response Time Tracking

Latency Metrics

Response Time (Last 24 Hours)

Location       Avg      P95      Max
US East        45ms     120ms    450ms
US West        52ms     135ms    520ms
Europe         120ms    280ms    890ms
Asia           180ms    420ms    1.2s
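
If you export raw check results, the same three columns can be recomputed per location. The sketch below uses the nearest-rank method for P95, which is an assumption; the dashboard may use a different percentile definition. `summarize_latency` is a hypothetical name.

```python
from statistics import mean


def summarize_latency(samples_ms: list[float]) -> dict[str, float]:
    """Average, nearest-rank 95th percentile, and maximum of response times."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return {"avg": mean(ordered), "p95": ordered[rank - 1], "max": ordered[-1]}
```

A large gap between Avg and P95, as in the Asia row, usually means a minority of slow checks rather than uniformly slow responses.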

Latency Alerts

Response Time Alert

Condition: Average response > 2 seconds
Duration: 5 minutes
Locations: Any 2 of 5

Sustained high latency is often an early warning sign that precedes full downtime.

API Endpoint Monitoring

Monitor API Health

API Health Check

URL: https://api.example.com/health
Method: GET
Headers:
  Authorization: Bearer xxx

Expected:
  Status: 200
  Response:
    {
      "status": "healthy",
      "database": "connected",
      "cache": "connected"
    }
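
A health check like this passes only when the status code matches and every expected field has the expected value. The sketch below is one way to express that comparison; `health_failures` is a hypothetical name, and ignoring extra fields in the response is an assumption.

```python
import json

EXPECTED_HEALTH = {"status": "healthy", "database": "connected", "cache": "connected"}


def health_failures(status: int, body: str, expected: dict = EXPECTED_HEALTH) -> list[str]:
    """Return failure reasons for a health response; an empty list means healthy."""
    if status != 200:
        return [f"unexpected status {status}"]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ["response body is not valid JSON"]
    if not isinstance(payload, dict):
        return ["response body is not a JSON object"]
    # Compare only the expected keys; additional fields are ignored.
    return [
        f"{key}: expected {want!r}, got {payload.get(key)!r}"
        for key, want in expected.items()
        if payload.get(key) != want
    ]
```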

Multi-Step Checks

Multi-Step API Check

Step 1: Login
  POST /api/auth/login
  Body: {"user": "monitor", "pass": "xxx"}
  Store: token = response.token

Step 2: Fetch Data
  GET /api/users/me
  Header: Authorization: Bearer ${token}
  Expect: Status 200

Step 3: Logout
  POST /api/auth/logout
  Header: Authorization: Bearer ${token}
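
The three steps above can be sketched as a small function that carries the token from the login response into the later requests. This is an illustration only: the `send` transport is a hypothetical callable you would supply, and expecting status 200 from the login step is an assumption.

```python
from string import Template
from typing import Callable, Optional

# Hypothetical transport: (method, path, headers, json_body) -> (status, parsed JSON body)
Transport = Callable[[str, str, dict, Optional[dict]], tuple]


def run_login_flow(send: Transport, user: str, password: str) -> bool:
    """Run login, fetch, and logout; return True if the fetch step returned 200."""
    # Step 1: log in and store the token from the response.
    status, body = send("POST", "/api/auth/login", {}, {"user": user, "pass": password})
    if status != 200 or "token" not in body:
        return False
    auth = {"Authorization": Template("Bearer ${token}").substitute(token=body["token"])}

    # Step 2: fetch data with the stored token.
    status, _ = send("GET", "/api/users/me", auth, None)
    passed = status == 200

    # Step 3: log out even when step 2 failed, so sessions do not accumulate.
    send("POST", "/api/auth/logout", auth, None)
    return passed
```

Passing the transport in as a parameter keeps the flow testable without a live API.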

Reports & Analytics

Uptime Report

Monthly Uptime Report - January 2025

Overall Uptime: 99.95%
Total Downtime: 22 minutes
Incidents: 3

Availability by Week:
  Week 1: 100.00%
  Week 2: 99.92%
  Week 3: 100.00%
  Week 4: 99.88%

Top Incidents:
  1. Database failover (8 min)
  2. CDN issue (10 min)
  3. DNS propagation (4 min)

SLA Tracking

SLA Status

Target: 99.9% (43.8 min/month allowed)
Current: 99.95% (22 min used)
Remaining: 21.8 min

Status: ✓ On Track
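
The figures above follow from simple arithmetic on the length of the period. The sketch below reproduces them, assuming the monthly allowance is based on an average month of 365.25 / 12 days, which is what yields 43.8 minutes; the function names are hypothetical.

```python
AVERAGE_MONTH_MINUTES = 365.25 * 24 * 60 / 12  # 43,830 minutes


def downtime_budget_minutes(target_pct: float,
                            period_minutes: float = AVERAGE_MONTH_MINUTES) -> float:
    """Minutes of downtime allowed in the period at the given SLA target."""
    return period_minutes * (100.0 - target_pct) / 100.0


def uptime_pct(downtime_minutes: float, period_minutes: float) -> float:
    """Availability percentage for a period with the given downtime."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


budget = downtime_budget_minutes(99.9)
print(f"Allowed: {budget:.1f} min, remaining after 22 min: {budget - 22:.1f} min")
print(f"January uptime: {uptime_pct(22, 31 * 24 * 60):.2f}%")
```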

Export Data

Export uptime data:

CSV: Raw check results
PDF: Formatted report
API: Programmatic access

Best Practices

Monitoring Strategy

Monitor critical paths: Homepage, checkout, API
Use multiple locations: Detect regional issues
Set appropriate intervals: Balance coverage vs. cost
Define escalation: Clear alert routing
Regular review: Analyze incidents monthly

What to Monitor

Type	Examples
Public pages	Homepage, product pages
Critical flows	Checkout, signup, login
APIs	Public and internal endpoints
Infrastructure	Database, cache, CDN
Third-party	Payment provider, auth service

Alert Best Practices

Require 2+ failures before alerting
Use multiple locations to confirm
Set up escalation for unacknowledged alerts
Include recovery notifications
Have a clear on-call rotation

Troubleshooting

False Positives

Causes:

Single location network issues
Timeout set too short
Rate limiting by server
Geographic routing changes

Solutions:

Require multiple location failures
Increase timeout
Whitelist monitoring IPs
Add jitter to checks
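
Jitter means shifting each check by a small random amount so that checks do not all reach your server at the same instant. If you run your own supplementary probes, a sketch like the following illustrates the idea; `next_check_delay` and the 10% default are assumptions, not product settings.

```python
import random


def next_check_delay(interval_s: float, jitter_fraction: float = 0.1, rng=None) -> float:
    """Return the interval shifted by up to +/- jitter_fraction of its length."""
    rng = rng or random
    spread = interval_s * jitter_fraction
    return interval_s + rng.uniform(-spread, spread)
```

With a 60-second interval and the default fraction, each delay falls between 54 and 66 seconds.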

Missing Alerts

Check:

Alert configuration is enabled
Notification channels are working
Monitor is not in a maintenance window
Failure threshold has been met

Inconsistent Results

Review:

Geographic variations
Time-of-day patterns
CDN behavior
DNS resolution