1 00:00:00,200 --> 00:00:01,540 So let's talk about Health Checks 2 00:00:01,540 --> 00:00:02,940 in Route 53. 3 00:00:02,940 --> 00:00:05,140 So health checks are a way for you to check 4 00:00:05,140 --> 00:00:07,750 the health of mainly public resources, 5 00:00:07,750 --> 00:00:09,040 although there's a way for us to do it 6 00:00:09,040 --> 00:00:11,640 for private resources as well, as we'll see in this lecture. 7 00:00:11,640 --> 00:00:12,640 So the idea is that, for example, 8 00:00:12,640 --> 00:00:15,590 we have two load balancers in different regions 9 00:00:15,590 --> 00:00:17,920 and they're public load balancers, okay? 10 00:00:17,920 --> 00:00:18,753 And behind the scenes, 11 00:00:18,753 --> 00:00:20,670 we have our application running behind both of them. 12 00:00:20,670 --> 00:00:22,670 So we're running a multi-region setup 13 00:00:22,670 --> 00:00:24,690 because we want high availability, and so on, 14 00:00:24,690 --> 00:00:25,940 at the region level. 15 00:00:25,940 --> 00:00:29,690 Then we're going to use Route 53 to create DNS records. 16 00:00:29,690 --> 00:00:32,051 So that when users access our URL, for example, 17 00:00:32,051 --> 00:00:35,860 mydomain.com, they get directed to, for example, 18 00:00:35,860 --> 00:00:38,040 the closest load balancer they have. 19 00:00:38,040 --> 00:00:41,510 So this would be the case with a latency type of record. 20 00:00:41,510 --> 00:00:44,000 But we want to make sure that, if one region is down, 21 00:00:44,000 --> 00:00:46,340 then we don't send our users to that region, 22 00:00:46,340 --> 00:00:47,410 obviously, right? 23 00:00:47,410 --> 00:00:48,330 So to do so, 24 00:00:48,330 --> 00:00:50,970 we're going to create health checks from Route 53. 25 00:00:50,970 --> 00:00:53,990 So we'll create a health check on the load balancer in us-east-1, 26 00:00:53,990 --> 00:00:56,210 and we will create a health check on the one 27 00:00:56,210 --> 00:00:58,530 in eu-west-1.
28 00:00:58,530 --> 00:00:59,930 Well, with these two health checks, 29 00:00:59,930 --> 00:01:01,860 we're going to be able to associate them 30 00:01:01,860 --> 00:01:04,420 with our Route 53 records. 31 00:01:04,420 --> 00:01:08,120 And the reason we do so is to get automated DNS failover. 32 00:01:08,120 --> 00:01:10,590 So we have three types of health checks that are possible. 33 00:01:10,590 --> 00:01:12,150 The ones I just showed you, which are health checks 34 00:01:12,150 --> 00:01:14,160 that monitor an endpoint, which is a public endpoint. 35 00:01:14,160 --> 00:01:16,450 So it could be an application, a server, 36 00:01:16,450 --> 00:01:18,070 or another AWS resource. 37 00:01:18,070 --> 00:01:18,920 It could be a health check 38 00:01:18,920 --> 00:01:20,640 that monitors other health checks, 39 00:01:20,640 --> 00:01:22,850 also called a calculated health check, 40 00:01:22,850 --> 00:01:23,790 or it could be a health check 41 00:01:23,790 --> 00:01:25,550 that monitors a CloudWatch Alarm, 42 00:01:25,550 --> 00:01:27,950 which gives you more control and is helpful for private 43 00:01:27,950 --> 00:01:30,070 resources as we'll see in this lecture. 44 00:01:30,070 --> 00:01:32,430 Finally, these health checks have their own metrics 45 00:01:32,430 --> 00:01:35,290 and you can view them in CloudWatch metrics as well. 46 00:01:35,290 --> 00:01:37,280 So let's look at how health checks work 47 00:01:37,280 --> 00:01:38,260 with a specific endpoint. 48 00:01:38,260 --> 00:01:41,860 So if we have a health check for eu-west-1, for an ALB, 49 00:01:41,860 --> 00:01:44,140 then the health checkers of AWS 50 00:01:44,140 --> 00:01:45,980 are coming from all around the world. 51 00:01:45,980 --> 00:01:47,440 So it's not just one health checker. 52 00:01:47,440 --> 00:01:49,940 It's about 15 health checkers from all around the world.
53 00:01:49,940 --> 00:01:51,580 And they're all going to send requests 54 00:01:51,580 --> 00:01:55,020 to our public endpoint, on whatever route we set. 55 00:01:55,020 --> 00:01:58,950 And then if they get a 200 OK code back, or the code we defined, 56 00:01:58,950 --> 00:02:01,140 then the resource is deemed healthy. 57 00:02:01,140 --> 00:02:02,930 So about 15 global health checkers 58 00:02:02,930 --> 00:02:04,310 will check the endpoint health, 59 00:02:04,310 --> 00:02:07,360 and then you can set a threshold for healthy or unhealthy. 60 00:02:07,360 --> 00:02:08,259 You can set an interval, 61 00:02:08,259 --> 00:02:09,630 so we have two options. 62 00:02:09,630 --> 00:02:12,210 It could be either 30 seconds for regular health checks 63 00:02:12,210 --> 00:02:14,390 or every 10 seconds, which is a higher cost, 64 00:02:14,390 --> 00:02:16,490 which is what's called a fast health check. 65 00:02:16,490 --> 00:02:20,860 It supports several protocols: HTTP, HTTPS, and TCP. 66 00:02:20,860 --> 00:02:24,400 And the rule is that if over 18% of the health checkers 67 00:02:24,400 --> 00:02:26,250 say that the endpoint is healthy, 68 00:02:26,250 --> 00:02:28,500 then Route 53 will consider it healthy, 69 00:02:28,500 --> 00:02:30,670 otherwise it's deemed unhealthy. 70 00:02:30,670 --> 00:02:31,760 And you have the ability to choose 71 00:02:31,760 --> 00:02:34,380 which locations you want to use for the health checks. 72 00:02:34,380 --> 00:02:36,770 Now the health checks will only pass if you get a 73 00:02:36,770 --> 00:02:40,537 2xx or 3xx status code back from the load balancer. 74 00:02:40,537 --> 00:02:42,660 And the health check has a cool capability. 75 00:02:42,660 --> 00:02:45,570 So if it is a text-based response, 76 00:02:45,570 --> 00:02:50,473 then the health checkers can check the first 5,120 bytes 77 00:02:50,473 --> 00:02:52,160 of the response to look for some specific text 78 00:02:52,160 --> 00:02:53,910 in the response itself.
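The pass/fail rules just described can be sketched in a few lines of Python. This is only a simplified model of what the lecture states (2xx or 3xx status, optional text match in the first 5,120 bytes, more than 18% of checkers reporting healthy), not Route 53's actual implementation. The names `Response`, `checker_passes`, and `endpoint_healthy` are hypothetical, and the healthy/unhealthy threshold and the request interval are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_BODY_BYTES = 5120     # only the first 5,120 bytes of the body are searched
HEALTHY_FRACTION = 0.18   # more than 18% of checkers must report healthy


@dataclass
class Response:
    """Hypothetical stand-in for what one health checker received."""
    status: int
    body: bytes = b""


def checker_passes(resp: Response | None, search: bytes | None = None) -> bool:
    """Verdict of a single checker for a single request."""
    if resp is None:                      # timeout or connection refused
        return False
    if not 200 <= resp.status < 400:      # only 2xx and 3xx count as a pass
        return False
    if search is not None:                # optional text match
        return search in resp.body[:MAX_BODY_BYTES]
    return True


def endpoint_healthy(verdicts: list[bool]) -> bool:
    """Aggregate verdict: healthy if over 18% of the checkers passed."""
    if not verdicts:
        return False
    return sum(verdicts) / len(verdicts) > HEALTHY_FRACTION
```

With 15 checkers, 3 passing verdicts (20%) are enough for the endpoint to count as healthy in this model, while 2 (about 13%) are not.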
79 00:02:53,910 --> 00:02:56,400 Finally, very important from a network perspective, 80 00:02:56,400 --> 00:02:58,970 if you want it to work, obviously, 81 00:02:58,970 --> 00:03:01,880 the health checkers must be able to access your 82 00:03:01,880 --> 00:03:04,340 Application Load Balancer or whatever endpoint you have. 83 00:03:04,340 --> 00:03:06,710 And so therefore you must allow incoming requests 84 00:03:06,710 --> 00:03:09,730 coming from the Route 53 health checkers' IP address range. 85 00:03:09,730 --> 00:03:12,310 And you can find this address range at the URL 86 00:03:12,310 --> 00:03:14,840 in the bottom right of the screen. 87 00:03:14,840 --> 00:03:16,550 Now the second type of health checks we have 88 00:03:16,550 --> 00:03:18,430 are calculated health checks. 89 00:03:18,430 --> 00:03:20,100 And so this is to combine the results 90 00:03:20,100 --> 00:03:22,450 of multiple health checks into a single health check. 91 00:03:22,450 --> 00:03:24,160 And so if you look at Route 53, 92 00:03:24,160 --> 00:03:25,320 with three EC2 instances, 93 00:03:25,320 --> 00:03:27,150 we can create three health checks. 94 00:03:27,150 --> 00:03:28,560 They're all going to be child health checks, 95 00:03:28,560 --> 00:03:31,770 and each of them monitors one EC2 instance. 96 00:03:31,770 --> 00:03:33,930 And then we can define a parent health check, 97 00:03:33,930 --> 00:03:35,410 which is going to be defined 98 00:03:35,410 --> 00:03:38,110 on all these child health checks. 99 00:03:38,110 --> 00:03:40,360 And so the conditions to combine all these health checks 100 00:03:40,360 --> 00:03:43,270 could be an OR, an AND, or a NOT. 101 00:03:43,270 --> 00:03:47,120 You can monitor up to 256 child health checks, 102 00:03:47,120 --> 00:03:49,240 and you can specify how many of the health checks 103 00:03:49,240 --> 00:03:51,790 need to pass to make the parent pass.
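As a rough illustration of how a parent combines its children, here is a minimal Python sketch. It assumes the parent passes when at least a chosen number of children pass, so that OR is a threshold of 1, AND is a threshold equal to the number of children, and NOT is an inversion of the result. The function names and the child IDs are hypothetical. The dictionary keys (`Type`, `ChildHealthChecks`, `HealthThreshold`, `Inverted`) are meant to follow the `HealthCheckConfig` shape of the Route 53 API, but nothing here calls AWS.

```python
from __future__ import annotations

MAX_CHILDREN = 256  # limit quoted in the lecture


def parent_healthy(children: list[bool], threshold: int, inverted: bool = False) -> bool:
    """Parent passes when at least `threshold` children pass (optionally inverted)."""
    result = sum(children) >= threshold
    return not result if inverted else result


def calculated_config(child_ids: list[str], threshold: int, inverted: bool = False) -> dict:
    """Build a request-style dictionary for a calculated health check (not sent anywhere)."""
    if len(child_ids) > MAX_CHILDREN:
        raise ValueError("too many child health checks")
    if not 0 <= threshold <= len(child_ids):
        raise ValueError("threshold must be between 0 and the number of children")
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": list(child_ids),
        "HealthThreshold": threshold,
        "Inverted": inverted,
    }


# Three children, one per EC2 instance; the IDs are placeholders.
children = [True, False, True]
at_least_one = parent_healthy(children, threshold=1)              # OR
all_of_them = parent_healthy(children, threshold=len(children))   # AND
config = calculated_config(["child-a", "child-b", "child-c"], threshold=2)
```

Requiring 2 of 3 children, as in `config`, keeps the parent healthy while a single instance is down.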
104 00:03:51,790 --> 00:03:53,061 So one use case for this is, 105 00:03:53,061 --> 00:03:54,660 for example, if you want to use 106 00:03:54,660 --> 00:03:56,530 a parent health check to perform maintenance 107 00:03:56,530 --> 00:03:58,110 on your website without causing 108 00:03:58,110 --> 00:04:00,160 all the health checks to fail. 109 00:04:00,160 --> 00:04:03,700 And so how do we monitor the health of a private resource? 110 00:04:03,700 --> 00:04:06,897 So in case you want to monitor something private, 111 00:04:06,897 --> 00:04:08,030 it's going to be difficult because, 112 00:04:08,030 --> 00:04:09,930 well, all the Route 53 health checkers 113 00:04:09,930 --> 00:04:12,800 live on the public web, they're outside of your VPC, 114 00:04:12,800 --> 00:04:14,710 so they cannot access private endpoints. 115 00:04:14,710 --> 00:04:18,019 That's the case for a private VPC or an on-premises resource. 116 00:04:18,019 --> 00:04:19,860 And so the way we can do it, though, 117 00:04:19,860 --> 00:04:21,930 is to create a CloudWatch Metric 118 00:04:21,930 --> 00:04:24,200 and assign a CloudWatch Alarm on it. 119 00:04:24,200 --> 00:04:25,960 And then you can assign the CloudWatch Alarm 120 00:04:25,960 --> 00:04:27,220 to the health check. 121 00:04:27,220 --> 00:04:28,700 So the idea is that we're going to monitor 122 00:04:28,700 --> 00:04:31,332 the health of our EC2 instance in a private subnet 123 00:04:31,332 --> 00:04:32,750 with a CloudWatch Metric. 124 00:04:32,750 --> 00:04:34,810 And then, to detect when the metric is breached, 125 00:04:34,810 --> 00:04:37,230 we're going to create a CloudWatch Alarm on it.
126 00:04:37,230 --> 00:04:39,810 And when the alarm goes into the alarm state, 127 00:04:39,810 --> 00:04:41,500 then the health check is going to be 128 00:04:41,500 --> 00:04:43,100 automatically unhealthy 129 00:04:43,100 --> 00:04:45,310 and therefore we'll have created exactly what we want, 130 00:04:45,310 --> 00:04:48,140 which is a health check on a private resource, 131 00:04:48,140 --> 00:04:50,460 and this is the most common way to do it. 132 00:04:50,460 --> 00:04:51,770 So that's it for this lecture. 133 00:04:51,770 --> 00:04:54,720 I hope you liked it and I will see you in the next lecture.
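The private-resource approach from this lecture can be modelled as a mapping from the alarm state to the health check status. The sketch below is a simplified illustration and makes no AWS calls. The function names and the alarm name `private-ec2-cpu-high` are hypothetical. The keys `Type`, `AlarmIdentifier`, and `InsufficientDataHealthStatus` are meant to follow the `HealthCheckConfig` shape of the Route 53 API, and how to treat missing data is a setting the lecture does not cover, so it is an assumption here.

```python
from __future__ import annotations

ALLOWED_INSUFFICIENT = {"Healthy", "Unhealthy", "LastKnownStatus"}


def alarm_health_check_config(alarm_name: str, alarm_region: str,
                              insufficient_data: str = "LastKnownStatus") -> dict:
    """Build a request-style dictionary for an alarm-based health check (not sent anywhere)."""
    if insufficient_data not in ALLOWED_INSUFFICIENT:
        raise ValueError(f"unsupported setting: {insufficient_data}")
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
        "InsufficientDataHealthStatus": insufficient_data,
    }


def health_from_alarm(alarm_state: str, insufficient_data: str = "LastKnownStatus",
                      last_known: bool = True) -> bool:
    """Map a CloudWatch alarm state to healthy (True) or unhealthy (False)."""
    if alarm_state == "OK":
        return True
    if alarm_state == "ALARM":            # metric breached its threshold
        return False
    if alarm_state == "INSUFFICIENT_DATA":
        if insufficient_data == "LastKnownStatus":
            return last_known
        return insufficient_data == "Healthy"
    raise ValueError(f"unknown alarm state: {alarm_state}")


# Placeholder alarm on a metric from the EC2 instance in the private subnet.
config = alarm_health_check_config("private-ec2-cpu-high", "eu-west-1")
```

The resulting health check can then be associated with a record like any other, which is what gives DNS failover for a resource the public health checkers cannot reach.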