1 00:00:00,270 --> 00:00:01,830 ‫Now let's discuss CloudWatch alarms. 2 00:00:01,830 --> 00:00:04,740 ‫So alarms, as we know, they're used to trigger notifications 3 00:00:04,740 --> 00:00:06,060 ‫from any metric. 4 00:00:06,060 --> 00:00:09,210 ‫And you can define complex alarms on various options 5 00:00:09,210 --> 00:00:13,200 ‫such as sampling or doing percentage or max/min and so on. 6 00:00:13,200 --> 00:00:14,490 ‫Alarm has three states. 7 00:00:14,490 --> 00:00:16,380 ‫OK means that it's not triggered. 8 00:00:16,380 --> 00:00:18,210 ‫INSUFFICIENT_DATA means that there's not enough data 9 00:00:18,210 --> 00:00:20,280 ‫for the alarm to determine a state. 10 00:00:20,280 --> 00:00:23,640 ‫And ALARM, which is that your threshold has been breached 11 00:00:23,640 --> 00:00:26,190 ‫and therefore a notification will be sent. 12 00:00:26,190 --> 00:00:27,990 ‫The period is how long you want the alarm 13 00:00:27,990 --> 00:00:29,760 ‫to evaluate for on the metric. 14 00:00:29,760 --> 00:00:32,760 ‫And so it could be very, very short or very, very long 15 00:00:32,760 --> 00:00:35,790 ‫and it can apply also to high resolution custom metrics. 16 00:00:35,790 --> 00:00:37,200 ‫For example, 10 seconds, 30 seconds, 17 00:00:37,200 --> 00:00:39,840 ‫or a multiple of 60 seconds. 18 00:00:39,840 --> 00:00:42,360 ‫Now, alarms have three main targets. 19 00:00:42,360 --> 00:00:45,210 ‫The first one is actions on EC2 instances, 20 00:00:45,210 --> 00:00:48,090 ‫such as stopping it, terminating it, rebooting it, 21 00:00:48,090 --> 00:00:50,040 ‫or recovering any instance. 22 00:00:50,040 --> 00:00:52,440 ‫The second one is to trigger an auto-scaling action. 23 00:00:52,440 --> 00:00:55,380 ‫For example, a scale out or a scale in. 24 00:00:55,380 --> 00:00:57,330 ‫And the last one is to send a notification 25 00:00:57,330 --> 00:00:59,010 ‫to the SNS service. 26 00:00:59,010 --> 00:01:00,990 ‫For example, from the SNS service, 27 00:01:00,990 --> 00:01:03,570 ‫we can hook it to a Lambda function 28 00:01:03,570 --> 00:01:06,120 ‫and have the Lambda function do pretty much anything we want 29 00:01:06,120 --> 00:01:08,400 ‫based on an alarm being breached. 30 00:01:08,400 --> 00:01:10,770 ‫So now let's talk about composite alarms. 31 00:01:10,770 --> 00:01:12,090 ‫Because we know that CloudWatch alarms 32 00:01:12,090 --> 00:01:14,040 ‫are on a single metric, 33 00:01:14,040 --> 00:01:16,260 ‫but then if you wanted to have multiple metrics, 34 00:01:16,260 --> 00:01:18,840 ‫you would need to use composite alarms 35 00:01:18,840 --> 00:01:20,700 ‫because composite alarms are actually measuring 36 00:01:20,700 --> 00:01:23,100 ‫the states of multiple other alarms 37 00:01:23,100 --> 00:01:25,260 ‫and these alarms can be each relying 38 00:01:25,260 --> 00:01:27,330 ‫on one different metric. 39 00:01:27,330 --> 00:01:30,330 ‫So the composite alarm is the action of combining 40 00:01:30,330 --> 00:01:32,370 ‫all these other alarms together. 41 00:01:32,370 --> 00:01:35,430 ‫And you can use AND or OR conditions 42 00:01:35,430 --> 00:01:38,100 ‫to be able to be very flexible 43 00:01:38,100 --> 00:01:40,440 ‫in terms of the condition you're checking for. 44 00:01:40,440 --> 00:01:42,600 ‫So it's very helpful to reduce alarm noise 45 00:01:42,600 --> 00:01:45,480 ‫because you can create complex composite alarms and saying, 46 00:01:45,480 --> 00:01:49,500 ‫for example, if the CPU is high and the network is high, 47 00:01:49,500 --> 00:01:52,230 ‫then don't alert me because I only wanna know 48 00:01:52,230 --> 00:01:54,690 ‫when the CPU is high and the network is low, 49 00:01:54,690 --> 00:01:55,860 ‫this kind of things. 50 00:01:55,860 --> 00:01:56,790 ‫So let's take an example. 51 00:01:56,790 --> 00:01:58,470 ‫We have an EC2 instance 52 00:01:58,470 --> 00:02:01,530 ‫and we're going to create a composite alarm on top of it. 53 00:02:01,530 --> 00:02:04,860 ‫So therefore, we create a first underlying alarm 54 00:02:04,860 --> 00:02:08,310 ‫called Alarm-A which is going to monitor the CPU 55 00:02:08,310 --> 00:02:10,020 ‫of the EC2 instance. 56 00:02:10,020 --> 00:02:11,910 ‫Then you create Alarm-B 57 00:02:11,910 --> 00:02:15,450 ‫which is going to monitor the IOPS of the EC2 instance. 58 00:02:15,450 --> 00:02:17,430 ‫And then the composite alarm 59 00:02:17,430 --> 00:02:21,600 ‫is defined as the junction of Alarm-A and Alarm-B. 60 00:02:21,600 --> 00:02:24,480 ‫And therefore, if Alarm-A is in ALARM 61 00:02:24,480 --> 00:02:25,890 ‫and Alarm-B is in ALARM, 62 00:02:25,890 --> 00:02:28,710 ‫and this is something we have to define ourselves, 63 00:02:28,710 --> 00:02:31,590 ‫then the composite alarm itself would be in ALARM 64 00:02:31,590 --> 00:02:34,710 ‫and can trigger, for example, an SNS notification. 65 00:02:34,710 --> 00:02:35,543 ‫So as you can see, 66 00:02:35,543 --> 00:02:39,360 ‫you can get quite creative with the composite alarms. 67 00:02:39,360 --> 00:02:40,980 ‫So let's talk about EC2 instance recovery. 68 00:02:40,980 --> 00:02:41,813 ‫We've already seen it, 69 00:02:41,813 --> 00:02:44,340 ‫but there is a status check to check the EC2 VM 70 00:02:44,340 --> 00:02:47,940 ‫and the system status check to check the underlying hardware 71 00:02:47,940 --> 00:02:49,650 ‫and you can define a CloudWatch alarm 72 00:02:49,650 --> 00:02:51,090 ‫on both of these checks, 73 00:02:51,090 --> 00:02:54,330 ‫so you will monitor a specific EC2 instance. 74 00:02:54,330 --> 00:02:56,640 ‫And in case the alarm is being breached, 75 00:02:56,640 --> 00:02:59,970 ‫then you can start an EC2 instance recovery to make sure, 76 00:02:59,970 --> 00:03:02,070 ‫for example, that you move your EC2 instance 77 00:03:02,070 --> 00:03:03,540 ‫from one host to another. 78 00:03:03,540 --> 00:03:04,950 ‫When you do a recovery, 79 00:03:04,950 --> 00:03:07,500 ‫you get the same private, public, and elastic IP, 80 00:03:07,500 --> 00:03:08,430 ‫the same metadata, 81 00:03:08,430 --> 00:03:10,560 ‫and the same placement group for your instance. 82 00:03:10,560 --> 00:03:14,010 ‫And you can also send an alert to your SNS topic 83 00:03:14,010 --> 00:03:17,673 ‫to get alerted that the EC2 instance was being recovered. 84 00:03:18,600 --> 00:03:20,370 ‫Now, the CloudWatch alarm has some good stuff to know. 85 00:03:20,370 --> 00:03:22,980 ‫First of all is that as we've seen, we can create an alarm 86 00:03:22,980 --> 00:03:24,960 ‫on top of a CloudWatch Logs metric filter. 87 00:03:24,960 --> 00:03:28,560 ‫So remember, the CloudWatch Logs are having a metric filter 88 00:03:28,560 --> 00:03:30,150 ‫which is hooked to CloudWatch alarm. 89 00:03:30,150 --> 00:03:31,440 ‫And then when we receive 90 00:03:31,440 --> 00:03:33,570 ‫too many instances of a specific word, 91 00:03:33,570 --> 00:03:35,070 ‫for example, the word error, 92 00:03:35,070 --> 00:03:39,390 ‫then do an alert and send a message into Amazon SNS. 93 00:03:39,390 --> 00:03:41,460 ‫And so if you wanted to test alarm notifications, 94 00:03:41,460 --> 00:03:45,000 ‫you can use a CLI call called set-alarm-state. 95 00:03:45,000 --> 00:03:47,280 ‫And this is helpful when you want to trigger an alarm 96 00:03:47,280 --> 00:03:50,670 ‫even though it didn't reach a specific threshold 97 00:03:50,670 --> 00:03:52,650 ‫because you wanted to see whether or not 98 00:03:52,650 --> 00:03:54,180 ‫the alarm being triggered 99 00:03:54,180 --> 00:03:57,840 ‫results in the correct action for your infrastructure. 100 00:03:57,840 --> 00:03:59,280 ‫So that's it for alarms, I hope you liked it, 101 00:03:59,280 --> 00:04:01,973 ‫and I will see you in the next lecture for some practice.