1 00:00:00,300 --> 00:00:03,600 ‫So now let's talk about API Gateway Logging and Tracing. 2 00:00:03,600 --> 00:00:05,850 ‫So the first option is to use CloudWatch Logs. 3 00:00:05,850 --> 00:00:08,160 ‫And when you enable CloudWatch Logs integration 4 00:00:08,160 --> 00:00:09,420 ‫with API Gateway, 5 00:00:09,420 --> 00:00:12,060 ‫you're going to get information about the request 6 00:00:12,060 --> 00:00:13,590 ‫and the response body 7 00:00:13,590 --> 00:00:15,780 ‫that goes through the API Gateway. 8 00:00:15,780 --> 00:00:17,880 ‫You can enable it at the Stage Level 9 00:00:17,880 --> 00:00:19,620 ‫and you define your Log Level, 10 00:00:19,620 --> 00:00:23,370 ‫if you want to have the ERROR logs only, or DEBUG, or INFO. 11 00:00:23,370 --> 00:00:24,900 ‫And obviously the DEBUG is going 12 00:00:24,900 --> 00:00:27,780 ‫to give you the most information, 13 00:00:27,780 --> 00:00:30,960 ‫and you can override this setting on a per API basis. 14 00:00:30,960 --> 00:00:32,970 ‫So to make it dead clear, 15 00:00:32,970 --> 00:00:35,100 ‫we're going to get a user that makes a request 16 00:00:35,100 --> 00:00:36,330 ‫into the API Gateway, 17 00:00:36,330 --> 00:00:38,730 ‫and automatically that request is going to be logged 18 00:00:38,730 --> 00:00:40,410 ‫into CloudWatch Logs. 19 00:00:40,410 --> 00:00:42,780 ‫Then the request makes it to your backend, 20 00:00:42,780 --> 00:00:46,260 ‫your backend will then give a response to the API Gateway, 21 00:00:46,260 --> 00:00:49,050 ‫and the response, again, will be sent to CloudWatch Logs 22 00:00:49,050 --> 00:00:51,780 ‫and finally make it to the user. 23 00:00:51,780 --> 00:00:54,540 ‫So it's very helpful to get the request and the response, 24 00:00:54,540 --> 00:00:56,580 ‫but be careful: if you do enable this, 25 00:00:56,580 --> 00:00:57,720 ‫you may get a lot 26 00:00:57,720 --> 00:01:00,603 ‫of sensitive information into CloudWatch Logs. 
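As a rough sketch of the stage-level logging setup described above, the helper below builds the patch operations you might send when updating a stage. The function name `build_logging_patch_ops` is hypothetical, and the patch paths `/*/*/logging/loglevel` and `/*/*/logging/dataTrace` are what I believe the REST API stage update expects, so check them against the API reference before relying on them. Full request and response logging is off by default here because of the sensitive-data concern mentioned above.

```python
def build_logging_patch_ops(log_level="ERROR", log_full_payloads=False):
    """Build stage patch operations for CloudWatch logging (hypothetical helper).

    The two paths below are assumptions about the stage update API:
    "/*/*/..." is meant to apply the setting to every method in the stage.
    """
    return [
        # Log level for execution logs, for example "ERROR" or "INFO".
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": log_level},
        # Whether full request/response bodies are written to the logs.
        # Left off by default: bodies can contain sensitive information.
        {
            "op": "replace",
            "path": "/*/*/logging/dataTrace",
            "value": "true" if log_full_payloads else "false",
        },
    ]


if __name__ == "__main__":
    for op in build_logging_patch_ops("INFO"):
        print(op)
```

With boto3 installed and credentials configured, the list could be passed as `patchOperations` to `boto3.client("apigateway").update_stage(restApiId=..., stageName=..., patchOperations=ops)`, where the API ID and stage name are your own values.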
27 00:01:01,470 --> 00:01:04,590 ‫For X-Ray, well, this is to get tracing information 28 00:01:04,590 --> 00:01:07,650 ‫about the requests that go through the API Gateway. 29 00:01:07,650 --> 00:01:10,410 ‫And if you enable X-Ray for API Gateway and Lambda, 30 00:01:10,410 --> 00:01:13,743 ‫that gives you, of course, the full picture for your API. 31 00:01:14,730 --> 00:01:17,640 ‫Then API Gateway can be monitored with CloudWatch Metrics, 32 00:01:17,640 --> 00:01:20,220 ‫and they're per stage, and you can enable detailed metrics. 33 00:01:20,220 --> 00:01:22,620 ‫And so there are a few metrics you need to know about 34 00:01:22,620 --> 00:01:24,330 ‫before going into the exam. 35 00:01:24,330 --> 00:01:26,250 ‫The first one is called CacheHitCount, 36 00:01:26,250 --> 00:01:28,860 ‫and the other one is called CacheMissCount, 37 00:01:28,860 --> 00:01:30,330 ‫which gives you some information 38 00:01:30,330 --> 00:01:33,060 ‫about the efficiency of your cache. 39 00:01:33,060 --> 00:01:34,830 ‫So if your cache is very efficient, 40 00:01:34,830 --> 00:01:36,540 ‫the cache hit count will be very high. 41 00:01:36,540 --> 00:01:39,690 ‫If it's not efficient, the cache miss count will be very high. 42 00:01:39,690 --> 00:01:43,590 ‫The Count is the number of API requests in a given period. 43 00:01:43,590 --> 00:01:47,430 ‫The IntegrationLatency is the time between when the API Gateway 44 00:01:47,430 --> 00:01:49,320 ‫relays a request to the backend 45 00:01:49,320 --> 00:01:51,930 ‫and when it receives a response from the backend, 46 00:01:51,930 --> 00:01:53,490 ‫so it indicates to you 47 00:01:53,490 --> 00:01:57,240 ‫how long the backend is taking to reply to the API Gateway. 48 00:01:57,240 --> 00:01:59,190 ‫And the Latency itself 49 00:01:59,190 --> 00:02:00,930 ‫is the time between when the API Gateway 50 00:02:00,930 --> 00:02:02,790 ‫receives a request from the client 51 00:02:02,790 --> 00:02:04,800 ‫and when it returns a response to the client. 
52 00:02:04,800 --> 00:02:07,560 ‫So that includes the IntegrationLatency, 53 00:02:07,560 --> 00:02:10,440 ‫but also adds anything the API Gateway is doing. 54 00:02:10,440 --> 00:02:11,640 ‫That includes, for example, 55 00:02:11,640 --> 00:02:14,430 ‫checking the authorization and authentication, 56 00:02:14,430 --> 00:02:17,940 ‫checking the cache, applying mapping templates, and so on. 57 00:02:17,940 --> 00:02:19,920 ‫So the Latency is always going to be a bit higher 58 00:02:19,920 --> 00:02:21,720 ‫than the IntegrationLatency. 59 00:02:21,720 --> 00:02:23,130 ‫And so you should note 60 00:02:23,130 --> 00:02:25,800 ‫that the maximum amount of time that the API Gateway 61 00:02:25,800 --> 00:02:28,560 ‫can spend on any request is 29 seconds. 62 00:02:28,560 --> 00:02:29,610 ‫So if your Latency 63 00:02:29,610 --> 00:02:32,760 ‫or your IntegrationLatency is over 29 seconds, 64 00:02:32,760 --> 00:02:35,430 ‫that means you will see a timeout from your API Gateway. 65 00:02:35,430 --> 00:02:36,390 ‫So these are 66 00:02:36,390 --> 00:02:38,940 ‫two really good metrics to look at. 67 00:02:38,940 --> 00:02:41,310 ‫Then we have two kinds of errors. 68 00:02:41,310 --> 00:02:44,040 ‫So we're getting some metrics related to these errors. 69 00:02:44,040 --> 00:02:47,550 ‫You have the 4XXError metric, for client-side errors. 70 00:02:47,550 --> 00:02:49,110 ‫So this is how many errors we're getting 71 00:02:49,110 --> 00:02:50,130 ‫on the client side. 72 00:02:50,130 --> 00:02:51,810 ‫And the 5XXError metric, 73 00:02:51,810 --> 00:02:54,060 ‫which is how many errors we're getting on the server side. 74 00:02:54,060 --> 00:02:55,920 ‫So server-side means backend, 75 00:02:55,920 --> 00:02:59,430 ‫and client-side means the clients using your API Gateway. 76 00:02:59,430 --> 00:03:02,580 ‫Okay, so we've seen that the API Gateway can do some throttling 77 00:03:02,580 --> 00:03:04,200 ‫with the usage plans and so on. 
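Before moving on to throttling, here is a small sketch of how the cache and latency metrics covered above could be combined once you have read their values. The function names and the 90% warning threshold are assumptions made for illustration, and the inputs are plain numbers rather than live CloudWatch data.

```python
# Maximum request time mentioned in the lecture: 29 seconds.
GATEWAY_TIMEOUT_MS = 29_000


def cache_hit_ratio(cache_hit_count, cache_miss_count):
    """Fraction of cacheable requests served from the cache (0.0 if no traffic)."""
    total = cache_hit_count + cache_miss_count
    return cache_hit_count / total if total else 0.0


def gateway_overhead_ms(latency_ms, integration_latency_ms):
    """Time spent inside API Gateway itself: Latency minus IntegrationLatency."""
    return latency_ms - integration_latency_ms


def is_near_timeout(latency_ms, threshold=0.9):
    """True when latency is within the chosen fraction of the 29-second limit.

    The 0.9 threshold is an arbitrary choice for this sketch.
    """
    return latency_ms >= threshold * GATEWAY_TIMEOUT_MS


if __name__ == "__main__":
    print(cache_hit_ratio(90, 10))
    print(gateway_overhead_ms(250, 200))
    print(is_near_timeout(28_000))
```

A high hit ratio suggests the cache is doing its job, and a large overhead value points at work done in the gateway (authorization, cache lookup, mapping templates) rather than in the backend.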
78 00:03:04,200 --> 00:03:06,510 ‫So we also have Account Limits. 79 00:03:06,510 --> 00:03:10,050 ‫And by default your API Gateway will throttle requests 80 00:03:10,050 --> 00:03:12,090 ‫at 10,000 requests per second 81 00:03:12,090 --> 00:03:14,070 ‫across all the APIs; 82 00:03:14,070 --> 00:03:17,070 ‫this is a soft limit and can be increased upon request. 83 00:03:17,070 --> 00:03:20,790 ‫So that means that if one of your APIs is under heavy use, 84 00:03:20,790 --> 00:03:23,310 ‫the other APIs can also be throttled. 85 00:03:23,310 --> 00:03:25,500 ‫So in case of throttling, what will you see? 86 00:03:25,500 --> 00:03:28,890 ‫You will see an error code, 429 Too Many Requests, 87 00:03:28,890 --> 00:03:30,960 ‫which is a client error, because the clients 88 00:03:30,960 --> 00:03:32,820 ‫are making too many requests, 89 00:03:32,820 --> 00:03:34,830 ‫and it's retriable, but you should use something 90 00:03:34,830 --> 00:03:37,980 ‫like exponential backoff to retry these requests. 91 00:03:37,980 --> 00:03:41,280 ‫You can also, to improve throttling and performance, 92 00:03:41,280 --> 00:03:43,710 ‫set stage limits and method limits 93 00:03:43,710 --> 00:03:44,910 ‫to make sure that each stage 94 00:03:44,910 --> 00:03:47,430 ‫does not use up the whole request quota 95 00:03:47,430 --> 00:03:48,960 ‫if it's under attack. 96 00:03:48,960 --> 00:03:51,870 ‫Or, as we've seen before, we can define a usage plan 97 00:03:51,870 --> 00:03:54,510 ‫if we want to be able to throttle per customer. 98 00:03:54,510 --> 00:03:57,870 ‫So just like Lambda Concurrency, if one API is overloaded 99 00:03:57,870 --> 00:03:58,920 ‫and not limited, 100 00:03:58,920 --> 00:04:01,140 ‫it can cause other APIs to be throttled. 101 00:04:01,140 --> 00:04:01,973 ‫So we've seen this 102 00:04:01,973 --> 00:04:04,350 ‫with Lambda Reserved Concurrency and concurrency overall, 103 00:04:04,350 --> 00:04:07,200 ‫but this is also applicable to API Gateway. 
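To make the retry advice above concrete, here is a minimal sketch of exponential backoff on a 429 response. Everything named here is an assumption for illustration: `send_request` is any callable you supply that returns a `(status, body)` pair, `TooManyRequests` is a made-up exception, and the delays and attempt count are arbitrary. The random jitter is an addition of this sketch, not something the lecture prescribes.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical error raised when every attempt was throttled (HTTP 429)."""


def call_with_backoff(send_request, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff while it returns 429.

    send_request must return a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 429:
            # Not throttled: hand the response back, whatever it is.
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # The delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # Random jitter spreads out clients that were throttled at the same time.
        sleep(random.uniform(0, ceiling))
    raise TooManyRequests(f"still throttled after {max_attempts} attempts")


if __name__ == "__main__":
    fake_responses = iter([(429, ""), (429, ""), (200, "ok")])
    print(call_with_backoff(lambda: next(fake_responses), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting; in real use you would leave the default.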
104 00:04:07,200 --> 00:04:08,970 ‫Okay, finally, let's talk 105 00:04:08,970 --> 00:04:11,040 ‫about the errors we can see in the API Gateway. 106 00:04:11,040 --> 00:04:13,170 ‫So 4XX means Client errors, 107 00:04:13,170 --> 00:04:15,240 ‫so the clients using your API Gateway. 108 00:04:15,240 --> 00:04:18,462 ‫This could be 400: Bad Request, 109 00:04:18,462 --> 00:04:22,650 ‫403: Access Denied, or the web application firewall 110 00:04:22,650 --> 00:04:24,420 ‫did not accept your request. 111 00:04:24,420 --> 00:04:27,180 ‫429, for example, if your quota has been exceeded 112 00:04:27,180 --> 00:04:29,100 ‫and you're seeing some throttling. 113 00:04:29,100 --> 00:04:33,210 ‫And anything with 5XX means Server errors, so your backend. 114 00:04:33,210 --> 00:04:35,310 ‫So 502 means that, for example, 115 00:04:35,310 --> 00:04:38,220 ‫your Lambda proxy integration returned an incompatible response, 116 00:04:38,220 --> 00:04:42,120 ‫or 503 means that your backend is unavailable. 117 00:04:42,120 --> 00:04:45,060 ‫Or 504 means that there was an Integration Failure, 118 00:04:45,060 --> 00:04:46,530 ‫and one of these failures is that 119 00:04:46,530 --> 00:04:51,420 ‫the API Gateway request timed out: after 29 seconds 120 00:04:51,420 --> 00:04:53,220 ‫it did not receive a response from the backend, 121 00:04:53,220 --> 00:04:56,520 ‫and therefore it returned a 504: Integration Failure 122 00:04:56,520 --> 00:04:58,140 ‫due to this timeout. 123 00:04:58,140 --> 00:05:00,510 ‫So that's it for the API Gateway Monitoring. 124 00:05:00,510 --> 00:05:03,510 ‫I hope you liked it, and I will see you in the next lecture.