1
00:00:00,300 --> 00:00:03,600
‫So now let's talk about API Gateway Logging and Tracing.

2
00:00:03,600 --> 00:00:05,850
‫So the first option is to use CloudWatch Logs.

3
00:00:05,850 --> 00:00:08,160
‫And when you enable CloudWatch Logs integration

4
00:00:08,160 --> 00:00:09,420
‫with API Gateway,

5
00:00:09,420 --> 00:00:12,060
‫you're going to get information about the request

6
00:00:12,060 --> 00:00:13,590
‫and the response body

7
00:00:13,590 --> 00:00:15,780
‫that goes through the API gateway.

8
00:00:15,780 --> 00:00:17,880
‫You can enable it at the Stage Level

9
00:00:17,880 --> 00:00:19,620
‫and you define your Log Level,

10
00:00:19,620 --> 00:00:23,370
‫if you want to have the ERROR logs only, or DEBUG, or INFO.

11
00:00:23,370 --> 00:00:24,900
‫And obviously the DEBUG is going

12
00:00:24,900 --> 00:00:27,780
‫to give you the most amount of information,

13
00:00:27,780 --> 00:00:30,960
‫and you can override this setting on a per API basis.

14
00:00:30,960 --> 00:00:32,970
‫So to make it dead clear,

15
00:00:32,970 --> 00:00:35,100
‫we're going to get a user that makes a request

16
00:00:35,100 --> 00:00:36,330
‫into the API gateway,

17
00:00:36,330 --> 00:00:38,730
‫and automatically that request is going to be logged

18
00:00:38,730 --> 00:00:40,410
‫into CloudWatch Logs.

19
00:00:40,410 --> 00:00:42,780
‫Then the request makes it to your backend,

20
00:00:42,780 --> 00:00:46,260
‫your backend will then give a response to the API gateway,

21
00:00:46,260 --> 00:00:49,050
‫and the response again, will be sent to CloudWatch Logs

22
00:00:49,050 --> 00:00:51,780
‫and finally, make it to the user.

23
00:00:51,780 --> 00:00:54,540
‫So it's very helpful to get the request and the response,

24
00:00:54,540 --> 00:00:56,580
‫but be careful: if you do enable this,

25
00:00:56,580 --> 00:00:57,720
‫then you may get a lot

26
00:00:57,720 --> 00:01:00,603
‫of sensitive information into CloudWatch Logs.

27
00:01:01,470 --> 00:01:04,590
‫For X-Ray, well, this is to get tracing information

28
00:01:04,590 --> 00:01:07,650
‫about the requests that go through the API gateway.

29
00:01:07,650 --> 00:01:10,410
‫And if you enable X-Ray for API gateway and Lambda,

30
00:01:10,410 --> 00:01:13,743
‫that gives you, of course, the full picture of your API.

31
00:01:14,730 --> 00:01:17,640
‫Then API gateway can be monitored with CloudWatch Metrics

32
00:01:17,640 --> 00:01:20,220
‫and they're per stage, and you can enable detailed metrics.

33
00:01:20,220 --> 00:01:22,620
‫And so there's a few metrics you need to know about,

34
00:01:22,620 --> 00:01:24,330
‫before going into the exam.

35
00:01:24,330 --> 00:01:26,250
‫The first one is called CacheHitCount,

36
00:01:26,250 --> 00:01:28,860
‫and the other one is called CacheMissCount,

37
00:01:28,860 --> 00:01:30,330
‫which gives you some information

38
00:01:30,330 --> 00:01:33,060
‫about the efficiency of your cache.

39
00:01:33,060 --> 00:01:34,830
‫So if your cache is very efficient,

40
00:01:34,830 --> 00:01:36,540
‫the cache hit will be very high.

41
00:01:36,540 --> 00:01:39,690
‫If it's not efficient, the cache miss will be very high.
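
To make the cache efficiency idea concrete, here is a minimal sketch that turns CacheHitCount and CacheMissCount into a single hit ratio. The function name cache_hit_ratio and the sample numbers are illustrative assumptions, not something API Gateway provides.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    total = hits + misses
    if total == 0:
        # No lookups in the period: report 0 instead of dividing by zero.
        return 0.0
    return hits / total


# Hypothetical sums of CacheHitCount and CacheMissCount over one period.
ratio = cache_hit_ratio(hits=900, misses=100)
print(f"cache hit ratio: {ratio:.0%}")
```

A ratio close to 1 suggests the cache is absorbing most requests, while a ratio close to 0 suggests most requests still reach the backend.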

42
00:01:39,690 --> 00:01:43,590
‫The Count is the total number of API requests in a given period.

43
00:01:43,590 --> 00:01:47,430
‫The IntegrationLatency is the time between when API Gateway

44
00:01:47,430 --> 00:01:49,320
‫relays a request to the backend

45
00:01:49,320 --> 00:01:51,930
‫and when it receives a response from the backend,

46
00:01:51,930 --> 00:01:53,490
‫so it indicates to you,

47
00:01:53,490 --> 00:01:57,240
‫how long the backend is taking to reply to the API gateway.

48
00:01:57,240 --> 00:01:59,190
‫And the Latency itself,

49
00:01:59,190 --> 00:02:00,930
‫is the time between when the API gateway

50
00:02:00,930 --> 00:02:02,790
‫receives a request from the client,

51
00:02:02,790 --> 00:02:04,800
‫and when it returns a response to the client.

52
00:02:04,800 --> 00:02:07,560
‫So that includes the IntegrationLatency,

53
00:02:07,560 --> 00:02:10,440
‫but also adds anything the API gateway is doing.

54
00:02:10,440 --> 00:02:11,640
‫That includes, for example,

55
00:02:11,640 --> 00:02:14,430
‫checking the authorization and authentication,

56
00:02:14,430 --> 00:02:17,940
‫checking the cache, doing some mapping templates, and so on.

57
00:02:17,940 --> 00:02:19,920
‫So the Latency is always going to be a bit higher

58
00:02:19,920 --> 00:02:21,720
‫than the IntegrationLatency.

59
00:02:21,720 --> 00:02:23,130
‫And so you should note,

60
00:02:23,130 --> 00:02:25,800
‫that the maximum amount of time that an API gateway

61
00:02:25,800 --> 00:02:28,560
‫can spend on any request is 29 seconds.

62
00:02:28,560 --> 00:02:29,610
‫So if your Latency

63
00:02:29,610 --> 00:02:32,760
‫or your IntegrationLatency is over 29 seconds,

64
00:02:32,760 --> 00:02:35,430
‫that means you will see a timeout from your API gateway.
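
As a rough illustration of how Latency and IntegrationLatency relate, the sketch below computes the time spent inside API Gateway itself and flags values that get close to the 29-second limit. The function names and the 90% warning threshold are assumptions made for this example only.

```python
TIMEOUT_MS = 29_000  # the 29-second limit from the lecture, in milliseconds


def gateway_overhead_ms(latency_ms: float, integration_latency_ms: float) -> float:
    """Time spent in API Gateway itself (authorization, cache, mapping templates...)."""
    return latency_ms - integration_latency_ms


def at_risk_of_timeout(latency_ms: float, integration_latency_ms: float,
                       threshold: float = 0.9) -> bool:
    """True when either metric reaches the chosen fraction of the limit (assumed 90%)."""
    return max(latency_ms, integration_latency_ms) >= threshold * TIMEOUT_MS


# Hypothetical metric values, in milliseconds.
print(gateway_overhead_ms(latency_ms=250, integration_latency_ms=200))
print(at_risk_of_timeout(latency_ms=27_000, integration_latency_ms=26_500))
```

A large overhead points at work done in API Gateway, while a large IntegrationLatency points at the backend.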

65
00:02:35,430 --> 00:02:36,390
‫So these

66
00:02:36,390 --> 00:02:38,940
‫are two really good metrics to look at.

67
00:02:38,940 --> 00:02:41,310
‫Then we have two kinds of errors.

68
00:02:41,310 --> 00:02:44,040
‫So we're getting some metrics related to these errors.

69
00:02:44,040 --> 00:02:47,550
‫You have the 4XXError metric, for client-side errors.

70
00:02:47,550 --> 00:02:49,110
‫So this is how many errors we're getting

71
00:02:49,110 --> 00:02:50,130
‫on the client side.

72
00:02:50,130 --> 00:02:51,810
‫And 5XXError

73
00:02:51,810 --> 00:02:54,060
‫which is how many errors we're getting on the server side.

74
00:02:54,060 --> 00:02:55,920
‫So server-side means backend,

75
00:02:55,920 --> 00:02:59,430
‫and client-side means the clients using your API gateway.

76
00:02:59,430 --> 00:03:02,580
‫Okay, so we've seen the API gateway can do some throttling

77
00:03:02,580 --> 00:03:04,200
‫with the usage plans and so on.

78
00:03:04,200 --> 00:03:06,510
‫So there are also Account Limits,

79
00:03:06,510 --> 00:03:10,050
‫and by default API Gateway will throttle requests

80
00:03:10,050 --> 00:03:12,090
‫at 10,000 requests per second

81
00:03:12,090 --> 00:03:14,070
‫across all the APIs,

82
00:03:14,070 --> 00:03:17,070
‫which is a soft limit and can be increased upon request.

83
00:03:17,070 --> 00:03:20,790
‫So that means that if one of your APIs is under heavy use,

84
00:03:20,790 --> 00:03:23,310
‫the other APIs can also be throttled.

85
00:03:23,310 --> 00:03:25,500
‫So in case of throttling, what will you see?

86
00:03:25,500 --> 00:03:28,890
‫You will see an error code, 429 Too Many Requests,

87
00:03:28,890 --> 00:03:30,960
‫which is a client error, because the clients

88
00:03:30,960 --> 00:03:32,820
‫are making too many requests,

89
00:03:32,820 --> 00:03:34,830
‫and it's retriable, but you should use something

90
00:03:34,830 --> 00:03:37,980
‫like exponential backoff to retry these requests.
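
Here is a minimal sketch of what retrying a 429 with exponential backoff could look like on the client side. The call_with_backoff helper, its parameters, and the use of random jitter are assumptions for illustration; the send argument stands in for whatever function actually calls your API and returns the HTTP status code.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it stops returning 429 or the attempts run out."""
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # The cap doubles on every attempt: base, 2x base, 4x base, ...
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random time between 0 and the cap so that
            # many throttled clients do not all retry at the same moment.
            sleep(random.uniform(0, cap))
    return status


# Hypothetical backend that throttles twice and then succeeds.
fake_responses = iter([429, 429, 200])
waits = []
final_status = call_with_backoff(lambda: next(fake_responses), sleep=waits.append)
print(final_status, len(waits))
```

Passing sleep as a parameter is only there so the example can run without real waiting; in a real client the default time.sleep would be used.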

91
00:03:37,980 --> 00:03:41,280
‫You can also, to improve throttling and performance

92
00:03:41,280 --> 00:03:43,710
‫set stage limits and method limits

93
00:03:43,710 --> 00:03:44,910
‫to make sure that each stage

94
00:03:44,910 --> 00:03:47,430
‫does not use up the entire request quota

95
00:03:47,430 --> 00:03:48,960
‫if it's under attack.

96
00:03:48,960 --> 00:03:51,870
‫Or, as we've seen before, we can define a usage plan

97
00:03:51,870 --> 00:03:54,510
‫if we want to be able to throttle, per customer.

98
00:03:54,510 --> 00:03:57,870
‫So just like Lambda Concurrency, if one API is overloaded,

99
00:03:57,870 --> 00:03:58,920
‫and not limited,

100
00:03:58,920 --> 00:04:01,140
‫it can cause other APIs to be throttled.

101
00:04:01,140 --> 00:04:01,973
‫So we've seen this

102
00:04:01,973 --> 00:04:04,350
‫with Lambda Reserved Concurrency and concurrency overall,

103
00:04:04,350 --> 00:04:07,200
‫but this is also applicable to API Gateway.

104
00:04:07,200 --> 00:04:08,970
‫Okay, finally, let's talk

105
00:04:08,970 --> 00:04:11,040
‫about the errors we can see in the API gateway.

106
00:04:11,040 --> 00:04:13,170
‫So 4xx means Client errors,

107
00:04:13,170 --> 00:04:15,240
‫So the clients using your API Gateway.

108
00:04:15,240 --> 00:04:18,462
‫This could be 400: Bad Request,

109
00:04:18,462 --> 00:04:22,650
‫403: Access Denied, or the web application firewall

110
00:04:22,650 --> 00:04:24,420
‫did not accept your request.

111
00:04:24,420 --> 00:04:27,180
‫429, for example, if your quota has been exceeded

112
00:04:27,180 --> 00:04:29,100
‫and you're seeing some throttling.

113
00:04:29,100 --> 00:04:33,210
‫And anything with 5XX, means Server errors, so your backend.

114
00:04:33,210 --> 00:04:35,310
‫So 502 means that for example,

115
00:04:35,310 --> 00:04:38,220
‫your Lambda proxy integration returned an incompatible response,

116
00:04:38,220 --> 00:04:42,120
‫or 503 is that your backend is unavailable.

117
00:04:42,120 --> 00:04:45,060
‫Or 504 is that there was an Integration Failure,

118
00:04:45,060 --> 00:04:46,530
‫and one of these failures is that

119
00:04:46,530 --> 00:04:51,420
‫the API Gateway request timed out: after 29 seconds

120
00:04:51,420 --> 00:04:53,220
‫it did not receive a response from the backend

121
00:04:53,220 --> 00:04:56,520
‫and therefore it returned a 504: Integration Failure,

122
00:04:56,520 --> 00:04:58,140
‫due to this timeout.
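
To summarize the error codes from this lecture, here is a small sketch that labels a status code as a client-side or server-side error. The table only contains the codes mentioned above, and the names KNOWN_ERRORS, error_side, and describe are made up for this example.

```python
# Only the status codes discussed in the lecture; wording is a paraphrase.
KNOWN_ERRORS = {
    400: "Bad Request",
    403: "Access Denied (or filtered by the web application firewall)",
    429: "Quota exceeded or throttled",
    502: "Bad Gateway (for example, an incompatible Lambda proxy response)",
    503: "Service Unavailable",
    504: "Integration Failure (for example, the 29-second timeout)",
}


def error_side(status: int) -> str:
    """Return 'client' for 4XX, 'server' for 5XX, and 'none' otherwise."""
    if 400 <= status <= 499:
        return "client"
    if 500 <= status <= 599:
        return "server"
    return "none"


def describe(status: int) -> str:
    """One-line description combining the side and the known meaning."""
    meaning = KNOWN_ERRORS.get(status, "not covered in this lecture")
    return f"{status} ({error_side(status)}): {meaning}"


for code in (403, 429, 504):
    print(describe(code))
```

The same split is what the 4XXError and 5XXError metrics count on the CloudWatch side.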

123
00:04:58,140 --> 00:05:00,510
‫So that's it for the API Gateway Monitoring.

124
00:05:00,510 --> 00:05:03,510
‫I hope you liked it, and I will see you in the next lecture.