So now let's talk about Lambda concurrency and throttling.

The more we invoke our Lambda functions, the more concurrent executions of our Lambda functions we will have. We know this because Lambda can scale very easily and very fast. That means if we invoke our Lambda function at a low scale, we may have two concurrent executions of our Lambda function, but if we have a very high scale of events happening, we may have up to 1,000 concurrent executions of our Lambda functions working together to process whatever comes through.

Something we can do, though, is to limit the number of concurrent executions a Lambda function can have, and that is recommended. For this, we can set what's called reserved concurrency, and that is set at the function level. This is a limit, and we're saying, "Okay, this Lambda function can only have up to 50 concurrent executions." Each invocation over the concurrency limit will trigger what's called a throttle, and there are different behaviors for a throttle depending on the invocation type.
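To make the reserved-concurrency setting concrete, here is a minimal sketch of what capping a function at 50 concurrent executions could look like. The function name and the limit of 50 are hypothetical; `put_function_concurrency` is the Lambda API call behind reserved concurrency, and applying it for real requires boto3 and AWS credentials, so that part is only illustrated:

```python
def reserved_concurrency_params(function_name: str, limit: int) -> dict:
    """Build the parameters for Lambda's PutFunctionConcurrency call."""
    return {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": limit,
    }

def apply_reserved_concurrency(function_name: str, limit: int):
    """Apply the limit for real. Requires boto3 and AWS credentials,
    so this is shown only to illustrate the shape of the call."""
    import boto3  # imported lazily so the sketch runs without boto3 installed
    client = boto3.client("lambda")
    return client.put_function_concurrency(
        **reserved_concurrency_params("my-function" if False else function_name,
                                      limit)
    )

# Hypothetical usage (not run here):
#   apply_reserved_concurrency("my-function", 50)
```

The same setting is available in the console under the function's concurrency configuration; the API call is just the scriptable version of it.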
If it's a synchronous invocation, so we invoke our Lambda function directly and we're being throttled, it will return a throttle error, 429. And if it's an asynchronous invocation, it will retry automatically and then go to the DLQ. In case you need more than 1,000 concurrent executions at a time, you can open a support ticket to request a higher limit.

So now that we know about the concept of concurrency, here is something that can happen if we don't set the concurrency very carefully. If you don't set any reserved concurrency, so no limit on your function concurrency, then this could happen.

We have our Application Load Balancer, for example, connected to a Lambda function. We have another application where a few users connect to an API Gateway, connected to another Lambda function. And one last application may be using the SDK or the CLI to invoke a Lambda function. When everything is at a pretty low level, like a low throughput of invocations, everything is fine.
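Since the 429 comes straight back to the caller on a synchronous invocation, the client is responsible for retrying. Here is a minimal, hypothetical retry wrapper with exponential backoff; `ThrottleError` is a stand-in for the throttle exception a real client would see (boto3, for instance, raises `TooManyRequestsException` in this situation):

```python
import time

class ThrottleError(Exception):
    """Stand-in for the 429 throttle error a synchronous invoke returns."""

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry a throttled synchronous invocation with exponential backoff.

    `invoke` is any callable standing in for the actual Lambda invoke."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottleError:
            if attempt == max_attempts - 1:
                raise  # still throttled after all attempts: give up
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

The AWS SDKs already apply a similar retry policy by default; a wrapper like this only matters when you are driving the invocations yourself.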
But let's say that we are running a huge promotion and somehow we get many, many users hammering our Application Load Balancer; we're very successful. What happens is that our load balancer will be invoking many, many Lambda functions, and Lambda functions can scale automatically, so we'll get up to 1,000 concurrent executions. This looks good, right? Lambda has scaled.

But here is the problem: all of the concurrent executions went to the first application. That means that the users of our API Gateway application will be throttled, and the CLI and SDK will also be throttled.

What you have to remember from this slide is that the concurrency limit applies across all the functions in your account, so you have to be careful, because if one function goes over the limit, it's possible that your other functions get throttled. That's very, very important.

Next, let's talk about concurrency and your asynchronous invocations. Let's take the example of S3 event notifications.
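Before we move on, the shared-pool arithmetic behind that scenario can be sketched in a couple of lines. The 1,000 account limit is the default; the per-function reservations are made-up numbers for illustration:

```python
def unreserved_pool(account_limit: int, reserved: dict) -> int:
    """Concurrency left over for every function that has no
    reserved concurrency of its own (all numbers hypothetical)."""
    return account_limit - sum(reserved.values())

# If the ALB-facing function reserves 600 and the API Gateway one 300,
# all remaining functions in the account share what's left:
remaining = unreserved_pool(1000, {"alb-fn": 600, "api-fn": 300})
print(remaining)  # 100
```

Reserving concurrency for the critical functions is exactly how you stop one busy function from starving the others.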
So we are uploading files into our S3 bucket, and this creates a new-file event that will invoke our Lambda function. Say we are putting many, many files at the same time, so we get many, many concurrent Lambda executions happening. If the function doesn't have enough concurrency available, so if it cannot scale up because we have reached the limit, then the additional requests are throttled.

But this is an asynchronous request. So for any throttling errors and system errors, so 429 and the 500 series, Lambda will return the event to the event queue. Remember, in the asynchronous mode there is an internal event queue, and Lambda will attempt to run the function again for up to six hours. So there are a lot of retries that happen due to the throttling and so on. The retry interval will increase in an exponential backoff fashion, from one second up to a maximum of every five minutes. This allows your Lambda function to keep on retrying and hopefully, at some point, find the concurrency and capacity available to run correctly.
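The retry schedule just described can be approximated with a short sketch: doubling intervals starting at one second, capped at five minutes, for up to six hours in total. This mirrors the documented behavior but is not Lambda's exact internal algorithm:

```python
def retry_delays(max_total_seconds=6 * 3600, first=1, cap=300):
    """Illustrative schedule of async retry intervals: doubling from
    one second up to a five-minute cap, within a six-hour window.
    An approximation of the documented behavior, not Lambda's
    actual internal algorithm."""
    delays, elapsed, delay = [], 0, first
    while elapsed + delay <= max_total_seconds:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * 2, cap)  # exponential backoff, capped at 5 min
    return delays

schedule = retry_delays()
# starts at 1s, doubles each time, then settles at the 300s cap
```

The takeaway is that within the six-hour window, the retries become widely spaced, so a temporarily throttled event can still succeed much later once concurrency frees up.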
Okay, so next let's talk about cold starts and provisioned concurrency. You may have heard the term before if you use Lambda. A cold start means that when a new Lambda function instance is created, your code has to be loaded and the code outside of the handler has to be run. This corresponds to all your initialization, the "init" phase. And if your initialization is large, because you have a lot of code, a lot of dependencies, you're connecting to many databases and creating many SDK clients, this process can take a lot of time.

That means that the first request served by a new instance has a higher latency than the rest, and that may impact your users. If a user is waiting, say, three seconds to get a response, that may be very, very slow for them; they may experience a cold start and be unhappy with your product.

So what can you do? Well, you can use something called provisioned concurrency. That means that you allocate concurrency before the function is even invoked; you allocate this concurrency in advance.
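To see exactly where the cold start cost lives, here is a minimal, hypothetical handler layout: everything at module level is the init phase that runs once per new function instance, while the handler body runs on every invocation and reuses that work:

```python
import time

# --- init phase: module-level code runs once per function instance,
#     during the cold start ---
EXPENSIVE_RESOURCE = {"created_at": time.time()}  # stands in for a DB
                                                  # connection or SDK client

def handler(event, context=None):
    # --- invoke phase: runs on every request, reusing the init work ---
    return {"created_at": EXPENSIVE_RESOURCE["created_at"], "event": event}
```

This is why the advice is to create clients and connections outside the handler: the expensive part is paid once per instance rather than once per request, but it is still paid on every cold start, which is what provisioned concurrency addresses.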
This way, the cold start never happens and all the invocations will have lower latency. And to manage this provisioned concurrency, you can use Application Auto Scaling, for example with a schedule or target tracking, to make sure that you have enough Lambda functions provisioned and ready to be used, minimizing the cold start problem.

Please note that launching a Lambda function in a VPC used to take forever. But there was a blog post released by AWS in October and November 2019, linked here, showing the improvements they made to dramatically reduce cold starts in a VPC. So the good news is that if you were using Lambda in a VPC before, cold starts now have a minimal impact.

Okay, finally, there are two diagrams you can look at in your own time covering the concepts of reserved concurrency and provisioned concurrency. I like these graphs, so here is the link in the slides. Have a look at them; they explain how these concepts work.
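Configuring the provisioned concurrency described above could look roughly like this. The alias "live" and the value 100 are made-up; `put_provisioned_concurrency_config` is the underlying Lambda API call, and since it needs boto3 and AWS credentials, the real call is only shown in a comment:

```python
def provisioned_concurrency_params(function_name: str,
                                   qualifier: str,
                                   amount: int) -> dict:
    """Parameters for Lambda's PutProvisionedConcurrencyConfig call.
    Provisioned concurrency applies to a version or alias (the
    qualifier), never to $LATEST. All values here are hypothetical."""
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": amount,
    }

# With boto3 and AWS credentials (not run here), applying it would be:
#   boto3.client("lambda").put_provisioned_concurrency_config(
#       **provisioned_concurrency_params("my-function", "live", 100))
```

Application Auto Scaling can then adjust that number on a schedule or with target tracking, so you only pay for pre-warmed instances when you expect the traffic.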
I think these diagrams are quite complicated to describe as-is on a slide, but have a look at them in your own time, and hopefully they will help you understand this concept a little bit better if I didn't just now. Okay, so now let's go into the hands-on and see how concurrency works.