Now let's talk about consumers with Kinesis Data Streams. So the consumers get records from the stream and they process them. The consumers can be Lambda functions, they could be Kinesis Data Analytics, it can be Kinesis Data Firehose, a custom consumer using the SDK in two different modes, classic or enhanced fan-out, and the Kinesis Client Library, which is a library used to simplify reading from a data stream. I have a dedicated lecture, by the way, on the Kinesis Client Library.

So let's discuss the difference between a classic shared fan-out consumer and an enhanced fan-out consumer. When you write your consumer in the classic shared-throughput way, you have your Kinesis Data Stream with a lot of shards, and you get two megabytes per second per shard across all consumers. That means that if you look at shard number one, just for this example, we can write a Consumer Application A that is going to issue a GetRecords API call to get records from shard one, but it is possible for us to have many different applications reading from the same Kinesis Data Stream. So Consumer Application B will also issue GetRecords API calls, and Consumer Application C will also issue GetRecords API calls. Now, what happens in this instance is that they are all sharing two megabytes per second per shard. That means that, in this example, we have three consumers sharing two megabytes per second, so each consumer can get a maximum of approximately 666 kilobytes per second of data. So as you can see, there's a limit to how many consumers we can have: the more consumer applications we add onto the Kinesis Data Stream, the more throughput limitations we're going to have.
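To make this pull model concrete, here is a minimal sketch of a classic shared fan-out consumer using boto3. The region, stream name, shard id, and polling interval are assumptions made for this illustration; it simply gets a shard iterator for one shard and calls GetRecords in a loop.

```python
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

STREAM_NAME = "demo-stream"            # hypothetical stream name
SHARD_ID = "shardId-000000000000"      # shard one from the example above

# Get an iterator positioned at the oldest record still in the shard
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME,
    ShardId=SHARD_ID,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while True:
    # GetRecords returns up to 10 MB (or up to 10,000 records) per call,
    # and every consumer of this shard shares the 2 MB/s read throughput
    response = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in response["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = response["NextShardIterator"]
    time.sleep(1)  # stay under the 5 GetRecords calls per second per shard limit
```

In a real application you would run one such loop per shard, or simply use the Kinesis Client Library mentioned above, rather than hard-coding a single shard id.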
Which brings us to a newer mode of consumption introduced by AWS, called the Enhanced Fan-Out Consumer. In this case we get two megabytes per second, per consumer, per shard. So it's not across all consumers, it's per consumer, per shard. That means that Consumer Application A will use a new API call named SubscribeToShard, and this will make the shard push the data into our Consumer Application A at the rate of two megabytes per second. And if a second Consumer Application B issues another SubscribeToShard call, then Consumer Application B as well will get the data pushed by the shard into the consumer application at the rate of two megabytes per second, and so will Consumer Application C. So as we can see here, we have three consumer applications and we get six megabytes per second of throughput for this one shard. So in the first model we have a pull model, and in the second model we have a push model.

So let's summarize. The pull model is the shared classic fan-out consumer, which is good when you have a low number of consuming applications, and the throughput is two megabytes per second per shard across all consumers. There are also limits: per shard, you get a maximum of five GetRecords API calls per second, and the latency of the API calls is around 200 milliseconds. This is something you want to use when you want to minimize cost. The consumers will poll from Kinesis directly using the GetRecords API call, and the returned data can be up to 10 megabytes (the call is then throttled for five seconds) or up to 10,000 records.

If we use the Enhanced Fan-Out Consumer instead, which is a push method, we can have multiple consuming applications on the same stream, and each consumer will get two megabytes per second, per shard. The latency is going to be much lower, around 70 milliseconds, because the shard itself will push the data into our consumer. The cost is higher; it's a feature that will cost you more on AWS. And the data, as I said, is pushed using a streaming method called HTTP/2. Finally, there's a soft limit of five consumer applications per data stream, but you can increase this by opening a support ticket with AWS.
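Here is a minimal sketch of the push model with boto3, under assumed names (stream ARN, consumer name, shard id). It registers an enhanced fan-out consumer and then subscribes to one shard; for brevity it skips error handling and does not wait for the newly registered consumer to become active, which a real application would need to do.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

# Hypothetical stream ARN for this illustration
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/demo-stream"

# Register this application as an enhanced fan-out consumer of the stream
consumer_arn = kinesis.register_stream_consumer(
    StreamARN=STREAM_ARN,
    ConsumerName="consumer-app-a",  # hypothetical consumer name
)["Consumer"]["ConsumerARN"]

# SubscribeToShard opens an HTTP/2 connection and Kinesis pushes records
# to us at up to 2 MB/s for this consumer on this shard; a subscription
# lasts about five minutes and then has to be renewed.
subscription = kinesis.subscribe_to_shard(
    ConsumerARN=consumer_arn,
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)

for event in subscription["EventStream"]:
    for record in event["SubscribeToShardEvent"]["Records"]:
        print(record["PartitionKey"], record["Data"])
```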
Finally, we haven't seen Lambda in depth yet, but it is a way for you to consume data without using servers. So we have a Kinesis Data Stream, and say it has three shards. And we have Lambda functions, and their role will be to process records and save them into DynamoDB, which is a serverless database. So the Lambda functions are going to read records in batches from the Kinesis Data Stream, and the data is going to be sent to our Lambda functions, by partition key, to be processed (there's a minimal handler sketch at the end of this lecture). The Lambda functions can then send the data into DynamoDB, and so we have a way to process our Kinesis Data Stream using a serverless mechanism.

Lambda functions support both the classic and enhanced fan-out consumer modes, so you can choose how you want to consume data from Kinesis Data Streams. They will read the records in batches, and you can configure the batch size and the batch window. In case an error occurs, the Lambda will retry until it succeeds or the data expires from the Kinesis Data Stream. You can also process up to 10 batches per shard simultaneously. We'll have a look at Kinesis and Lambda in greater detail in the Lambda section, so don't worry too much if you don't understand what I just said. But with that said, we've seen consumers in Kinesis Data Streams. Now let's go ahead and do a hands-on to really get some practice with Kinesis Data Streams. So I will see you in the next lecture.
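As promised above, here is a minimal sketch of what such a Lambda handler could look like, under assumed names (the DynamoDB table and its attribute names are hypothetical). Lambda invokes the handler with a batch of Kinesis records; the handler decodes each record and writes it to DynamoDB.

```python
import base64
import boto3

# Hypothetical DynamoDB table used to store the processed records
table = boto3.resource("dynamodb").Table("ProcessedRecords")

def lambda_handler(event, context):
    # The batch size and batch window are configured on the event source mapping
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        table.put_item(
            Item={
                "partitionKey": record["kinesis"]["partitionKey"],
                "sequenceNumber": record["kinesis"]["sequenceNumber"],
                "payload": payload,
            }
        )
    # If this handler raised an exception, Lambda would retry the batch
    # until it succeeds or the data expires from the stream
    return {"statusCode": 200}
```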