Now let's talk about consumers with Kinesis Data Streams. So the consumers get records from the stream and they process them. The consumers can be Lambda functions, they could be Kinesis Data Analytics, it can be Kinesis Data Firehose, a custom consumer using the SDK in two different modes, classic or enhanced fan-out, and the Kinesis Client Library, which is a library used to simplify reading from a data stream. I have a dedicated lecture, by the way, on the Kinesis Client Library.

So let's discuss the difference between a classic shared fan-out consumer and an enhanced fan-out consumer. When you write your consumer in the classic shared-throughput way, you have your Kinesis Data Stream with a lot of shards, and you get two megabytes per second per shard across all consumers. That means that if you look at shard number one, just for this example, we can write a Consumer Application A that is going to issue a GetRecords API call to get records from shard one, but it is possible for us to have many different applications reading from the same Kinesis Data Stream. So Consumer Application B will also issue GetRecords API calls, and Consumer Application C will also issue GetRecords API calls. Now, what happens in this instance is that they are all sharing two megabytes per second per shard. That means that, in this example, we have three consumers sharing two megabytes per second, so each consumer can get a maximum of approximately 666 kilobytes per second of data. So as you can see, there's a limit to how many consumers we can have: the more consumer applications we add onto the Kinesis Data Stream, the more throughput limitations we're going to have.
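To make this pull model concrete, here is a minimal sketch of a classic shared fan-out consumer using boto3. The region, stream name, shard id, and polling interval are assumptions made for this illustration; it simply gets a shard iterator for one shard and calls GetRecords in a loop.

```python
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

STREAM_NAME = "demo-stream"            # hypothetical stream name
SHARD_ID = "shardId-000000000000"      # shard one from the example above

# Get an iterator positioned at the oldest record still in the shard
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME,
    ShardId=SHARD_ID,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while True:
    # GetRecords returns up to 10 MB (or up to 10,000 records) per call,
    # and every consumer of this shard shares the 2 MB/s read throughput
    response = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in response["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = response["NextShardIterator"]
    time.sleep(1)  # stay under the 5 GetRecords calls per second per shard limit
```

In a real application you would run one such loop per shard, or simply use the Kinesis Client Library mentioned above, rather than hard-coding a single shard id.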
Which brings us to a newer mode of consumption introduced by AWS, called the Enhanced Fan-Out Consumer. In this case we get two megabytes per second, per consumer, per shard. So it's not across all consumers, it's per consumer, per shard. That means that Consumer Application A will use a new API call named SubscribeToShard, and this will make the shard push the data into our Consumer Application A at the rate of two megabytes per second. And if a second Consumer Application B issues another SubscribeToShard call, then Consumer Application B as well will get the data pushed by the shard into the consumer application at the rate of two megabytes per second, and so will Consumer Application C. So as we can see here, we have three consumer applications and we get six megabytes per second of throughput for this one shard. So in the first model we have a pull model, and in the second model we have a push model.

So let's summarize. The pull model is the shared classic fan-out consumer, which is good when you have a low number of consuming applications, and the throughput is two megabytes per second per shard across all consumers. There are also limits: per shard, you get a maximum of five GetRecords API calls per second, and the latency of the API calls is around 200 milliseconds. This is something you want to use when you want to minimize cost. The consumers will poll from Kinesis directly using the GetRecords API call, and the returned data can be up to 10 megabytes (the call is then throttled for five seconds) or up to 10,000 records.

If we use the Enhanced Fan-Out Consumer instead, which is a push method, we can have multiple consuming applications on the same stream, and each consumer will get two megabytes per second, per shard. The latency is going to be much lower, around 70 milliseconds, because the shard itself will push the data into our consumer. The cost is higher; it's a feature that will cost you more on AWS. And the data, as I said, is pushed using a streaming method called HTTP/2. Finally, there's a soft limit of five consumer applications per data stream, but you can increase this by opening a support ticket with AWS.
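Here is a minimal sketch of the push model with boto3, under assumed names (stream ARN, consumer name, shard id). It registers an enhanced fan-out consumer and then subscribes to one shard; for brevity it skips error handling and does not wait for the newly registered consumer to become active, which a real application would need to do.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

# Hypothetical stream ARN for this illustration
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/demo-stream"

# Register this application as an enhanced fan-out consumer of the stream
consumer_arn = kinesis.register_stream_consumer(
    StreamARN=STREAM_ARN,
    ConsumerName="consumer-app-a",  # hypothetical consumer name
)["Consumer"]["ConsumerARN"]

# SubscribeToShard opens an HTTP/2 connection and Kinesis pushes records
# to us at up to 2 MB/s for this consumer on this shard; a subscription
# lasts about five minutes and then has to be renewed.
subscription = kinesis.subscribe_to_shard(
    ConsumerARN=consumer_arn,
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)

for event in subscription["EventStream"]:
    for record in event["SubscribeToShardEvent"]["Records"]:
        print(record["PartitionKey"], record["Data"])
```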
Finally, we haven't seen Lambda in depth yet, but it is a way for you to consume data without using servers. So we have a Kinesis Data Stream, and say it has three shards. And we have Lambda functions, and their role will be to process records and save them into DynamoDB, which is a serverless database. So the Lambda functions are going to read records in batches from the Kinesis Data Stream, and the data is going to be sent to our Lambda functions, by partition key, to be processed (there's a minimal handler sketch at the end of this lecture). The Lambda functions can then send the data into DynamoDB, and so we have a way to process our Kinesis Data Stream using a serverless mechanism.

Lambda functions support both the classic and enhanced fan-out consumer modes, so you can choose how you want to consume data from Kinesis Data Streams. They will read the records in batches, and you can configure the batch size and the batch window. In case an error occurs, the Lambda will retry until it succeeds or the data expires from the Kinesis Data Stream. You can also process up to 10 batches per shard simultaneously. We'll have a look at Kinesis and Lambda in greater detail in the Lambda section, so don't worry too much if you don't understand what I just said. But with that said, we've seen consumers in Kinesis Data Streams. Now let's go ahead and do a hands-on to really get some practice with Kinesis Data Streams. So I will see you in the next lecture.
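As promised above, here is a minimal sketch of what such a Lambda handler could look like, under assumed names (the DynamoDB table and its attribute names are hypothetical). Lambda invokes the handler with a batch of Kinesis records; the handler decodes each record and writes it to DynamoDB.

```python
import base64
import boto3

# Hypothetical DynamoDB table used to store the processed records
table = boto3.resource("dynamodb").Table("ProcessedRecords")

def lambda_handler(event, context):
    # The batch size and batch window are configured on the event source mapping
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        table.put_item(
            Item={
                "partitionKey": record["kinesis"]["partitionKey"],
                "sequenceNumber": record["kinesis"]["sequenceNumber"],
                "payload": payload,
            }
        )
    # If this handler raised an exception, Lambda would retry the batch
    # until it succeeds or the data expires from the stream
    return {"statusCode": 200}
```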