1 00:00:00,450 --> 00:00:03,090 Now let's discuss the Kinesis Client Library. 2 00:00:03,090 --> 00:00:05,140 So it is something that the exam can ask you 3 00:00:05,140 --> 00:00:06,410 one scenario question about, 4 00:00:06,410 --> 00:00:09,010 so let's go over how it works in the scenario question. 5 00:00:09,010 --> 00:00:12,070 So this is a Java library that helps you read records 6 00:00:12,070 --> 00:00:15,100 from a Kinesis Data Stream with distributed applications 7 00:00:15,100 --> 00:00:17,830 that will be sharing the read workload. 8 00:00:17,830 --> 00:00:21,280 And each shard is to be read by only one KCL instance, 9 00:00:21,280 --> 00:00:22,970 that means that if you have 4 shards, 10 00:00:22,970 --> 00:00:26,260 you get a maximum of 4 KCL instances. 11 00:00:26,260 --> 00:00:29,820 If you have 6 shards, you get a maximum of 6 KCL instances. 12 00:00:29,820 --> 00:00:32,090 And with just that, you're good to go for the exam, 13 00:00:32,090 --> 00:00:33,980 but I want to explain exactly how it works, 14 00:00:33,980 --> 00:00:35,480 so you can get an idea about 15 00:00:35,480 --> 00:00:37,140 how the Kinesis Client Library works. 16 00:00:37,140 --> 00:00:39,810 So, the Kinesis Client Library will be reading 17 00:00:39,810 --> 00:00:41,870 from our Kinesis Data Stream 18 00:00:41,870 --> 00:00:44,280 and the progress of how far it's been reading 19 00:00:44,280 --> 00:00:46,820 is going to be checkpointed into DynamoDB, 20 00:00:46,820 --> 00:00:49,400 and so your application running KCL 21 00:00:49,400 --> 00:00:51,960 will need IAM access to DynamoDB. 22 00:00:51,960 --> 00:00:53,680 It will be able, thanks to DynamoDB, 23 00:00:53,680 --> 00:00:56,670 to track the other workers of your KCL application 24 00:00:56,670 --> 00:00:59,180 and share the work across the shards.
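To make the checkpointing idea concrete, here is a minimal sketch in Python. It is a toy model only: a plain dict stands in for the DynamoDB table, and `process_shard`, `checkpoints`, and the record names are hypothetical helpers for illustration, not part of the real KCL API.

```python
# Toy model of KCL-style checkpointing. A plain dict stands in for the
# DynamoDB table; none of these names come from the real KCL API.

def process_shard(shard_id, records, checkpoints, max_records=None):
    """Process records from the last checkpoint onward, checkpointing after each one."""
    start = checkpoints.get(shard_id, 0)  # 0 means nothing processed yet
    if max_records is None:
        end = len(records)
    else:
        end = min(len(records), start + max_records)
    processed = []
    for position in range(start, end):
        processed.append(records[position])
        checkpoints[shard_id] = position + 1  # next position to read
    return processed


checkpoints = {}  # hypothetical stand-in for the DynamoDB checkpoint table
shard_records = ["r0", "r1", "r2", "r3", "r4"]

# First run handles only three records, then the worker "stops".
first_run = process_shard("shard-1", shard_records, checkpoints, max_records=3)

# A later run (possibly on another machine) resumes from the stored checkpoint.
second_run = process_shard("shard-1", shard_records, checkpoints)
```

The second call never sees `r0` to `r2` again, because the stored position tells it where the previous run stopped. That is the role the transcript gives to DynamoDB.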
25 00:00:59,180 --> 00:01:01,100 KCL can run on anything you want: 26 00:01:01,100 --> 00:01:03,480 it can be running on EC2 instances, 27 00:01:03,480 --> 00:01:05,100 with an EC2 instance role, 28 00:01:05,100 --> 00:01:07,050 your Elastic Beanstalk application, 29 00:01:07,050 --> 00:01:08,790 or on-premises servers, 30 00:01:08,790 --> 00:01:11,680 as long as they have the correct IAM credentials. 31 00:01:11,680 --> 00:01:13,000 The records are going to be read in order 32 00:01:13,000 --> 00:01:15,260 at the shard level, obviously, 33 00:01:15,260 --> 00:01:18,020 and there are two versions of the Kinesis Client Library: 34 00:01:18,020 --> 00:01:20,650 Version 1 supports only shared consumers, 35 00:01:20,650 --> 00:01:22,640 and Version 2 of KCL 36 00:01:22,640 --> 00:01:26,370 supports both shared and enhanced fan-out consumer modes. 37 00:01:26,370 --> 00:01:31,060 So, if we look at an example of 4 shards in our stream, 38 00:01:31,060 --> 00:01:34,520 we can have a DynamoDB table to check on the progress, 39 00:01:34,520 --> 00:01:36,840 and so we can run two KCL apps 40 00:01:36,840 --> 00:01:39,260 of the same coherent application 41 00:01:39,260 --> 00:01:42,630 running on two different EC2 instances. 42 00:01:42,630 --> 00:01:44,980 In this case, thanks to DynamoDB, 43 00:01:44,980 --> 00:01:46,710 they will know how to share the work, 44 00:01:46,710 --> 00:01:48,970 so the first KCL app is going to be reading 45 00:01:48,970 --> 00:01:49,987 from shards 1 and 2, 46 00:01:49,987 --> 00:01:52,040 and the second KCL app is going to be reading 47 00:01:52,040 --> 00:01:53,840 from shards 3 and 4. 48 00:01:53,840 --> 00:01:55,700 Now, the progress of how far 49 00:01:55,700 --> 00:01:58,170 the app has been reading into the Kinesis Data Stream 50 00:01:58,170 --> 00:02:00,550 will be checkpointed into DynamoDB.
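The 4-shard, 2-app example can be sketched as a simple split. This is an illustration under assumptions: `assign_shards` and the worker names are hypothetical, and the real KCL coordinates workers through the DynamoDB table rather than through a static function like this one.

```python
# Simplified illustration of how shards might be split between KCL apps.
# assign_shards is a hypothetical helper, not a KCL function.

def assign_shards(shard_ids, worker_ids):
    """Split shards into contiguous, near-equal blocks, one block per worker."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {}
    base, extra = divmod(len(shard_ids), len(worker_ids))
    index = 0
    for i, worker in enumerate(worker_ids):
        # The first `extra` workers take one additional shard each.
        count = base + (1 if i < extra else 0)
        assignment[worker] = shard_ids[index:index + count]
        index += count
    return assignment


shards = ["shard-1", "shard-2", "shard-3", "shard-4"]
two_workers = assign_shards(shards, ["kcl-app-1", "kcl-app-2"])
```

With two apps, `kcl-app-1` ends up with shards 1 and 2 and `kcl-app-2` with shards 3 and 4, matching the split described in the lecture. Each shard still has exactly one reader.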
51 00:02:00,550 --> 00:02:03,590 And so, for example, if one of these applications goes down, 52 00:02:03,590 --> 00:02:06,870 DynamoDB and the KCL apps, working together, 53 00:02:06,870 --> 00:02:08,380 will know that an app went down, 54 00:02:08,380 --> 00:02:10,950 and so reading from its shards will be resumed 55 00:02:10,950 --> 00:02:12,623 from where it was checkpointed. 56 00:02:13,620 --> 00:02:15,090 It also works when you scale up, 57 00:02:15,090 --> 00:02:16,400 so if you have 4 shards 58 00:02:16,400 --> 00:02:19,250 and now you run 4 KCL applications, 59 00:02:19,250 --> 00:02:21,910 then each will be reading from one shard. 60 00:02:21,910 --> 00:02:24,427 And therefore the progress will be resumed from DynamoDB 61 00:02:24,427 --> 00:02:25,610 and checkpointed again. 62 00:02:25,610 --> 00:02:27,310 So you can see how this works, right? 63 00:02:27,310 --> 00:02:29,800 But we cannot have more KCL apps than shards, 64 00:02:29,800 --> 00:02:32,980 because, well, otherwise one will be doing nothing. 65 00:02:32,980 --> 00:02:35,070 So if you want to scale Kinesis, 66 00:02:35,070 --> 00:02:37,800 you can scale Kinesis and go to 6 shards, 67 00:02:37,800 --> 00:02:39,930 so now we still have our 4 KCL applications, 68 00:02:39,930 --> 00:02:42,830 but now we have 6 shards in the Kinesis stream. 69 00:02:42,830 --> 00:02:45,450 And so again, they will detect this change, 70 00:02:45,450 --> 00:02:48,190 and working together with DynamoDB, 71 00:02:48,190 --> 00:02:50,010 they will again split the work 72 00:02:50,010 --> 00:02:54,430 between the KCL applications and the shard assignments. 73 00:02:54,430 --> 00:02:56,420 So that means that once we have 6 shards 74 00:02:56,420 --> 00:02:57,570 in our Kinesis Data Stream, 75 00:02:57,570 --> 00:03:01,580 then we can have up to 6 KCL applications reading from them, 76 00:03:01,580 --> 00:03:04,040 and checkpointing the progress into DynamoDB.
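The failure and scaling scenarios above can be sketched with a small lease model. Everything here is an assumption for illustration: the `leases` dict stands in for the DynamoDB table, `rebalance` and `idle_workers` are hypothetical helpers, round-robin is just one simple way to spread shards, and real resharding in Kinesis is more involved than adding new shard names.

```python
# Toy lease model: shard -> {"owner": worker, "checkpoint": position}.
# All names are hypothetical; this is not the real KCL lease logic.

def rebalance(leases, shard_ids, live_workers):
    """Reassign shards round-robin over live workers, keeping stored checkpoints."""
    new_leases = {}
    for i, shard in enumerate(shard_ids):
        previous = leases.get(shard, {"checkpoint": 0})  # new shards start at 0
        new_leases[shard] = {
            "owner": live_workers[i % len(live_workers)],
            "checkpoint": previous["checkpoint"],
        }
    return new_leases


def idle_workers(leases, live_workers):
    """Return the workers that own no shard at all."""
    owners = {lease["owner"] for lease in leases.values()}
    return [worker for worker in live_workers if worker not in owners]


shards4 = ["shard-1", "shard-2", "shard-3", "shard-4"]
leases = rebalance({}, shards4, ["app-1", "app-2"])
leases["shard-2"]["checkpoint"] = 42  # app-2 has made some progress

# app-2 goes down: app-1 takes over and resumes from the stored checkpoint.
after_failure = rebalance(leases, shards4, ["app-1"])

# Five apps for four shards: one app has nothing to read.
five = ["app-1", "app-2", "app-3", "app-4", "app-5"]
idle = idle_workers(rebalance(leases, shards4, five), five)

# Six shards, four apps: two apps read two shards each.
shards6 = shards4 + ["shard-5", "shard-6"]
resharded = rebalance(leases, shards6, ["app-1", "app-2", "app-3", "app-4"])
```

The point to take away is that ownership changes while checkpoints stay put, which is why a takeover can resume where the previous reader stopped.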
77 00:03:04,040 --> 00:03:05,000 If you've understood that, 78 00:03:05,000 --> 00:03:06,880 then you will be good to go for the exam 79 00:03:06,880 --> 00:03:08,400 to answer the question. 80 00:03:08,400 --> 00:03:10,560 That's it for this lecture, I hope you liked it, 81 00:03:10,560 --> 00:03:12,510 and I will see you in the next lecture.