1 00:00:00,480 --> 00:00:02,730 Another analytics service you will see 2 00:00:02,730 --> 00:00:06,120 is Amazon Managed Streaming for Apache Kafka, 3 00:00:06,120 --> 00:00:08,280 also called Amazon MSK. 4 00:00:08,280 --> 00:00:09,420 And what is Kafka? 5 00:00:09,420 --> 00:00:12,960 Well, Kafka is an alternative to Amazon Kinesis. 6 00:00:12,960 --> 00:00:16,140 Kafka and Kinesis both allow you to stream data. 7 00:00:16,140 --> 00:00:18,840 So, MSK gives you 8 00:00:18,840 --> 00:00:22,320 a fully-managed Kafka cluster on AWS. 9 00:00:22,320 --> 00:00:24,240 And it allows you to create, update 10 00:00:24,240 --> 00:00:25,860 and delete clusters on the fly. 11 00:00:25,860 --> 00:00:28,590 And MSK is going to create and manage Kafka broker nodes 12 00:00:28,590 --> 00:00:31,230 and Zookeeper nodes in your cluster for you 13 00:00:31,230 --> 00:00:35,700 and you deploy the cluster in your VPC, across multiple AZs, 14 00:00:35,700 --> 00:00:38,040 up to three for high availability. 15 00:00:38,040 --> 00:00:41,640 You also have automatic recovery from common Kafka failures 16 00:00:41,640 --> 00:00:43,650 and the data is stored on EBS volumes 17 00:00:43,650 --> 00:00:45,150 for as long as you want. 18 00:00:45,150 --> 00:00:46,890 So, from personal experience, 19 00:00:46,890 --> 00:00:49,410 I know it's very difficult to set up Apache Kafka 20 00:00:49,410 --> 00:00:51,480 and the fact that you can just do one click 21 00:00:51,480 --> 00:00:54,510 and then deploy Kafka on AWS is great 22 00:00:54,510 --> 00:00:57,150 and this is the Amazon MSK service. 23 00:00:57,150 --> 00:01:01,650 So on top of it, you have the option to use MSK Serverless.
24 00:01:01,650 --> 00:01:04,410 And this means you run Apache Kafka on MSK, 25 00:01:04,410 --> 00:01:06,270 but this time you don't provision servers, 26 00:01:06,270 --> 00:01:07,890 you don't manage capacity, 27 00:01:07,890 --> 00:01:10,200 MSK will automatically provision resources 28 00:01:10,200 --> 00:01:12,840 and scale compute and storage for you. 29 00:01:12,840 --> 00:01:14,610 So what is Apache Kafka then? 30 00:01:14,610 --> 00:01:17,790 Apache Kafka is a way for you to stream data 31 00:01:17,790 --> 00:01:21,420 and a Kafka cluster is made of multiple brokers 32 00:01:21,420 --> 00:01:24,027 and then you will have producers that will produce data 33 00:01:24,027 --> 00:01:26,370 and so they will have to ingest data from places, 34 00:01:26,370 --> 00:01:29,160 such as Kinesis, IoT, RDS, et cetera, et cetera, 35 00:01:29,160 --> 00:01:30,930 and they will send the data directly 36 00:01:30,930 --> 00:01:34,230 into a Kafka topic that is going to be fully replicated 37 00:01:34,230 --> 00:01:36,120 to other brokers. 38 00:01:36,120 --> 00:01:38,730 Now, this Kafka topic has real-time streaming 39 00:01:38,730 --> 00:01:42,690 of data and consumers will pull from the topic 40 00:01:42,690 --> 00:01:44,460 to consume the data itself 41 00:01:44,460 --> 00:01:47,010 and then your consumer can do whatever it wants, 42 00:01:47,010 --> 00:01:49,830 process it or send it to various destinations, 43 00:01:49,830 --> 00:01:53,400 such as EMR, S3, SageMaker, Kinesis and RDS. 44 00:01:53,400 --> 00:01:57,000 So the idea is that Kafka is quite similar to Kinesis, 45 00:01:57,000 --> 00:01:59,760 but there are differences to look out for. 46 00:01:59,760 --> 00:02:02,040 So what are the differences between Kinesis Data Streams 47 00:02:02,040 --> 00:02:04,140 and Amazon MSK?
48 00:02:04,140 --> 00:02:05,640 Well, in Kinesis Data Streams, 49 00:02:05,640 --> 00:02:08,009 you have a one megabyte message size limit, 50 00:02:08,009 --> 00:02:10,680 which is also the default in Amazon MSK, 51 00:02:10,680 --> 00:02:13,200 but there you can configure it for a higher message size. 52 00:02:13,200 --> 00:02:15,060 For example, 10 megabytes. 53 00:02:15,060 --> 00:02:17,760 You have Data Streams with Shards 54 00:02:17,760 --> 00:02:21,030 in Kinesis Data Streams, and in MSK, 55 00:02:21,030 --> 00:02:23,490 they're called Kafka Topics with Partitions, 56 00:02:23,490 --> 00:02:25,980 but the concepts are sort of similar. 57 00:02:25,980 --> 00:02:28,440 To scale a Kinesis Data Stream, 58 00:02:28,440 --> 00:02:32,670 you need to do Shard Splitting, and to scale it down, Shard Merging. 59 00:02:32,670 --> 00:02:34,830 But in Amazon MSK to scale a topic, 60 00:02:34,830 --> 00:02:36,420 you can only add partitions. 61 00:02:36,420 --> 00:02:38,160 You cannot remove partitions. 62 00:02:38,160 --> 00:02:41,400 You have in-flight encryption for Kinesis Data Streams 63 00:02:41,400 --> 00:02:43,320 and then you have either plain text 64 00:02:43,320 --> 00:02:46,230 or TLS in-flight encryption for MSK. 65 00:02:46,230 --> 00:02:49,230 You get at-rest encryption for both of these services 66 00:02:49,230 --> 00:02:52,230 and, at the exam level, this is enough. 67 00:02:52,230 --> 00:02:54,330 Just so you know, a few differences. 68 00:02:54,330 --> 00:02:58,500 And also for Amazon MSK, you can keep data 69 00:02:58,500 --> 00:03:00,870 for as long as you want, you can go over one year, 70 00:03:00,870 --> 00:03:03,450 as long as you pay for the underlying EBS storage, 71 00:03:03,450 --> 00:03:04,350 you're good to go. 72 00:03:05,220 --> 00:03:09,120 So to produce to MSK you need to create a Kafka Producer 73 00:03:09,120 --> 00:03:12,180 and then to consume from MSK, you have multiple options.
74 00:03:12,180 --> 00:03:14,940 The first one is to use Kinesis Data Analytics 75 00:03:14,940 --> 00:03:16,260 for Apache Flink. 76 00:03:16,260 --> 00:03:17,610 So you run a Flink Application 77 00:03:17,610 --> 00:03:20,370 and you make it read directly from the MSK cluster. 78 00:03:20,370 --> 00:03:23,310 You can use Glue as well to do streaming ETL jobs 79 00:03:23,310 --> 00:03:26,610 and they're powered by Apache Spark Streaming. 80 00:03:26,610 --> 00:03:30,180 You can use Lambda functions to directly have Amazon MSK 81 00:03:30,180 --> 00:03:33,930 as an event source or you can write your own Kafka consumer 82 00:03:33,930 --> 00:03:36,360 and you can make it run on whatever platform you want, 83 00:03:36,360 --> 00:03:39,060 for example, your Amazon EC2 instances, 84 00:03:39,060 --> 00:03:42,420 or an ECS cluster or an EKS cluster. 85 00:03:42,420 --> 00:03:43,350 And once you know this, 86 00:03:43,350 --> 00:03:44,880 you know pretty much everything there is to know 87 00:03:44,880 --> 00:03:46,650 for Amazon MSK at the exam. 88 00:03:46,650 --> 00:03:47,910 So I hope you liked it 89 00:03:47,910 --> 00:03:49,860 and I will see you in the next lecture.
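As a companion to the Lambda option mentioned in the lecture, here is a minimal sketch of what a Lambda function consuming from Amazon MSK might look like. It assumes the usual shape of the MSK event that Lambda delivers: a "records" map keyed by topic and partition, where each record value is base64-encoded. The handler name, the "orders" topic and the sample payload are made up for illustration and are not from the lecture.

```python
import base64


def handler(event, context):
    """Decode every Kafka record delivered in one MSK-triggered invocation."""
    decoded = []
    # "records" maps "<topic>-<partition>" to a list of Kafka records.
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Record values arrive base64-encoded, so decode them to text.
            payload = base64.b64decode(record["value"]).decode("utf-8")
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                "value": payload,
            })
    return decoded


# Hypothetical sample event for local testing; topic and payload are made up.
sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "orders-0": [
            {
                "topic": "orders",
                "partition": 0,
                "offset": 15,
                "value": base64.b64encode(b'{"id": 1}').decode("ascii"),
            }
        ]
    },
}
```

Calling handler(sample_event, None) locally returns a list with one decoded record. In a real deployment, Lambda polls the topic for you and invokes the handler with batches of records, so there is no consumer loop to write.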