Now let's learn about a new service: Kinesis Data Firehose. It is a very helpful service that can take data from producers, and producers can be everything we've seen for Kinesis Data Streams: applications, clients, the SDK, the KPL, the Kinesis Agent, and so on, all of which can produce into your Kinesis Data Firehose. But a Kinesis Data Stream can also produce into a Kinesis Data Firehose, and so can Amazon CloudWatch Logs and Events, or AWS IoT. All of these applications send records into Kinesis Data Firehose. Then Kinesis Data Firehose can optionally transform the data using a Lambda function, but this is optional, and once the data is (optionally) transformed, it is written in batches into destinations. So Kinesis Data Firehose takes data from sources, the most common being Kinesis Data Streams, and writes this data into destinations without you writing any code, because Kinesis Data Firehose knows how to write the data. There are three kinds of destinations for Kinesis Data Firehose.
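To make the producer side concrete, here is a minimal sketch of how an application would send one record into Firehose with boto3. The stream name and the event fields are hypothetical; the helper only builds the record payload, so the actual AWS call is shown as a comment.

```python
import json

def build_firehose_record(event: dict) -> dict:
    # Firehose delivers the bytes as-is; a trailing newline keeps JSON
    # records separated when Firehose batches them into one S3 object.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

# With boto3 (not executed here), a producer would call:
# firehose = boto3.client("firehose")
# firehose.put_record(
#     DeliveryStreamName="my-delivery-stream",  # hypothetical name
#     Record=build_firehose_record({"user": "alice", "action": "click"}),
# )
```

The newline convention matters in practice: without it, consecutive JSON records land in S3 concatenated on one line.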
The first category is AWS destinations, and you need to know them by heart. The first one is Amazon S3: you can write all your data into Amazon S3. The second one is Amazon Redshift, which is a data warehousing database. To do so, Firehose first writes the data into Amazon S3, and then issues a COPY command that copies the data from Amazon S3 into Amazon Redshift. The last AWS destination is Amazon OpenSearch. There are also third-party partner destinations: Kinesis Data Firehose can send data into Datadog, Splunk, New Relic, MongoDB, and this list can get bigger and bigger over time. I will not update this if there are new partners, but just so you know, there are partners that Kinesis Data Firehose can send data to. Finally, if you have your own API with an HTTP endpoint, you can send data from Kinesis Data Firehose into a custom destination. Okay, so once the data is sent into all these destinations, you have two options.
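For the exam it is enough to know the destinations by name, but to make the S3 case concrete, here is a hedged sketch of the destination configuration a `create_delivery_stream` call would take. The role and bucket ARNs, the prefix, and the stream name are all placeholders, not real resources.

```python
# Hedged sketch: the shape of an S3 destination configuration for
# Kinesis Data Firehose; both ARNs below are hypothetical placeholders.
s3_destination = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # hypothetical
    "BucketARN": "arn:aws:s3:::my-firehose-bucket",             # hypothetical
    "Prefix": "ingest/",          # objects are written under this key prefix
    "CompressionFormat": "GZIP",  # Firehose can compress before writing
}

# With boto3 (not executed here):
# firehose.create_delivery_stream(
#     DeliveryStreamName="my-delivery-stream",
#     ExtendedS3DestinationConfiguration=s3_destination,
# )
```

Note this is exactly the "no code" point from the lecture: you declare where the data goes, and Firehose handles the writing.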
You can send all the data into an S3 bucket as a backup, or send only the data that failed to be written to these destinations into a failed-delivery S3 bucket. So to summarize: Kinesis Data Firehose is a fully managed service, so there is no administration, scaling is automated, and it is serverless, so there are no servers to manage. You can send data into AWS destinations such as Amazon Redshift, Amazon S3, and OpenSearch; into third-party partners such as Splunk, MongoDB, Datadog, New Relic, etc.; and into custom destinations with any HTTP endpoint. You pay only for the data going through Firehose, so this is a very good pricing model. And it is near real time. Why? Because we write data in batches from Firehose to the destination, so there is going to be a 60-second minimum latency for non-full batches, or you need to wait until you have at least 1 MB of data at a time to send the data into the destination, which makes it a near-real-time service and not a real-time service.
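The "near real time" behavior above boils down to a buffering rule: Firehose delivers a batch when either the size or the time threshold is reached, whichever comes first. A small sketch of that flush decision, using the 1 MB / 60 s minimums mentioned above as assumed thresholds:

```python
BUFFER_SIZE_BYTES = 1 * 1024 * 1024   # flush once 1 MB is buffered...
BUFFER_INTERVAL_SECONDS = 60          # ...or after 60 s, whichever is first

def should_flush(buffered_bytes: int, seconds_since_last_flush: float) -> bool:
    # Mirrors Firehose's buffering hints: a batch is delivered once
    # either the size threshold or the time threshold is hit.
    return (buffered_bytes >= BUFFER_SIZE_BYTES
            or seconds_since_last_flush >= BUFFER_INTERVAL_SECONDS)
```

A non-full batch therefore waits up to 60 seconds before delivery, which is exactly why Firehose is near real time rather than real time.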
It supports many data formats, conversions, transformations, and compressions, and you can write your own data transformation using Lambda if you need to. Finally, you can send all the failed data, or all the data, into your backup S3 buckets. A question that usually comes up at the exam is whether you understand the difference between Kinesis Data Streams and Kinesis Data Firehose, and when to use each. This should be very easy for you now if you followed closely, but let's summarize. Kinesis Data Streams is a streaming service used to ingest data at scale, and you write your own custom code for your producers and your consumers. It is real time: about 200 milliseconds of latency, or about 70 milliseconds with enhanced fan-out. You manage scaling yourself: you do shard splitting and shard merging to increase the scale and throughput. You also pay for how much capacity you have provisioned. Data storage in a Kinesis Data Stream can be between 1 and 365 days.
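The Lambda data transformation mentioned above follows a fixed contract: Firehose invokes the function with a batch of base64-encoded records, and each record must be returned with its `recordId`, a `result` status, and the re-encoded data. A minimal sketch of such a handler; the `processed` field it adds is purely illustrative.

```python
import base64
import json

def lambda_handler(event, context):
    # Firehose sends a batch of base64-encoded records and expects each
    # one back with its recordId, a result ("Ok", "Dropped", or
    # "ProcessingFailed"), and the transformed, base64-encoded data.
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # illustrative transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```

Returning `"Dropped"` instead of `"Ok"` is how a transformation filters records out, and `"ProcessingFailed"` routes a record to the failed-delivery S3 bucket described earlier.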
This allows multiple consumers to read from the same stream, and it also supports replay capability. Kinesis Data Firehose, on the other hand, is an ingestion service to stream data into S3, Redshift, OpenSearch, third-party partners, or a custom HTTP endpoint. It is fully managed, with no servers to manage, and it is near real time. So remember this: "near real-time" is a keyword you need to look for in your exam questions. There is automated scaling, so no need for you to worry about it, and you pay only for what goes through Kinesis Data Firehose. There is no data storage, so you cannot replay data from Kinesis Data Firehose; it does not support replay capability. So that's it for the overview of Kinesis Data Firehose. I hope that makes sense, and I will see you in the next lecture.