Now let's learn about a new service: Kinesis Data Firehose. It is a very helpful service that can take data from producers, and producers can be everything we've seen for Kinesis Data Streams: applications, clients, the SDK, the KPL, the Kinesis Agent; they can all produce into your Kinesis Data Firehose. But also, a Kinesis Data Stream can produce into Kinesis Data Firehose, Amazon CloudWatch Logs and Events can produce into Kinesis Data Firehose, and so can AWS IoT. All these applications are going to send records into Kinesis Data Firehose, and then Kinesis Data Firehose can optionally transform the data using a Lambda function, but this is optional. Once the data has (optionally) been transformed, it can be written in batches into destinations.

So Kinesis Data Firehose takes data from sources (the most common is going to be Kinesis Data Streams) and writes this data into destinations without you writing any code, because Kinesis Data Firehose knows how to write the data.

There are three kinds of destinations for Kinesis Data Firehose. The first category is AWS destinations, and you need to know them by heart. The first one is Amazon S3: you can write all your data into Amazon S3. The second one is Amazon Redshift, which is a data warehousing database; to do so, Kinesis Data Firehose first writes the data into Amazon S3, and then issues a COPY command that copies the data from Amazon S3 into Amazon Redshift. The last AWS destination is Amazon Elasticsearch.

There are also some third-party partner destinations: Kinesis Data Firehose can send data into Datadog, Splunk, New Relic, MongoDB, and this list can get bigger over time.
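To make the producer side concrete, here is a minimal sketch in Python using boto3; the delivery stream name, region, and record contents are hypothetical placeholders, not part of the lecture:

    import json
    import boto3

    # Hypothetical delivery stream and region; replace with your own.
    firehose = boto3.client("firehose", region_name="us-east-1")

    record = {"user": "alice", "action": "click"}

    # Firehose does not add delimiters itself, so append a newline
    # if the destination expects newline-delimited JSON.
    response = firehose.put_record(
        DeliveryStreamName="my-delivery-stream",
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
    print(response["RecordId"])

For higher throughput, put_record_batch can send up to 500 records in a single call.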
I will not update this course every time there is a new partner, but just so you know, there are partners that Kinesis Data Firehose can send data to. Finally, if you have your own API with an HTTP endpoint, it is possible for you to send data from Kinesis Data Firehose into a custom destination.

Okay, so once the data is sent into all these destinations, you have two options: you can send all the data into an S3 bucket as a backup, or send only the data that failed to be written to these destinations into a failed-data S3 bucket.

So, to summarize: Kinesis Data Firehose is a fully managed service, so there's no administration, it has automated scaling, and it is serverless, so there are no servers to manage. You can send data into AWS destinations such as Amazon Redshift, Amazon S3 and Elasticsearch; into third-party partners such as Splunk, MongoDB, Datadog, New Relic, etc.; and into custom destinations with any HTTP endpoint. You're going to pay only for the data going through Firehose, so this is a very good pricing model.

And it is near real time. Why? Because we write data in batches from Firehose to the destination, so there's going to be a 60-second minimum latency for non-full batches, or you need to wait until you have at least 1 MB of data to send to the destination. That makes it a near-real-time service and not a real-time service.

It supports many data format conversions, transformations and compressions, and you can write your own data transformation using Lambda if you need to. Finally, you can send all the failed data, or all the data, into your backup S3 buckets.
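Coming back to that optional Lambda transformation for a moment, here is a minimal sketch of the record-transformation contract Firehose uses when it invokes Lambda: each incoming record arrives base64-encoded, and each must be returned with the same recordId, a result status, and re-encoded data. The field added to the payload and the handler name are illustrative, not part of the lecture:

    import base64
    import json

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            # Firehose delivers each record's data base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # example transformation

            output.append({
                "recordId": record["recordId"],
                "result": "Ok",  # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(
                    (json.dumps(payload) + "\n").encode("utf-8")
                ).decode("utf-8"),
            })
        return {"records": output}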
So a question that usually comes up at the exam is whether you understand the difference between when to use Kinesis Data Streams and when to use Kinesis Data Firehose. It should be very easy for you now if you followed closely, but let's summarize.

Kinesis Data Streams is a streaming service used to ingest data at scale, and you write your own custom code for your producers and your consumers. It's real time, so about 200 milliseconds of latency for standard consumers, or about 70 milliseconds with enhanced fan-out, and you manage scaling yourself: you do shard splitting and shard merging to increase the scale and throughput. You're also going to pay for how much capacity you have provisioned. Data storage in a Kinesis Data Stream can be set between 1 and 365 days, which allows multiple consumers to read from the same stream and also supports replay capability.

Kinesis Data Firehose, on the other hand, is an ingestion service to stream data into S3, Redshift, Elasticsearch, third-party partners, or a custom HTTP endpoint. It is fully managed, with no servers to manage, and it is near real time. So remember this: near real time is a keyword you need to look for in your exam questions. There's automated scaling, so no need for you to worry about it, and you're going to pay only for what goes through Kinesis Data Firehose. There is no data storage, so you cannot replay data from Kinesis Data Firehose; it doesn't support replay capability.

So that's it for the overview of Kinesis Data Firehose. I hope that makes sense, and I will see you in the next lecture.