1 00:00:00,220 --> 00:00:02,820 So now let's talk about a service that is named after 2 00:00:02,820 --> 00:00:05,400 what it does it is AWS Batch. 3 00:00:05,400 --> 00:00:08,350 So batch is a fully managed batch processing service 4 00:00:08,350 --> 00:00:11,330 that can allow you to do batch processing at any scale. 5 00:00:11,330 --> 00:00:12,500 And with the batch service, 6 00:00:12,500 --> 00:00:14,480 you can efficiently run hundreds of thousands 7 00:00:14,480 --> 00:00:18,010 of computing batch jobs on AWS very easily. 8 00:00:18,010 --> 00:00:19,510 So what is a batch job? 9 00:00:19,510 --> 00:00:23,060 Well, a batch job is a job that has a start and an end. 10 00:00:23,060 --> 00:00:25,370 And that is opposed to say, a continuous 11 00:00:25,370 --> 00:00:28,210 or a streaming job that really doesn't ever end 12 00:00:28,210 --> 00:00:29,650 it's always running. 13 00:00:29,650 --> 00:00:30,590 But a batch job say, 14 00:00:30,590 --> 00:00:34,470 for example, starts at 1 a.m. and finishes at 3 a.m. 15 00:00:34,470 --> 00:00:38,150 So a batch job has a point of time when it happens 16 00:00:38,150 --> 00:00:40,300 and so the batch service will 17 00:00:40,300 --> 00:00:44,060 dynamically launch EC2 instances or Spot Instances 18 00:00:44,060 --> 00:00:45,930 to accommodate with the load 19 00:00:45,930 --> 00:00:48,500 that you have to run these batch jobs. 20 00:00:48,500 --> 00:00:51,690 So batch will provision the right amount of compute 21 00:00:51,690 --> 00:00:54,830 and memory for you to deal with your batch queue. 22 00:00:54,830 --> 00:00:57,570 And you just submit or scheduled batch jobs 23 00:00:57,570 --> 00:01:01,580 into the batch queue and the batch service does the rest. 24 00:01:01,580 --> 00:01:03,050 Now how do you define a batch job? 25 00:01:03,050 --> 00:01:05,470 Well, it is simply a Docker image 26 00:01:05,470 --> 00:01:08,550 and a test definition that you run on the ECS service. 27 00:01:08,550 --> 00:01:10,760 So this is pretty much saying that anything 28 00:01:10,760 --> 00:01:13,150 that can run on ECS can run on batch. 29 00:01:13,150 --> 00:01:15,200 And this is going to be very helpful to use batch 30 00:01:15,200 --> 00:01:16,610 to run these batch jobs. 31 00:01:16,610 --> 00:01:18,700 And because it automatically scales 32 00:01:18,700 --> 00:01:21,960 the right number of ECS2 instances or Spot Instances, 33 00:01:21,960 --> 00:01:23,260 to do these jobs, 34 00:01:23,260 --> 00:01:25,450 then you get lots of cost optimizations 35 00:01:25,450 --> 00:01:27,860 and you focus a lot less on the infrastructure, 36 00:01:27,860 --> 00:01:30,090 you just focus on your batch jobs. 37 00:01:30,090 --> 00:01:32,280 So this should be more than enough for going to the exam, 38 00:01:32,280 --> 00:01:35,410 but I just want to show you a small diagram that I made. 39 00:01:35,410 --> 00:01:38,710 So for example, say we wanted to process images submitted 40 00:01:38,710 --> 00:01:41,740 by users into Amazon S3 in a batch way. 41 00:01:41,740 --> 00:01:44,960 So image will be put into Amazon S3, 42 00:01:44,960 --> 00:01:47,850 and this will trigger a batch job. 43 00:01:47,850 --> 00:01:49,820 And so batch will automatically have 44 00:01:49,820 --> 00:01:52,580 an ECS cluster made of EC2 instances, 45 00:01:52,580 --> 00:01:54,880 or Spot Instances and batch would make sure that 46 00:01:54,880 --> 00:01:56,900 you have the right amount of instances 47 00:01:56,900 --> 00:01:58,870 to accommodate the load of batch jobs 48 00:01:58,870 --> 00:02:00,340 you have in the batch queue. 49 00:02:00,340 --> 00:02:02,880 And then these instances will be running 50 00:02:02,880 --> 00:02:05,940 your Docker images that will be doing your job. 51 00:02:05,940 --> 00:02:08,009 And then maybe that job will be to insert 52 00:02:08,009 --> 00:02:09,190 the processed object. 53 00:02:09,190 --> 00:02:11,190 Maybe it's a filter on top of the image 54 00:02:11,190 --> 00:02:13,560 into another Amazon S3 buckets. 55 00:02:13,560 --> 00:02:14,550 So the question you may have is 56 00:02:14,550 --> 00:02:16,250 what is the difference between batch and Lambda 57 00:02:16,250 --> 00:02:17,950 because they look similar? 58 00:02:17,950 --> 00:02:19,710 So Lambda has a time limit, 59 00:02:19,710 --> 00:02:21,090 it's 15 minutes, 60 00:02:21,090 --> 00:02:24,230 and you only get access to a few programming languages. 61 00:02:24,230 --> 00:02:27,680 On top of it, you have limited temporary disk space 62 00:02:27,680 --> 00:02:29,080 if you want to run your jobs, 63 00:02:29,080 --> 00:02:30,880 and it's going to be serverless, 64 00:02:30,880 --> 00:02:32,320 whereas batch is very different. 65 00:02:32,320 --> 00:02:33,750 So batch has no time limit, 66 00:02:33,750 --> 00:02:36,540 because it relies on EC2 instances. 67 00:02:36,540 --> 00:02:38,600 It's any runtime that you want as long 68 00:02:38,600 --> 00:02:40,840 as you package it as a Docker image. 69 00:02:40,840 --> 00:02:42,320 And for storage, 70 00:02:42,320 --> 00:02:45,700 you rely on the storage that comes with an EC2 instance. 71 00:02:45,700 --> 00:02:47,070 So it could be an EBS volume, 72 00:02:47,070 --> 00:02:49,637 or an EC2 instance store for disk space, 73 00:02:49,637 --> 00:02:52,810 which can be a lot more than for Lambda functions. 74 00:02:52,810 --> 00:02:55,460 And then finally, batch is not a serverless service. 75 00:02:55,460 --> 00:02:56,450 It's a managed service, 76 00:02:56,450 --> 00:02:59,670 but it relies on actual EC2 instances being created. 77 00:02:59,670 --> 00:03:03,330 But these EC2 instances are managed by AWS 78 00:03:03,330 --> 00:03:04,230 so we don't have to worry 79 00:03:04,230 --> 00:03:06,360 about the auto scaling and so on. 80 00:03:06,360 --> 00:03:07,193 So I hope that was helpful 81 00:03:07,193 --> 00:03:08,910 and I will see you in the next lecture.