1 00:00:00,210 --> 00:00:02,670 Okay, so now let's talk about Amazon SageMaker. 2 00:00:02,670 --> 00:00:05,480 So SageMaker is a fully managed service 3 00:00:05,480 --> 00:00:07,280 for developers and data scientists 4 00:00:07,280 --> 00:00:08,950 to build machine learning model. 5 00:00:08,950 --> 00:00:10,760 So all of the services so far we've seen 6 00:00:10,760 --> 00:00:12,130 in this section that are all 7 00:00:12,130 --> 00:00:14,180 managed machine learning service with 8 00:00:14,180 --> 00:00:15,770 a very specific purpose, 9 00:00:15,770 --> 00:00:18,140 for example translates some text, 10 00:00:18,140 --> 00:00:19,500 transcribes some audio, 11 00:00:19,500 --> 00:00:21,130 or convert text into audio 12 00:00:21,130 --> 00:00:22,450 or analyze parts of a text, 13 00:00:22,450 --> 00:00:25,220 but SageMaker is a higher level 14 00:00:25,220 --> 00:00:26,980 machine learning service where you have 15 00:00:26,980 --> 00:00:29,070 your actual developers or your data scientists 16 00:00:29,070 --> 00:00:31,720 within your organization create 17 00:00:31,720 --> 00:00:33,260 and build machine learning model. 18 00:00:33,260 --> 00:00:34,900 So it is a lot more involved 19 00:00:34,900 --> 00:00:37,180 and a lot more difficult to use. 20 00:00:37,180 --> 00:00:40,090 Now, when you want to do this kind of processes 21 00:00:40,090 --> 00:00:41,920 to build a machine learning model, 22 00:00:41,920 --> 00:00:43,640 you have to do a bunch of steps that I will show 23 00:00:43,640 --> 00:00:44,630 you in a second, 24 00:00:44,630 --> 00:00:46,200 and all these are quite difficult to do 25 00:00:46,200 --> 00:00:47,080 in one place, 26 00:00:47,080 --> 00:00:49,500 plus you need to provision some servers 27 00:00:49,500 --> 00:00:52,420 to perform these competitions to create these models, 28 00:00:52,420 --> 00:00:54,380 and that can be cumbersome as well. 29 00:00:54,380 --> 00:00:56,150 So this is where SageMaker comes in. 30 00:00:56,150 --> 00:00:58,860 So SageMaker will try to help you all along the way 31 00:00:58,860 --> 00:01:00,010 for the process. 32 00:01:00,010 --> 00:01:02,595 So I will show you what machine learning actually is, 33 00:01:02,595 --> 00:01:04,209 and this is a simplified version, 34 00:01:04,209 --> 00:01:05,500 if you are a machine learning expert 35 00:01:05,500 --> 00:01:07,500 don't get mad at me please, 36 00:01:07,500 --> 00:01:09,380 but let's say you wanted to build a model, 37 00:01:09,380 --> 00:01:11,170 or let's say I wanted to build a model 38 00:01:11,170 --> 00:01:13,380 to predict what score you're going to get 39 00:01:13,380 --> 00:01:15,780 at your certified CLAP practitioner exam. 40 00:01:15,780 --> 00:01:17,020 So how will I do it? 41 00:01:17,020 --> 00:01:18,900 Say for example that I am a developer 42 00:01:18,900 --> 00:01:19,970 or a data scientist, 43 00:01:19,970 --> 00:01:22,528 so I'm going to gather all your data from 44 00:01:22,528 --> 00:01:24,840 the actual performance of my students. 45 00:01:24,840 --> 00:01:26,707 So I will ask maybe 10,000 students 46 00:01:26,707 --> 00:01:29,680 to give me information about how many years of experience 47 00:01:29,680 --> 00:01:30,750 in IT they had, 48 00:01:30,750 --> 00:01:32,680 how many years of experience with AWS they had, 49 00:01:32,680 --> 00:01:34,100 how much time they spent on the course, 50 00:01:34,100 --> 00:01:36,500 how many practice exams they did, etc. etc, 51 00:01:36,500 --> 00:01:38,880 so I'll try to gather as much data as possible, 52 00:01:38,880 --> 00:01:40,340 and then I'm going to label the data. 53 00:01:40,340 --> 00:01:42,780 So that means you need to say which columns 54 00:01:42,780 --> 00:01:43,630 corresponds to what, 55 00:01:43,630 --> 00:01:45,753 and also you need to give some kind of score, 56 00:01:45,753 --> 00:01:48,200 and the score is the actual score for me 57 00:01:48,200 --> 00:01:49,680 at the exam that someone obtains. 58 00:01:49,680 --> 00:01:50,910 For example someone did not pass, 59 00:01:50,910 --> 00:01:54,160 670, maybe he didn't do the course completely, 60 00:01:54,160 --> 00:01:55,320 that would be the reason. 61 00:01:55,320 --> 00:01:57,610 Maybe someone passed with high grade, 62 00:01:57,610 --> 00:02:02,610 so 990 or maybe someone with an even higher grade 934, 63 00:02:03,000 --> 00:02:06,464 so, each student will get a specific score, 64 00:02:06,464 --> 00:02:10,080 and my guess is that based on the data that I've collected 65 00:02:10,080 --> 00:02:12,250 I can predict what the score will be. 66 00:02:12,250 --> 00:02:13,610 So I've first done the labeling, 67 00:02:13,610 --> 00:02:15,760 and that labeling process can be quite complicated to do 68 00:02:15,760 --> 00:02:16,960 in practice. 69 00:02:16,960 --> 00:02:18,820 Then I need to build a machine learning model, 70 00:02:18,820 --> 00:02:21,530 which is how can I predict these scores 71 00:02:21,530 --> 00:02:23,570 from the historical data. 72 00:02:23,570 --> 00:02:25,040 So, then you build that machine learning model 73 00:02:25,040 --> 00:02:26,480 and then you have to train it 74 00:02:26,480 --> 00:02:27,313 and tune it. 75 00:02:27,313 --> 00:02:29,610 So this is another part that's actually quite difficult 76 00:02:29,610 --> 00:02:32,270 to do, which is how to refine my model over time 77 00:02:32,270 --> 00:02:36,250 to better fit my data and my outputs. 78 00:02:36,250 --> 00:02:39,210 Okay, so now SageMaker and all of this will help 79 00:02:39,210 --> 00:02:40,100 you with the labeling, 80 00:02:40,100 --> 00:02:41,440 the building, the training 81 00:02:41,440 --> 00:02:43,030 and tuning, but not only. 82 00:02:43,030 --> 00:02:44,760 Now we have a machine learning model 83 00:02:44,760 --> 00:02:45,800 and it is created, 84 00:02:45,800 --> 00:02:46,850 it's fully working. 85 00:02:46,850 --> 00:02:48,350 So now I need to use it. 86 00:02:48,350 --> 00:02:50,210 So this is called deploying machine learning models. 87 00:02:50,210 --> 00:02:52,670 So, we're going to get new data coming in. 88 00:02:52,670 --> 00:02:54,500 For example you are the new student, 89 00:02:54,500 --> 00:02:55,780 and I'm going to survey you, 90 00:02:55,780 --> 00:02:56,613 and I'm going to say okay, 91 00:02:56,613 --> 00:02:58,100 how many years of experience do you have in IT, 92 00:02:58,100 --> 00:03:00,400 with AWS, how much time have you spent on this course, 93 00:03:00,400 --> 00:03:02,830 and then I will apply based on this data 94 00:03:02,830 --> 00:03:03,810 you just given me, 95 00:03:03,810 --> 00:03:06,240 I will apply the machine learning model that I have created 96 00:03:06,240 --> 00:03:07,220 from before. 97 00:03:07,220 --> 00:03:08,940 And then the machine learning model will say, 98 00:03:08,940 --> 00:03:12,350 for example, hey it sends based on the data you have 99 00:03:12,350 --> 00:03:14,250 I'm going to predict that this student 100 00:03:14,250 --> 00:03:17,350 will pass with a score of 906. 101 00:03:17,350 --> 00:03:18,910 And this whole process, 102 00:03:18,910 --> 00:03:20,240 of labeling, building the model, 103 00:03:20,240 --> 00:03:21,310 training it, tuning it, 104 00:03:21,310 --> 00:03:24,530 applying it can be all done within SageMaker. 105 00:03:24,530 --> 00:03:26,570 So, that's it for a quick introduction, 106 00:03:26,570 --> 00:03:27,420 but I hope you liked it, 107 00:03:27,420 --> 00:03:29,370 and I will see you in the next lecture.