1 00:00:00,720 --> 00:00:01,800 In this lesson, 2 00:00:01,800 --> 00:00:04,820 we are going to be talking about upserting data 3 00:00:04,820 --> 00:00:07,000 using data flow. 4 00:00:07,000 --> 00:00:09,580 Little aside, as I built this lesson, 5 00:00:09,580 --> 00:00:12,530 autocorrect desperately wants to use upsetting, 6 00:00:12,530 --> 00:00:14,270 rather than upserting, 7 00:00:14,270 --> 00:00:17,400 so no matter how upsetting autocorrect finds it, 8 00:00:17,400 --> 00:00:20,400 we are going to talk about upserting in this lesson, 9 00:00:20,400 --> 00:00:22,790 starting with some definitions and terms 10 00:00:22,790 --> 00:00:24,600 that you need to understand. 11 00:00:24,600 --> 00:00:25,830 Of course, 12 00:00:25,830 --> 00:00:27,210 we're going to look at the steps that you need to do 13 00:00:27,210 --> 00:00:30,350 in order to use upserting in Data Factory, 14 00:00:30,350 --> 00:00:32,960 and then finally, we're going to see all of this in action. 15 00:00:32,960 --> 00:00:34,803 So with that, let's get started. 16 00:00:36,920 --> 00:00:40,260 First, the definition of upserting. 17 00:00:40,260 --> 00:00:42,610 The definition is an operation that allows you 18 00:00:42,610 --> 00:00:45,280 to either insert rows into a database table 19 00:00:45,280 --> 00:00:50,070 if they don't exist, or update them if they do. 20 00:00:50,070 --> 00:00:53,060 So the key here is alteration. 21 00:00:53,060 --> 00:00:55,970 We, instead of replacing the entire database 22 00:00:55,970 --> 00:00:59,210 every time we renew or update something, 23 00:00:59,210 --> 00:01:02,070 what we're going to do is we're going to intelligently look, 24 00:01:02,070 --> 00:01:03,800 and if we have new records, 25 00:01:03,800 --> 00:01:05,310 we're going to add them, 26 00:01:05,310 --> 00:01:07,930 or we're going to change them if they already exist. 27 00:01:07,930 --> 00:01:10,220 And again, this is going to be dependent upon the code 28 00:01:10,220 --> 00:01:12,973 that we insert into Data Factory. 29 00:01:15,380 --> 00:01:18,050 So along with that, if your only tool is a hammer, 30 00:01:18,050 --> 00:01:20,510 every problem looks like a nail. 31 00:01:20,510 --> 00:01:25,070 So as we look at the different types of updates 32 00:01:25,070 --> 00:01:27,700 to a database, the first is upload. 33 00:01:27,700 --> 00:01:28,533 If we upload, 34 00:01:28,533 --> 00:01:31,600 we're just simply going in and we're adding a few records. 35 00:01:31,600 --> 00:01:33,460 Next, we can do full. 36 00:01:33,460 --> 00:01:34,530 And if we do full, 37 00:01:34,530 --> 00:01:38,470 we're literally replacing the entire database. 38 00:01:38,470 --> 00:01:39,980 We can do incremental, 39 00:01:39,980 --> 00:01:41,920 which means we look at where we stopped, 40 00:01:41,920 --> 00:01:44,270 and we just simply add from there, 41 00:01:44,270 --> 00:01:46,850 or, like we talked about, we can do upserting, 42 00:01:46,850 --> 00:01:50,290 which is editing the rows where changes are, 43 00:01:50,290 --> 00:01:53,530 and adding new data where it doesn't exist. 44 00:01:53,530 --> 00:01:57,430 So these are the 4 different types of updates 45 00:01:57,430 --> 00:01:58,930 to database records, 46 00:01:58,930 --> 00:02:01,940 upload, full, incremental and upsert. 47 00:02:01,940 --> 00:02:04,450 It's important that you understand what those are 48 00:02:04,450 --> 00:02:07,403 and that you understand what the differences are as well. 49 00:02:09,290 --> 00:02:12,120 So when we talk about the basic steps of upserting, 50 00:02:12,120 --> 00:02:13,790 here's what we're doing. 51 00:02:13,790 --> 00:02:16,253 First, we're going to go into Data Flow. 52 00:02:17,360 --> 00:02:19,763 Then we're going to pick our source. 53 00:02:20,790 --> 00:02:24,113 We're going to choose and add an AlterRow module. 54 00:02:25,020 --> 00:02:27,533 We're going to create our Upsert If conditions, 55 00:02:28,430 --> 00:02:29,610 and then we're going to finish 56 00:02:29,610 --> 00:02:32,480 with our destination or our sink. 57 00:02:32,480 --> 00:02:34,770 So these are the basic steps that we would do 58 00:02:34,770 --> 00:02:38,470 in order to create upserting in Data Factory. 59 00:02:38,470 --> 00:02:40,240 Create our data flow, 60 00:02:40,240 --> 00:02:42,610 add our source, always, 61 00:02:42,610 --> 00:02:43,920 choose AlterRow, 62 00:02:43,920 --> 00:02:45,040 which is what we're going to use 63 00:02:45,040 --> 00:02:47,730 to do our upsert conditions, 64 00:02:47,730 --> 00:02:50,360 then we're going to add those upsert conditions, 65 00:02:50,360 --> 00:02:53,360 and then, finally, we always finish with a sink. 66 00:02:53,360 --> 00:02:54,480 So with that, 67 00:02:54,480 --> 00:02:56,630 let's go ahead and jump over into the portal, 68 00:02:56,630 --> 00:02:58,683 and I'll show you what that looks like. 69 00:03:01,540 --> 00:03:04,510 So here we find ourselves in Data Factory, 70 00:03:04,510 --> 00:03:06,630 and I've opened up a data flow. 71 00:03:06,630 --> 00:03:09,320 So I've already added my source, 72 00:03:09,320 --> 00:03:11,100 and I just created a simple source. 73 00:03:11,100 --> 00:03:13,550 Now, one of the things when you create your source, 74 00:03:13,550 --> 00:03:16,560 you want to make sure that you've allowed schema drift. 75 00:03:16,560 --> 00:03:18,290 This is not always going to be the case, 76 00:03:18,290 --> 00:03:20,420 but normally if you're going to use upserting, 77 00:03:20,420 --> 00:03:22,520 you'll probably want to have schema drift on, 78 00:03:22,520 --> 00:03:23,960 unless there's a specific reason 79 00:03:23,960 --> 00:03:26,473 that you know your database isn't going to change. 80 00:03:27,880 --> 00:03:28,910 So then we're going to go ahead 81 00:03:28,910 --> 00:03:31,270 and choose this little plus button, 82 00:03:31,270 --> 00:03:33,320 scroll all the way down to the bottom, 83 00:03:33,320 --> 00:03:37,210 and we are going to choose the AlterRow module. 84 00:03:37,210 --> 00:03:39,490 So with our AlterRow on, 85 00:03:39,490 --> 00:03:40,380 going to come down here, 86 00:03:40,380 --> 00:03:42,230 and you can see we've got our incoming stream 87 00:03:42,230 --> 00:03:44,690 set as our source number 1, 88 00:03:44,690 --> 00:03:47,580 and then we're going to choose our AlterRow conditions, 89 00:03:47,580 --> 00:03:50,510 and this is where we can do our insert, update, 90 00:03:50,510 --> 00:03:54,320 delete, or upsert, which is what we want to do. 91 00:03:54,320 --> 00:03:55,840 Once we have that, 92 00:03:55,840 --> 00:03:58,670 we'll just go in here to our expression builder, 93 00:03:58,670 --> 00:04:02,620 and we can add in whatever conditions we have, 94 00:04:02,620 --> 00:04:04,760 and the conditions aren't really all that important, 95 00:04:04,760 --> 00:04:08,940 because it's going to be very specific to your circumstance. 96 00:04:08,940 --> 00:04:11,140 So it's not something that you're probably going 97 00:04:11,140 --> 00:04:12,910 to have to do on the DP-203, 98 00:04:12,910 --> 00:04:14,530 as in, I'm going to add in expressions, 99 00:04:14,530 --> 00:04:15,710 or see questions about that. 100 00:04:15,710 --> 00:04:17,940 I think that's very unlikely. 101 00:04:17,940 --> 00:04:20,590 So you would just add in something like, isNull, 102 00:04:20,590 --> 00:04:22,820 and then we have our column name here, 103 00:04:22,820 --> 00:04:25,020 and then we could have false or, 104 00:04:25,020 --> 00:04:26,620 again, whatever your expression is. 105 00:04:26,620 --> 00:04:28,570 We would just add all of that in here. 106 00:04:28,570 --> 00:04:30,610 We would click on Save and Finish, 107 00:04:30,610 --> 00:04:33,740 and then now we have our upsert condition. 108 00:04:33,740 --> 00:04:35,160 And then if you remember, 109 00:04:35,160 --> 00:04:39,560 the last step to this is always adding in our sink. 110 00:04:39,560 --> 00:04:44,040 So we'll just scroll down and choose a destination sink. 111 00:04:44,040 --> 00:04:46,520 Again, we want to allow schema drift in the source 112 00:04:46,520 --> 00:04:47,493 and the sink. 113 00:04:47,493 --> 00:04:49,780 If we're going to do one, we need to do the other. 114 00:04:49,780 --> 00:04:50,980 And then once this is done, 115 00:04:50,980 --> 00:04:54,170 we would just simply publish this data flow, 116 00:04:54,170 --> 00:04:55,520 then go into a pipeline, 117 00:04:55,520 --> 00:04:58,870 and we could add this data flow into our pipeline, 118 00:04:58,870 --> 00:05:01,700 which would then be looking to do upsert 119 00:05:01,700 --> 00:05:03,500 based upon the source. 120 00:05:03,500 --> 00:05:06,010 So those are the steps that you would need to do 121 00:05:06,010 --> 00:05:09,003 in order to do upserting in Data Factory. 122 00:05:10,170 --> 00:05:14,350 So with that, let's jump in and finish out this lesson. 123 00:05:14,350 --> 00:05:17,320 So what are the options for data loading? 124 00:05:17,320 --> 00:05:19,560 Make sure that you know the different options, 125 00:05:19,560 --> 00:05:22,440 whether that's full or incremental or upsert, 126 00:05:22,440 --> 00:05:25,133 just make sure you're aware of alternate solutions. 127 00:05:26,120 --> 00:05:28,400 Make sure that you understand the basic steps 128 00:05:28,400 --> 00:05:30,750 for upserting in Data Factory. 129 00:05:30,750 --> 00:05:32,490 So we have to have a source, 130 00:05:32,490 --> 00:05:34,180 we're going to add the AlterRow, 131 00:05:34,180 --> 00:05:36,200 we're going to add in our conditions, 132 00:05:36,200 --> 00:05:39,130 then we're going to look to add in our destination, 133 00:05:39,130 --> 00:05:40,110 so just our sink. 134 00:05:40,110 --> 00:05:42,323 Make sure that you know those basic steps. 135 00:05:43,220 --> 00:05:46,740 And remember that the key to upserting is alteration. 136 00:05:46,740 --> 00:05:49,130 We're altering rows if they exist, 137 00:05:49,130 --> 00:05:52,340 or we're adding new rows if they don't exist. 138 00:05:52,340 --> 00:05:53,520 You got all that done, 139 00:05:53,520 --> 00:05:56,670 then you should be good to go on to the next lesson. 140 00:05:56,670 --> 00:05:59,120 That's it for this one, I'll see you in the next.