1 00:00:01,080 --> 00:00:01,950 Hey, Cloud Gurus. 2 00:00:01,950 --> 00:00:04,540 Welcome to our lesson on configuring error handling 3 00:00:04,540 --> 00:00:05,933 for transformations. 4 00:00:07,820 --> 00:00:10,650 In this lesson, we'll talk a little bit about the need 5 00:00:10,650 --> 00:00:14,530 for error handling and how we go about handling those errors 6 00:00:14,530 --> 00:00:16,743 in our pipeline transformations. 7 00:00:17,600 --> 00:00:20,180 We'll do a demo in Azure Data Factory, 8 00:00:20,180 --> 00:00:22,263 and wrap everything up with a review. 9 00:00:24,370 --> 00:00:26,260 To start off with, let's talk about the fact 10 00:00:26,260 --> 00:00:28,650 that errors just happen. 11 00:00:28,650 --> 00:00:30,910 They're a common part of data work 12 00:00:30,910 --> 00:00:33,130 and, really, IT in general. 13 00:00:33,130 --> 00:00:35,630 And yes, while it's sad, the question's not 14 00:00:35,630 --> 00:00:37,100 whether there will be errors, 15 00:00:37,100 --> 00:00:39,620 but what you're going to do with them. 16 00:00:39,620 --> 00:00:42,330 For our transformations, you'll run into a few common types, 17 00:00:42,330 --> 00:00:44,970 such as: data is truncated, 18 00:00:44,970 --> 00:00:46,340 so when you're bringing the data over, 19 00:00:46,340 --> 00:00:50,040 part of it is truncated when you don't really want it to be; 20 00:00:50,040 --> 00:00:54,210 you can't insert NULL values into particular columns; 21 00:00:54,210 --> 00:00:56,940 and maybe you run into constraint conflicts. 22 00:00:56,940 --> 00:00:58,240 Each of these can be troubling, 23 00:00:58,240 --> 00:00:59,790 but there are ways around them.
28 00:01:12,460 --> 00:01:15,650 Once you enable this, a few other options open up to you, 29 00:01:15,650 --> 00:01:17,840 including transaction commit. 30 00:01:17,840 --> 00:01:19,740 This is where you choose whether to write data 31 00:01:19,740 --> 00:01:22,640 in a single transaction or in batches. 32 00:01:22,640 --> 00:01:25,510 A single transaction is going to have worse performance, 33 00:01:25,510 --> 00:01:27,810 but it keeps the data from being visible to others 34 00:01:27,810 --> 00:01:29,770 until the transaction is complete. 35 00:01:29,770 --> 00:01:30,880 So if you run into issues, 36 00:01:30,880 --> 00:01:33,670 people aren't running off of inaccurate data. 37 00:01:33,670 --> 00:01:36,880 You can also choose to output rejected data. 38 00:01:36,880 --> 00:01:40,460 This sends error rows to a CSV file in Azure Storage, 39 00:01:40,460 --> 00:01:44,150 including the SQL operation and error information, 40 00:01:44,150 --> 00:01:46,160 such as the data flow error code 41 00:01:46,160 --> 00:01:48,720 and the error message on the row. 42 00:01:48,720 --> 00:01:51,380 And finally, you can report success on error, 43 00:01:51,380 --> 00:01:53,170 where you simply mark it as successful, 44 00:01:53,170 --> 00:01:55,500 whether an error occurs or not. 45 00:01:55,500 --> 00:01:57,720 There are times when you may expect errors, 46 00:01:57,720 --> 00:02:01,070 or having an error isn't very consequential to your process. 47 00:02:01,070 --> 00:02:03,430 And so you can tell it to continue on its merry way, 48 00:02:03,430 --> 00:02:05,070 and not interrupt what you're doing, 49 00:02:05,070 --> 00:02:06,770 or if you know what the behavior is going to be, 50 00:02:06,770 --> 00:02:08,980 you can mitigate it in some way. 51 00:02:08,980 --> 00:02:10,800 These options are going to cover the majority 52 00:02:10,800 --> 00:02:13,090 of your error handling situations.
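[The sink options described above can be sketched in mapping data flow script. This is a rough, hypothetical fragment, not the exact script from the lesson's pipeline: the sink name, linked service name, and folder path are made up, and property names should be checked against the data flow script your own sink generates.]

```
ProductAverages sink(allowSchemaDrift: true,
	validateSchema: false,
	deletable: false,
	insertable: true,
	updateable: false,
	upsertable: false,
	errorHandlingOption: 'continueOnError',
	transactionCommit: 'batch',
	rejectedDataLinkedService: 'AzureDataLakeStorage1',
	rejectedDataStorageFolderPath: 'errors/productaverages') ~> ProductAverageSink
```

[Setting the same options in the Studio UI, as shown in the demo, simply writes properties like these into the script behind the scenes.]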
53 00:02:13,090 --> 00:02:15,800 However, you do also have the conditional split 54 00:02:15,800 --> 00:02:16,693 available to you. 55 00:02:17,577 --> 00:02:19,310 And this is for those times when you need 56 00:02:19,310 --> 00:02:23,870 to go beyond the automated catch-all of continue on error. 57 00:02:23,870 --> 00:02:27,060 You can use custom logic with specialized conditions 58 00:02:27,060 --> 00:02:29,040 to route bad rows to logging, 59 00:02:29,040 --> 00:02:31,410 and then continue on successfully. 60 00:02:31,410 --> 00:02:32,243 So for example, 61 00:02:32,243 --> 00:02:35,920 if we have our source and we run our transformation, 62 00:02:35,920 --> 00:02:38,440 we can take the successful or good rows, 63 00:02:38,440 --> 00:02:41,330 and send them on to Azure SQL Database, 64 00:02:41,330 --> 00:02:43,440 or we can take the bad rows and log them 65 00:02:43,440 --> 00:02:46,233 into a CSV file in our Azure Data Lake. 66 00:02:48,130 --> 00:02:49,900 Let's jump over to the portal and take a look 67 00:02:49,900 --> 00:02:51,830 at setting the error handling features 68 00:02:51,830 --> 00:02:53,023 that we just discussed. 69 00:02:55,220 --> 00:02:57,610 Here we are in Azure Data Factory Studio, 70 00:02:57,610 --> 00:03:00,040 and I've already pulled up one of the pipelines 71 00:03:00,040 --> 00:03:04,283 we worked with earlier, our NightlyLoad pipeline. 72 00:03:05,300 --> 00:03:07,980 On our product average sink, if we come over 73 00:03:07,980 --> 00:03:11,100 to settings, all the way at the bottom, 74 00:03:11,100 --> 00:03:15,460 this is where we find our error row handling settings. 75 00:03:15,460 --> 00:03:18,660 If we expand that and scroll some more, 76 00:03:18,660 --> 00:03:20,280 you can see that by default, 77 00:03:20,280 --> 00:03:22,543 it's set to stop on first error. 78 00:03:23,560 --> 00:03:28,400 We can change that to continue on error. 79 00:03:28,400 --> 00:03:31,910 And then we receive our additional options.
80 00:03:31,910 --> 00:03:35,210 We have transaction commit, as we said earlier, 81 00:03:35,210 --> 00:03:37,163 and that can be batch or single. 82 00:03:39,180 --> 00:03:42,330 We can output our rejected data and we would need 83 00:03:42,330 --> 00:03:44,040 to choose a linked service, 84 00:03:44,040 --> 00:03:46,850 such as an Azure Data Lake, and then a directory 85 00:03:46,850 --> 00:03:50,220 within there to send our bad rows off to. 86 00:03:50,220 --> 00:03:53,280 And we can also report success on error. 87 00:03:53,280 --> 00:03:55,240 And again, that's simply going to flag it 88 00:03:55,240 --> 00:03:58,650 as successful, whether or not there are errors. 89 00:03:58,650 --> 00:04:01,260 And so all of these options are right here available to you 90 00:04:01,260 --> 00:04:04,140 within the settings tab of your sink, 91 00:04:04,140 --> 00:04:05,783 no matter what your sink is. 92 00:04:08,020 --> 00:04:11,290 By way of review, errors are going to happen. 93 00:04:11,290 --> 00:04:12,890 It's just part of life. 94 00:04:12,890 --> 00:04:16,040 And by default, they will stop your transformations. 95 00:04:16,040 --> 00:04:19,110 For most situations, you can set it to continue on error 96 00:04:19,110 --> 00:04:21,470 in order to handle error rows, 97 00:04:21,470 --> 00:04:23,260 and then utilize any of those options 98 00:04:23,260 --> 00:04:25,260 that we just discussed and demoed, 99 00:04:25,260 --> 00:04:27,970 depending on what your situation is. 100 00:04:27,970 --> 00:04:30,100 And you can also use a conditional split, 101 00:04:30,100 --> 00:04:32,960 in order to mitigate known situations. 102 00:04:32,960 --> 00:04:35,200 Let's say you want to take those specific rows 103 00:04:35,200 --> 00:04:36,840 and put them in another database.
104 00:04:36,840 --> 00:04:38,870 Your conditional split can take care of that 105 00:04:38,870 --> 00:04:41,400 by matching the pattern you know is going to occur, 106 00:04:41,400 --> 00:04:44,360 and then splitting them off in that direction. 107 00:04:44,360 --> 00:04:45,500 That's it for now, Gurus. 108 00:04:45,500 --> 00:04:47,120 I hope you've enjoyed this video. 109 00:04:47,120 --> 00:04:49,520 When you're ready, I'll see you in the next one.
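[The conditional split pattern from the lesson can be sketched in mapping data flow script as well. This is a hypothetical fragment: the stream names (Transform, BadRows, GoodRows) and the ProductID column are made up for illustration, and the sink settings are elided. The first output stream catches rows matching the condition, and the last stream catches everything else.]

```
Transform split(isNull(ProductID),
	disjoint: false) ~> SplitOnErrors@(BadRows, GoodRows)
GoodRows sink(...) ~> AzureSqlSink
BadRows sink(...) ~> ErrorLogCsvSink
```

[Here the known-bad rows, rows with a NULL ProductID in this sketch, are routed to a logging sink, while the good rows continue on to Azure SQL Database, which is exactly the source-transform-split flow described in the example.]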