Hey, Cloud Gurus, welcome to our lesson on splitting data: learning how to create and take different paths.

In this lesson, we're going to briefly go over new branches and conditional splits, really just introducing them, and then jump into the demo so that we can see them live. After that, we'll wrap everything up with a review.

New branches allow you to take multiple paths to multiple sinks while branching from the same source. This is a great way to self-join data, or simply to send the same data out to multiple sinks. It lets you fully utilize one source instead of creating multiple data flows that each read from that same source. You can add a new branch either at the source or after a transformation, and it takes the entire result set from that point in a completely new direction.

Another option is the conditional split. A conditional split routes data to particular streams based on conditions you specify: it evaluates expressions and then routes each row into the appropriate stream. If you've ever used SQL or a traditional programming language, this is similar to a CASE or switch statement.
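The new-branch idea described above (read the source once, then hand the entire result set to every path) can be sketched in plain Python. This is only an illustration of the concept, not how Data Factory executes a data flow; the function name new_branch, the branch names ColorPath and CostPath, and the sample rows with Color and StandardCost columns are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def new_branch(rows: Iterable[Row], branch_names: Sequence[str]) -> Dict[str, List[Row]]:
    """Read the source once and give every branch the entire result set."""
    source_rows = list(rows)  # the source iterator is consumed exactly once
    # Each branch gets its own list containing every row, so filtering or
    # sorting one branch's list does not change what another branch sees.
    return {name: list(source_rows) for name in branch_names}


# Hypothetical sample rows; the column names Color and StandardCost are assumed.
products = [
    {"Name": "Road Frame", "Color": "Red", "StandardCost": 1059.31},
    {"Name": "Sport Helmet", "Color": "Black", "StandardCost": 13.09},
    {"Name": "Water Bottle", "Color": None, "StandardCost": 1.87},
]

# iter() makes the source single-pass, like a table that is read only once.
branches = new_branch(iter(products), ["ColorPath", "CostPath"])

# Each path can now do something completely different with the full stream.
red_rows = [r for r in branches["ColorPath"] if r["Color"] == "Red"]
expensive_rows = [r for r in branches["CostPath"] if r["StandardCost"] > 1000]
```

Because the source is a single-pass iterator, reading it separately for each path would fail on the second read; materializing it once and fanning it out is the analogue of using one source with a new branch instead of building several data flows.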
In contrast to a new branch, a conditional split lets you send particular subsets of the data in a new direction instead of the entire stream.

With that in mind, let's jump over to the Azure portal and take a look at both of these.

Here we are in the Azure portal, in my Azure Data Factory Studio. Let's come over to the Author tab and create a new data flow. I'll minimize this just to give us more space, and let's add a source. For this demo, I'm going to pick the products table. Let's name this source Products, just to keep everything straight.

Let's start off by taking a look at the conditional split. We have all kinds of products at Awesome Company, including ones of various colors. Let's say we want to pull out all of the red products. Let's name this stream RedProducts, and for our condition, let's say Color == 'Red'. We can take that result set and send it over to a sink. Maybe we'll call that RedTable.

So far, that looks like a pretty standard data flow, just moving the data, but we can continue to add different splits here.
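The split being built here, extended the way the demo continues (a black stream plus a catch-all for everything else), behaves like a CASE statement. Here is a minimal Python sketch, assuming first-match routing: each row goes to the first stream whose condition is true, and rows matching nothing fall into the default stream. The names conditional_split, BlackProducts, and OtherProducts, and the sample rows, are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]
Condition = Tuple[str, Callable[[Row], bool]]


def conditional_split(
    rows: Iterable[Row],
    conditions: Sequence[Condition],
    default_stream: str,
) -> Dict[str, List[Row]]:
    """Route each row to the first stream whose condition is true (CASE-style).

    Rows that match no condition land in the default (catch-all) stream,
    like the ELSE branch of a CASE statement.
    """
    streams: Dict[str, List[Row]] = {name: [] for name, _ in conditions}
    streams[default_stream] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                streams[name].append(row)
                break  # first match wins; later conditions are not checked
        else:
            # for/else: runs only when the loop finished without a break
            streams[default_stream].append(row)
    return streams


# Hypothetical sample rows; the Color column name is assumed.
products = [
    {"Name": "Road Frame", "Color": "Red"},
    {"Name": "Sport Helmet", "Color": "Black"},
    {"Name": "Jersey", "Color": "Multi"},
    {"Name": "Water Bottle", "Color": None},  # a missing color matches no condition
]

streams = conditional_split(
    products,
    [
        ("RedProducts", lambda r: r["Color"] == "Red"),
        ("BlackProducts", lambda r: r["Color"] == "Black"),
    ],
    default_stream="OtherProducts",
)
```

With first-match routing, every row lands in exactly one stream, so the stream sizes add up to the source row count. Data Factory's conditional split also offers an "all matching conditions" mode, where a row can go to several streams; this sketch does not model that.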
Let's say that we also want our black products: Color == 'Black'. And maybe we want to group all the other products together. We can have a different sink for each of those. Name that one BlackTable, and then add a sink for our other table.

This is the power of the conditional split. You're taking one stream of data and breaking it into multiple streams, depending on the case. If the product is red, it goes to this table; black goes to this table; and everything else goes to our catch-all table.

This is different from a new branch, which we can add here. If we wanted to use the same Products source but take it in a different direction, we can easily do that from this new branch. From here, we can do completely different things. For the moment, we'll stick with a conditional split, because that's kind of the theme. But what if we wanted to work on a completely different field? Let's bring this back up a little bit. Maybe we want to just pull out our more expensive products.
We can open up the expression builder, or, since I already know the name of the column, we can just type StandardCost > 1000. And then maybe we'll name our other stream Affordable. Let's bring this back down so we can see everything. This still allows us to use a different sink for each stream: we have our Expensive table, and one more for our AffordableTable.

All of this is to illustrate that we have the option of splitting out smaller streams based on certain cases, or of taking an entire branch in a new direction. This lets us keep just one data flow that we call once, while fully utilizing that Products source.

And as mentioned, you don't have to add a new branch only at the beginning. You can add one at the transformation level, too. We could have a new branch off of RedProducts to do further work with those products, taking that entire stream in a different direction.

By way of review, the new branch transformation allows you to send the full output of one source down multiple streams. The conditional split transformation allows you to direct a subset of data into a new stream.
And remember, the conditional split is much like a CASE statement, or maybe a really complex if/then/else statement. And don't forget that these transformations are available in both Azure Data Factory and Azure Synapse Analytics pipelines.

That's it for now! I know you'll pick this up right away and become a master of your data traffic, directing it into multiple streams wherever you need it to go, to accomplish as much as possible as efficiently as possible. When you're ready, I'll see you in the next video.