1 00:00:00,000 --> 00:00:02,740 Hey, Cloud Gurus. Welcome back. 2 00:00:02,740 --> 00:00:05,590 In this lesson, we'll be talking about partitioning data. 3 00:00:07,020 --> 00:00:09,600 During the course of the lesson, we'll go over 4 00:00:09,600 --> 00:00:11,430 why partitioning matters, 5 00:00:11,430 --> 00:00:14,260 as well as a couple of different strategies: 6 00:00:14,260 --> 00:00:17,743 vertical partitioning and functional partitioning. 7 00:00:18,620 --> 00:00:20,900 This will be mostly an introduction to partitioning 8 00:00:20,900 --> 00:00:23,920 concepts, helping you get familiar with the strategies. 9 00:00:23,920 --> 00:00:26,130 And then in the next couple of lessons, 10 00:00:26,130 --> 00:00:29,240 we'll get into some best practices for partitioning, 11 00:00:29,240 --> 00:00:31,723 but we'll wrap this lesson up with a review. 12 00:00:34,290 --> 00:00:37,690 To start out, why does partitioning matter? 13 00:00:37,690 --> 00:00:39,640 Well, here are 5 reasons. 14 00:00:39,640 --> 00:00:42,350 One, it improves scalability. 15 00:00:42,350 --> 00:00:44,500 As our data grows, we can split it out 16 00:00:44,500 --> 00:00:48,300 into more usable chunks and still access it quickly. 17 00:00:48,300 --> 00:00:50,680 This is not to be confused with the kind of splitting 18 00:00:50,680 --> 00:00:53,240 out across multiple computers that you will encounter 19 00:00:53,240 --> 00:00:56,233 with sharding, but we'll get to that in a future lesson. 20 00:00:58,050 --> 00:01:01,900 It improves performance. Think of it like a filing cabinet. 21 00:01:01,900 --> 00:01:04,970 If you can pull 1 file and read the contents of it 22 00:01:04,970 --> 00:01:06,990 instead of the entire filing cabinet, 23 00:01:06,990 --> 00:01:08,823 it's going to go a lot faster. 24 00:01:09,980 --> 00:01:11,660 It improves security.
25 00:01:11,660 --> 00:01:13,390 You have an opportunity to place data 26 00:01:13,390 --> 00:01:16,653 with different security needs on different partitions. 27 00:01:18,150 --> 00:01:19,790 It improves availability. 28 00:01:19,790 --> 00:01:23,133 Having the data spread out avoids single points of failure, 29 00:01:24,690 --> 00:01:27,850 and it can improve cost savings by taking advantage 30 00:01:27,850 --> 00:01:30,480 of opportunities to place lower priority data 31 00:01:30,480 --> 00:01:31,733 on cheaper storage. 32 00:01:34,630 --> 00:01:36,360 We have a couple of options when it comes 33 00:01:36,360 --> 00:01:38,340 to partitioning strategies. 34 00:01:38,340 --> 00:01:41,260 The first is vertical, where you distribute the data 35 00:01:41,260 --> 00:01:45,370 according to its pattern of use. The second is functional, 36 00:01:45,370 --> 00:01:47,560 where you aggregate data by purpose 37 00:01:47,560 --> 00:01:50,180 within a bounded context. 38 00:01:50,180 --> 00:01:52,163 Let's jump first into vertical. 39 00:01:53,540 --> 00:01:56,300 Here, the partitions hold a subset of the fields, 40 00:01:56,300 --> 00:01:59,670 and they're divided according to the pattern of use. 41 00:01:59,670 --> 00:02:01,700 If we take a look at this visually, 42 00:02:01,700 --> 00:02:04,130 let's say we have a table of students with a few different 43 00:02:04,130 --> 00:02:09,130 fields: ID, Name, Topic, and Hours Watched. 44 00:02:09,320 --> 00:02:13,030 We can take the most frequently used fields, such as ID, 45 00:02:13,030 --> 00:02:16,922 Name, and Topic, and put them on 1 partition, 46 00:02:16,922 --> 00:02:21,090 and then take the less frequently used Hours Watched column 47 00:02:21,090 --> 00:02:23,410 and put that on a different partition, 48 00:02:23,410 --> 00:02:25,683 still matched up by the ID. 49 00:02:28,630 --> 00:02:30,670 Think of this kind of like a chef 50 00:02:30,670 --> 00:02:34,300 who has to have the most common orders ready faster.
51 00:02:34,300 --> 00:02:36,780 This chef is going to keep on hand the ingredients 52 00:02:36,780 --> 00:02:40,280 for their most frequently requested meals. 53 00:02:40,280 --> 00:02:41,970 Maybe if their specialty is hamburgers, 54 00:02:41,970 --> 00:02:44,250 they've always got hamburgers ready to go, 55 00:02:44,250 --> 00:02:46,220 and then maybe just 1 or 2 hot dogs, 56 00:02:46,220 --> 00:02:48,460 because, I mean, really. Who wants a hot dog 57 00:02:48,460 --> 00:02:50,230 when you can have a hamburger? 58 00:02:50,230 --> 00:02:52,400 But they're prioritizing their most frequently 59 00:02:52,400 --> 00:02:55,563 requested meals over the less popular ones. 60 00:02:56,590 --> 00:02:59,080 This strategy also allows you to separate static 61 00:02:59,080 --> 00:03:01,180 and dynamic data. 62 00:03:01,180 --> 00:03:04,285 Slow moving data can be cached in memory by the application, 63 00:03:04,285 --> 00:03:06,123 improving its performance. 64 00:03:07,370 --> 00:03:09,310 You can also add security, 65 00:03:09,310 --> 00:03:12,140 placing sensitive data on separate partitions 66 00:03:12,140 --> 00:03:13,963 with higher levels of security. 67 00:03:16,120 --> 00:03:18,740 Our second strategy is functional. 68 00:03:18,740 --> 00:03:21,360 And in this case, we're aggregating by purpose, 69 00:03:21,360 --> 00:03:25,630 finding bounded contexts to determine our partitions. 70 00:03:25,630 --> 00:03:27,420 Think of it much like offices. 71 00:03:27,420 --> 00:03:29,740 The sales team is going to be in 1 office 72 00:03:29,740 --> 00:03:32,020 and IT is in another because they're working on 73 00:03:32,020 --> 00:03:33,420 completely different things. 74 00:03:34,280 --> 00:03:35,940 If we take a look at this, 75 00:03:35,940 --> 00:03:38,170 let's say we have 2 sets of data. 76 00:03:38,170 --> 00:03:41,090 One is a list of students and the topics they're watching, 77 00:03:41,090 --> 00:03:45,110 much like we had before.
The other is customer information, 78 00:03:45,110 --> 00:03:47,340 or the businesses that employ the students 79 00:03:47,340 --> 00:03:49,620 that are watching the courses. 80 00:03:49,620 --> 00:03:53,060 And so we'd have our student information on 1 partition 81 00:03:53,060 --> 00:03:56,470 and our customer information on a different partition, 82 00:03:56,470 --> 00:03:58,370 2 different bounded contexts 83 00:03:58,370 --> 00:04:00,143 placed on 2 different partitions. 84 00:04:02,590 --> 00:04:06,210 With this strategy, we can also separate by workload types, 85 00:04:06,210 --> 00:04:09,883 placing our read-only data in a separate read partition. 86 00:04:12,520 --> 00:04:16,540 By way of review, partitioning improves scalability, 87 00:04:16,540 --> 00:04:20,853 performance, security, availability, and cost savings. 88 00:04:22,040 --> 00:04:24,290 There are 2 strategies for partitioning, 89 00:04:24,290 --> 00:04:26,720 either vertical or functional. 90 00:04:26,720 --> 00:04:29,230 Both of these are methods of taking a single database 91 00:04:29,230 --> 00:04:30,940 instance and splitting out the data 92 00:04:30,940 --> 00:04:33,400 into multiple workable chunks. 93 00:04:33,400 --> 00:04:36,460 The difference is that one does it by the pattern of use, 94 00:04:36,460 --> 00:04:39,023 and the other does it by bounded contexts. 95 00:04:40,430 --> 00:04:42,430 And of course, partition decisions will depend 96 00:04:42,430 --> 00:04:45,730 on the type of data you have, how you want to use it, 97 00:04:45,730 --> 00:04:47,830 and what the workload will be. 98 00:04:47,830 --> 00:04:49,350 This is really what's going to decide 99 00:04:49,350 --> 00:04:51,200 which of these partition strategies 100 00:04:51,200 --> 00:04:52,700 is the most effective for you. 101 00:04:53,910 --> 00:04:56,780 Thank you for joining me in this video on partitioning. 
102 00:04:56,780 --> 00:04:59,140 As I said before, in the next couple of videos, 103 00:04:59,140 --> 00:05:01,300 we'll dive into some specific strategies 104 00:05:01,300 --> 00:05:03,130 around partitioning on both data lakes 105 00:05:03,130 --> 00:05:05,280 and Azure Synapse Analytics. 106 00:05:05,280 --> 00:05:07,480 When you're ready, I'll see you there, Gurus.
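To make the vertical partitioning example from this lesson a little more concrete, here is a minimal Python sketch of the students table being split by pattern of use. It is only an illustration under stated assumptions: the sample rows, the lowercase field names, and the helper names split_vertically and rejoin are hypothetical and are not shown in the lesson, and a real database would do this with separate tables or storage rather than in-memory lists.

```python
# Illustrative sketch only: the sample rows and all names are hypothetical.

students = [
    {"id": 1, "name": "Ada", "topic": "Azure", "hours_watched": 12.5},
    {"id": 2, "name": "Grace", "topic": "Data Lakes", "hours_watched": 3.0},
]


def split_vertically(rows, hot_fields, key="id"):
    """Split each row into a frequently used part and a rarely used part.

    Both parts keep the key field so they can still be matched up later.
    """
    hot, cold = [], []
    for row in rows:
        hot.append({f: v for f, v in row.items() if f == key or f in hot_fields})
        cold.append({f: v for f, v in row.items() if f == key or f not in hot_fields})
    return hot, cold


def rejoin(hot, cold, key="id"):
    """Match the two partitions back up by the shared key."""
    cold_by_key = {row[key]: row for row in cold}
    return [{**row, **cold_by_key[row[key]]} for row in hot]


# Assumption: name and topic are the frequently used fields,
# hours_watched is the less frequently used one.
hot_partition, cold_partition = split_vertically(students, {"name", "topic"})
```

Reading only hot_partition covers the common case without touching hours_watched, and rejoin shows that the full row can still be recovered through the shared ID when it is needed.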