1
00:00:00,000 --> 00:00:02,740
Hey, Cloud Gurus. Welcome back.

2
00:00:02,740 --> 00:00:05,590
In this lesson, we'll be talking about partitioning data.

3
00:00:07,020 --> 00:00:09,600
During the course of the lesson, we'll go over

4
00:00:09,600 --> 00:00:11,430
why partitioning matters,

5
00:00:11,430 --> 00:00:14,260
as well as a couple of different strategies:

6
00:00:14,260 --> 00:00:17,743
vertical partitioning and functional partitioning.

7
00:00:18,620 --> 00:00:20,900
This will be mostly an introduction to partitioning

8
00:00:20,900 --> 00:00:23,920
concepts, helping you get familiar with the strategies.

9
00:00:23,920 --> 00:00:26,130
And then in the next couple of lessons,

10
00:00:26,130 --> 00:00:29,240
we'll get into some best practices for partitioning,

11
00:00:29,240 --> 00:00:31,723
but we'll wrap this lesson up with a review.

12
00:00:34,290 --> 00:00:37,690
To start out, why does partitioning matter?

13
00:00:37,690 --> 00:00:39,640
Well, here are 5 reasons.

14
00:00:39,640 --> 00:00:42,350
One, it improves scalability.

15
00:00:42,350 --> 00:00:44,500
As our data grows, we can split it out

16
00:00:44,500 --> 00:00:48,300
into more usable chunks and still access it quickly.

17
00:00:48,300 --> 00:00:50,680
This is not to be confused with the kind of splitting

18
00:00:50,680 --> 00:00:53,240
out across multiple computers that you will encounter

19
00:00:53,240 --> 00:00:56,233
with sharding, but we'll get to that in a future lesson.

20
00:00:58,050 --> 00:01:01,900
It improves performance. Think of it like a filing cabinet.

21
00:01:01,900 --> 00:01:04,970
If you can pull 1 file and read the contents of it

22
00:01:04,970 --> 00:01:06,990
instead of the entire filing cabinet,

23
00:01:06,990 --> 00:01:08,823
it's going to go a lot faster.

24
00:01:09,980 --> 00:01:11,660
It improves security.

25
00:01:11,660 --> 00:01:13,390
You have an opportunity to place data

26
00:01:13,390 --> 00:01:16,653
with different security needs on different partitions.

27
00:01:18,150 --> 00:01:19,790
It improves availability.

28
00:01:19,790 --> 00:01:23,133
Having the data spread out avoids single points of failure,

29
00:01:24,690 --> 00:01:27,850
and you can reduce costs by taking advantage

30
00:01:27,850 --> 00:01:30,480
of opportunities to place lower priority data

31
00:01:30,480 --> 00:01:31,733
on cheaper storage.

32
00:01:34,630 --> 00:01:36,360
We have a couple of options when it comes

33
00:01:36,360 --> 00:01:38,340
to partitioning strategies.

34
00:01:38,340 --> 00:01:41,260
The first is vertical, where you distribute the data

35
00:01:41,260 --> 00:01:45,370
according to its pattern of use. The second is functional,

36
00:01:45,370 --> 00:01:47,560
where you aggregate data by purpose

37
00:01:47,560 --> 00:01:50,180
within a bounded context.

38
00:01:50,180 --> 00:01:52,163
Let's jump first into vertical.

39
00:01:53,540 --> 00:01:56,300
Here, each partition holds a subset of the fields,

40
00:01:56,300 --> 00:01:59,670
and they're divided according to the pattern of use.

41
00:01:59,670 --> 00:02:01,700
If we take a look at this visually,

42
00:02:01,700 --> 00:02:04,130
let's say we have a table of students with a few different

43
00:02:04,130 --> 00:02:09,130
fields: ID, Name, Topic, and Hours Watched.

44
00:02:09,320 --> 00:02:13,030
We can take the most frequently used fields, such as ID,

45
00:02:13,030 --> 00:02:16,922
Name, and Topic, and put them on 1 partition,

46
00:02:16,922 --> 00:02:21,090
and then take the less frequently used Hours Watched column

47
00:02:21,090 --> 00:02:23,410
and put that on a different partition,

48
00:02:23,410 --> 00:02:25,683
still matched up by the ID.

49
00:02:28,630 --> 00:02:30,670
Think of this kind of like a chef

50
00:02:30,670 --> 00:02:34,300
who has to have the most common orders ready faster.

51
00:02:34,300 --> 00:02:36,780
This chef is going to keep on hand the ingredients

52
00:02:36,780 --> 00:02:40,280
for their most frequently requested meals.

53
00:02:40,280 --> 00:02:41,970
Maybe if their specialty is hamburgers,

54
00:02:41,970 --> 00:02:44,250
they've always got hamburgers ready to go,

55
00:02:44,250 --> 00:02:46,220
and then maybe just 1 or 2 hot dogs,

56
00:02:46,220 --> 00:02:48,460
because, I mean, really. Who wants a hot dog

57
00:02:48,460 --> 00:02:50,230
when you can have a hamburger?

58
00:02:50,230 --> 00:02:52,400
But they're prioritizing their most frequently

59
00:02:52,400 --> 00:02:55,563
requested meals over the less-used ones.

60
00:02:56,590 --> 00:02:59,080
This strategy also allows you to separate static

61
00:02:59,080 --> 00:03:01,180
and dynamic data.

62
00:03:01,180 --> 00:03:04,285
Slow-moving data can be cached in memory by the application,

63
00:03:04,285 --> 00:03:06,123
improving its performance.

64
00:03:07,370 --> 00:03:09,310
You can also add security,

65
00:03:09,310 --> 00:03:12,140
placing sensitive data on separate partitions

66
00:03:12,140 --> 00:03:13,963
with higher levels of security.

67
00:03:16,120 --> 00:03:18,740
Our second strategy is functional.

68
00:03:18,740 --> 00:03:21,360
And in this case, we're aggregating by purpose,

69
00:03:21,360 --> 00:03:25,630
finding bounded contexts to determine our partitions.

70
00:03:25,630 --> 00:03:27,420
Think of it much like offices.

71
00:03:27,420 --> 00:03:29,740
The sales team is going to be in 1 office

72
00:03:29,740 --> 00:03:32,020
and IT is in another because they're working on

73
00:03:32,020 --> 00:03:33,420
completely different things.

74
00:03:34,280 --> 00:03:35,940
If we take a look at this,

75
00:03:35,940 --> 00:03:38,170
let's say we have 2 sets of data.

76
00:03:38,170 --> 00:03:41,090
One is a list of students and the topics they're watching,

77
00:03:41,090 --> 00:03:45,110
much like we had before. The other is customer information,

78
00:03:45,110 --> 00:03:47,340
that is, the businesses that employ the students

79
00:03:47,340 --> 00:03:49,620
who are watching the courses.

80
00:03:49,620 --> 00:03:53,060
And so we'd have our student information on 1 partition

81
00:03:53,060 --> 00:03:56,470
and our customer information on a different partition,

82
00:03:56,470 --> 00:03:58,370
2 different bounded contexts

83
00:03:58,370 --> 00:04:00,143
placed on 2 different partitions.

84
00:04:02,590 --> 00:04:06,210
With this strategy, we can also separate by workload types,

85
00:04:06,210 --> 00:04:09,883
placing our read-only data in a separate read partition.

86
00:04:12,520 --> 00:04:16,540
By way of review, partitioning improves scalability,

87
00:04:16,540 --> 00:04:20,853
performance, security, availability, and cost savings.

88
00:04:22,040 --> 00:04:24,290
We covered 2 strategies for partitioning,

89
00:04:24,290 --> 00:04:26,720
vertical and functional.

90
00:04:26,720 --> 00:04:29,230
Both of these are methods of taking a single database

91
00:04:29,230 --> 00:04:30,940
instance and splitting out the data

92
00:04:30,940 --> 00:04:33,400
into multiple workable chunks.

93
00:04:33,400 --> 00:04:36,460
The difference is that one does it by the pattern of use,

94
00:04:36,460 --> 00:04:39,023
and the other does it by bounded contexts.

95
00:04:40,430 --> 00:04:42,430
And of course, partition decisions will depend

96
00:04:42,430 --> 00:04:45,730
on the type of data you have, how you want to use it,

97
00:04:45,730 --> 00:04:47,830
and what the workload will be.

98
00:04:47,830 --> 00:04:49,350
This is really what's going to decide

99
00:04:49,350 --> 00:04:51,200
which of these partition strategies

100
00:04:51,200 --> 00:04:52,700
is the most effective for you.

101
00:04:53,910 --> 00:04:56,780
Thank you for joining me in this video on partitioning.

102
00:04:56,780 --> 00:04:59,140
As I said before, in the next couple of videos,

103
00:04:59,140 --> 00:05:01,300
we'll dive into some specific strategies

104
00:05:01,300 --> 00:05:03,130
around partitioning on both data lakes

105
00:05:03,130 --> 00:05:05,280
and Azure Synapse Analytics.

106
00:05:05,280 --> 00:05:07,480
When you're ready, I'll see you there, gurus.