1 00:00:00,180 --> 00:00:02,020 Now let's have a look at DynamoDB 2 00:00:02,020 --> 00:00:04,540 which is a NoSQL serverless database. 3 00:00:04,540 --> 00:00:06,290 So if you consider the traditional architecture 4 00:00:06,290 --> 00:00:07,730 that we've seen over and over in this course, 5 00:00:07,730 --> 00:00:10,880 we have clients and they connect to an application layer 6 00:00:10,880 --> 00:00:13,010 that could be made of an Elastic Load Balancer 7 00:00:13,010 --> 00:00:15,120 and EC2 instances that are grouped 8 00:00:15,120 --> 00:00:17,090 and scaling with an Auto Scaling group. 9 00:00:17,090 --> 00:00:18,870 And then the data has to be stored somewhere, 10 00:00:18,870 --> 00:00:20,380 so we have a database layer, 11 00:00:20,380 --> 00:00:22,550 and it could be using Amazon RDS, 12 00:00:22,550 --> 00:00:25,090 which is backed by MySQL or PostgreSQL, 13 00:00:25,090 --> 00:00:27,100 or these kinds of technologies. 14 00:00:27,100 --> 00:00:28,530 Now these traditional applications 15 00:00:28,530 --> 00:00:30,880 will leverage RDBMS databases, 16 00:00:30,880 --> 00:00:33,450 and we do so because we have the SQL query language. 17 00:00:33,450 --> 00:00:34,283 It's really good. 18 00:00:34,283 --> 00:00:36,390 And then we can define strong requirements 19 00:00:36,390 --> 00:00:37,610 about how data should be modeled 20 00:00:37,610 --> 00:00:39,800 because we have tables, we have schemas, and so on. 21 00:00:39,800 --> 00:00:43,080 We can do joins, aggregations, complex computations, 22 00:00:43,080 --> 00:00:45,300 and all of that works very well. 23 00:00:45,300 --> 00:00:46,350 What we get out of it though 24 00:00:46,350 --> 00:00:48,790 in terms of scaling is mostly vertical scaling. 25 00:00:48,790 --> 00:00:50,740 So in case you want a better database, 26 00:00:50,740 --> 00:00:52,740 and I'm talking just about the database layer right now.
27 00:00:52,740 --> 00:00:54,470 If you want to scale it vertically, 28 00:00:54,470 --> 00:00:56,120 you need to replace the database 29 00:00:56,120 --> 00:00:58,660 and get a more powerful CPU, more RAM, 30 00:00:58,660 --> 00:01:00,280 or a disk with better I/O. 31 00:01:00,280 --> 00:01:02,710 And you can do some sort of horizontal scaling, 32 00:01:02,710 --> 00:01:05,260 but only by increasing the read capability. 33 00:01:05,260 --> 00:01:07,390 So either by adding EC2 instances 34 00:01:07,390 --> 00:01:08,870 at the application layer, 35 00:01:08,870 --> 00:01:11,650 or by adding RDS Read Replicas at the database layer. 36 00:01:11,650 --> 00:01:13,450 But if you add Read Replicas, 37 00:01:13,450 --> 00:01:14,420 you're going to be limited 38 00:01:14,420 --> 00:01:16,280 by the number of replicas you can have, 39 00:01:16,280 --> 00:01:19,550 and therefore limited in your horizontal read scaling, 40 00:01:19,550 --> 00:01:21,600 not to mention horizontal write scaling, 41 00:01:21,600 --> 00:01:23,800 because you don't have that with RDS. 42 00:01:23,800 --> 00:01:26,230 So introducing to you NoSQL databases, 43 00:01:26,230 --> 00:01:28,790 which means "not only SQL" or "non-SQL" databases, 44 00:01:28,790 --> 00:01:30,560 depending on the definition. 45 00:01:30,560 --> 00:01:32,610 So the idea is that these are non-relational databases, 46 00:01:32,610 --> 00:01:33,840 and they're going to be distributed, 47 00:01:33,840 --> 00:01:36,210 which gives us some horizontal scalability. 48 00:01:36,210 --> 00:01:37,690 And some very famous technologies 49 00:01:37,690 --> 00:01:40,700 that are NoSQL databases are MongoDB 50 00:01:40,700 --> 00:01:42,800 and, of course, DynamoDB. 51 00:01:42,800 --> 00:01:45,300 Now these databases do not support query joins, 52 00:01:45,300 --> 00:01:46,510 or have very limited support.
53 00:01:46,510 --> 00:01:47,761 And so, for simplicity, 54 00:01:47,761 --> 00:01:50,610 just assume that they don't have query joins. 55 00:01:50,610 --> 00:01:52,040 Now all the data that is needed 56 00:01:52,040 --> 00:01:55,430 therefore must be present in one row in your database. 57 00:01:55,430 --> 00:01:58,720 And again, to simplify things, I know they're evolving, 58 00:01:58,720 --> 00:02:00,610 but let's just assume that NoSQL databases 59 00:02:00,610 --> 00:02:03,080 also do not perform aggregation computations 60 00:02:03,080 --> 00:02:05,630 such as SUM, or AVG, and so on. 61 00:02:05,630 --> 00:02:08,580 But the good thing out of it is that, thanks to this design, 62 00:02:08,580 --> 00:02:11,180 NoSQL databases will scale horizontally. 63 00:02:11,180 --> 00:02:13,940 That means that if you need more write or read capacity, 64 00:02:13,940 --> 00:02:17,390 you can, behind the scenes, have more instances, 65 00:02:17,390 --> 00:02:19,210 and it will scale really well. 66 00:02:19,210 --> 00:02:21,650 So there is no right or wrong for NoSQL vs SQL, 67 00:02:21,650 --> 00:02:23,430 it just depends on how you think 68 00:02:23,430 --> 00:02:26,220 about your modeling of data, about your application, 69 00:02:26,220 --> 00:02:29,230 about your user queries, and about your scaling needs. 70 00:02:29,230 --> 00:02:30,930 So let's talk about DynamoDB now. 71 00:02:30,930 --> 00:02:33,950 So DynamoDB is a fully managed NoSQL database, 72 00:02:33,950 --> 00:02:34,960 and it's highly available, 73 00:02:34,960 --> 00:02:37,850 and has replication across multiple AZs out of the box. 74 00:02:37,850 --> 00:02:38,890 So it's a NoSQL database. 75 00:02:38,890 --> 00:02:40,210 It's not a relational database. 76 00:02:40,210 --> 00:02:43,140 So it's different from RDS. 77 00:02:43,140 --> 00:02:46,390 It scales to massive workloads and it's fully distributed.
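To make the "all data in one row" idea concrete, here is a minimal Python sketch contrasting a relational join with a denormalized NoSQL-style item. Plain dicts stand in for tables and items, and all the names (users, orders, and so on) are invented for illustration:

```python
# Relational modeling: two tables, combined with a JOIN at query time.
users = {"u1": {"first_name": "John", "last_name": "Doe"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 50},
]

# To answer "orders with the user's name", SQL would JOIN users and orders.
joined = [{**order, **users[order["user_id"]]} for order in orders]

# NoSQL modeling: no joins, so everything the query needs
# is denormalized into a single self-contained item (one "row").
order_item = {
    "order_id": "o1",
    "total": 50,
    "user": {"first_name": "John", "last_name": "Doe"},  # nested, duplicated data
}

print(joined[0]["first_name"])            # John
print(order_item["user"]["first_name"])   # John, no join needed
```

The trade-off is duplication: the user's name lives inside every order item, which is exactly why choosing how to model data up front matters more in NoSQL.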
78 00:02:46,390 --> 00:02:47,223 That means that you can scale 79 00:02:47,223 --> 00:02:49,090 to millions of requests per second, 80 00:02:49,090 --> 00:02:51,920 trillions of rows, and hundreds of terabytes of storage, 81 00:02:51,920 --> 00:02:54,240 regardless of your workload. 82 00:02:54,240 --> 00:02:55,610 It has fast and consistent performance, 83 00:02:55,610 --> 00:02:58,330 so that means you get really low latency on retrieval. 84 00:02:58,330 --> 00:02:59,900 And it's an AWS service, 85 00:02:59,900 --> 00:03:02,840 so it's going to be fully integrated with IAM for security, 86 00:03:02,840 --> 00:03:05,010 authorization, and administration. 87 00:03:05,010 --> 00:03:06,784 You can enable event-driven programming 88 00:03:06,784 --> 00:03:09,400 with DynamoDB Streams, as we'll see in this section. 89 00:03:09,400 --> 00:03:12,260 It's low cost and has auto scaling capability. 90 00:03:12,260 --> 00:03:13,280 And you have the Standard 91 00:03:13,280 --> 00:03:15,530 and Infrequent Access (IA) table classes 92 00:03:15,530 --> 00:03:17,390 for different storage tiers. 93 00:03:17,390 --> 00:03:19,930 So let's have a look at the basics of DynamoDB. 94 00:03:19,930 --> 00:03:22,500 So DynamoDB is made out of tables, 95 00:03:22,500 --> 00:03:24,330 and each table will have a primary key. 96 00:03:24,330 --> 00:03:26,290 And we'll see what the primary key can be 97 00:03:26,290 --> 00:03:27,420 in the next slides. 98 00:03:27,420 --> 00:03:29,040 You must decide what the primary key is 99 00:03:29,040 --> 00:03:31,200 before you create your table. 100 00:03:31,200 --> 00:03:33,790 Now, each table can have an infinite number of rows, 101 00:03:33,790 --> 00:03:34,700 also called items. 102 00:03:34,700 --> 00:03:37,630 So I will use the rows and items terms interchangeably 103 00:03:37,630 --> 00:03:38,640 in this course, 104 00:03:38,640 --> 00:03:40,770 and each item will have attributes.
105 00:03:40,770 --> 00:03:42,330 Now these attributes could be similar 106 00:03:42,330 --> 00:03:44,340 to the columns in your table, okay? 107 00:03:44,340 --> 00:03:45,850 But these attributes can also be nested, 108 00:03:45,850 --> 00:03:47,830 so it's a bit more powerful than columns, 109 00:03:47,830 --> 00:03:48,930 and they can be added over time; 110 00:03:48,930 --> 00:03:50,090 you don't need to define them all 111 00:03:50,090 --> 00:03:51,850 at creation time of your table, 112 00:03:51,850 --> 00:03:53,400 and some of them can be null, 113 00:03:53,400 --> 00:03:55,070 so it's completely fine for an attribute 114 00:03:55,070 --> 00:03:57,240 to be missing in some data. 115 00:03:57,240 --> 00:03:58,540 Now, each item, 116 00:03:58,540 --> 00:04:01,450 or each row, can have up to 400 kilobytes of data, 117 00:04:01,450 --> 00:04:02,970 so that's a limitation. 118 00:04:02,970 --> 00:04:05,730 And the data types supported are going to be scalar types, 119 00:04:05,730 --> 00:04:08,700 such as string, number, binary, boolean, and null. 120 00:04:08,700 --> 00:04:10,840 There are going to be document types such as lists and maps, 121 00:04:10,840 --> 00:04:13,240 so it gives you some kind of nesting capability. 122 00:04:13,240 --> 00:04:15,520 And set types such as string sets, 123 00:04:15,520 --> 00:04:17,890 number sets, and binary sets. 124 00:04:17,890 --> 00:04:19,930 Now a very important point to understand 125 00:04:19,930 --> 00:04:22,750 is how to choose a primary key for DynamoDB. 126 00:04:22,750 --> 00:04:24,190 And the exam will definitely test you 127 00:04:24,190 --> 00:04:26,070 on the knowledge of this. 128 00:04:26,070 --> 00:04:27,690 So you have two options for primary keys. 129 00:04:27,690 --> 00:04:29,940 And the first one is called a partition key, 130 00:04:29,940 --> 00:04:31,890 also called a hash key.
131 00:04:31,890 --> 00:04:33,860 So the partition key in this case must be unique 132 00:04:33,860 --> 00:04:36,720 for each item, which is very similar to a normal database. 133 00:04:36,720 --> 00:04:37,553 And therefore, 134 00:04:37,553 --> 00:04:39,310 the partition key must be diverse enough, 135 00:04:39,310 --> 00:04:41,730 so that your data is going to be distributed. 136 00:04:41,730 --> 00:04:42,563 For example, 137 00:04:42,563 --> 00:04:45,020 if you consider a users table, 138 00:04:45,020 --> 00:04:47,267 then we can have the partition key as being User_ID, 139 00:04:47,267 --> 00:04:50,170 and the attributes being First_Name, Last_Name, and Age. 140 00:04:50,170 --> 00:04:52,250 And then you have your first User_ID 141 00:04:52,250 --> 00:04:54,260 with some attributes being filled out. 142 00:04:54,260 --> 00:04:55,890 Your second User_ID, as you can see, 143 00:04:55,890 --> 00:04:56,990 does not have a last name, 144 00:04:56,990 --> 00:04:59,080 but this is fine in DynamoDB. 145 00:04:59,080 --> 00:05:00,780 And the third partition key 146 00:05:00,780 --> 00:05:04,650 yet again has three attributes attached to it. 147 00:05:04,650 --> 00:05:06,070 So this is what DynamoDB looks like. 148 00:05:06,070 --> 00:05:07,560 It looks like a database for now. 149 00:05:07,560 --> 00:05:09,040 It's pretty easy. 150 00:05:09,040 --> 00:05:11,240 But you could have a second option 151 00:05:11,240 --> 00:05:13,735 of partition key and sort key, 152 00:05:13,735 --> 00:05:16,040 also called hash plus range. 153 00:05:16,040 --> 00:05:18,950 And now the combination of these two keys 154 00:05:18,950 --> 00:05:20,350 must be unique for each item. 155 00:05:20,350 --> 00:05:22,757 So the data is going to be grouped by partition key. 156 00:05:22,757 --> 00:05:23,850 And this is why it's very important 157 00:05:23,850 --> 00:05:25,320 to choose a good partition key.
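As a sketch, the users table above could be created and filled through DynamoDB's low-level API, where every attribute carries a type descriptor (S for string, N for number, sent as a string). The request bodies below are built as plain dicts so the example runs without an AWS account; the actual boto3 calls are shown commented out, and the table and attribute names are just the ones from the slide:

```python
# Table definition: only the primary key is declared up front;
# other attributes (First_Name, Last_Name, Age) are schemaless.
create_table_params = {
    "TableName": "users",
    "KeySchema": [{"AttributeName": "User_ID", "KeyType": "HASH"}],  # partition key
    "AttributeDefinitions": [{"AttributeName": "User_ID", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}

# Items in the low-level format: S = string, N = number (as a string).
item_1 = {
    "User_ID": {"S": "u1"},
    "First_Name": {"S": "John"},
    "Last_Name": {"S": "Doe"},
    "Age": {"N": "30"},
}
# The second user simply has no Last_Name attribute -- perfectly valid.
item_2 = {
    "User_ID": {"S": "u2"},
    "First_Name": {"S": "Alice"},
    "Age": {"N": "25"},
}

# With AWS credentials configured, you would then run, for example:
# import boto3
# client = boto3.client("dynamodb")
# client.create_table(**create_table_params)
# client.put_item(TableName="users", Item=item_1)
# client.put_item(TableName="users", Item=item_2)

print(sorted(item_2))  # ['Age', 'First_Name', 'User_ID'] -- no Last_Name
```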
158 00:05:25,320 --> 00:05:28,070 So if you consider a users-game table, 159 00:05:28,070 --> 00:05:29,937 then User_ID is the partition key 160 00:05:29,937 --> 00:05:31,400 and Game_ID is the sort key. 161 00:05:31,400 --> 00:05:32,420 Let's see what that means. 162 00:05:32,420 --> 00:05:35,050 That means that users can attend multiple games. 163 00:05:35,050 --> 00:05:38,860 So we have these four columns, or attributes, 164 00:05:38,860 --> 00:05:40,960 but the first one is going to be our partition key. 165 00:05:40,960 --> 00:05:42,420 We want data to be grouped by User_ID, 166 00:05:42,420 --> 00:05:44,750 and the second one is going to be the sort key. 167 00:05:44,750 --> 00:05:46,500 This is going to give us the uniqueness 168 00:05:46,500 --> 00:05:49,880 of the combination of partition key and sort key. 169 00:05:49,880 --> 00:05:52,520 So both of them together are going to make up the primary key, 170 00:05:52,520 --> 00:05:54,640 and the rest are going to be attributes. 171 00:05:54,640 --> 00:05:56,730 So if you consider a User_ID, 172 00:05:56,730 --> 00:05:59,400 then it has a sort key, which is the Game_ID, 173 00:05:59,400 --> 00:06:02,520 and then we have the attributes Score 92 and Result win. 174 00:06:02,520 --> 00:06:04,500 Again, another different User_ID 175 00:06:04,500 --> 00:06:05,790 and another different Game_ID, 176 00:06:05,790 --> 00:06:06,710 so this works as well. 177 00:06:06,710 --> 00:06:09,180 You have a lost game with a score of 14. 178 00:06:09,180 --> 00:06:11,870 And more interestingly, in this third row, 179 00:06:11,870 --> 00:06:14,200 what we have is the same partition key. 180 00:06:14,200 --> 00:06:17,600 So rows two and three have the same partition key 181 00:06:17,600 --> 00:06:19,200 but a different sort key. 182 00:06:19,200 --> 00:06:22,310 Of course, it is fine for a user to attend multiple games.
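The composite-key behavior just described can be mimicked with a tiny in-memory model: a dict keyed by the (partition key, sort key) pair. This is only an illustration of the uniqueness rule, not how DynamoDB stores data, and the helper name put_item is just borrowed from the API's vocabulary:

```python
# Toy model of a users-game table with a composite primary key:
# the dict key is the (partition_key, sort_key) pair.
table = {}

def put_item(user_id, game_id, **attributes):
    # Writing the same (User_ID, Game_ID) pair again would overwrite
    # the existing item, mirroring a PutItem on an existing primary key.
    table[(user_id, game_id)] = attributes

put_item("u1", "g1", score=92, result="win")
put_item("u2", "g2", score=14, result="lose")
# Same partition key as the row above, different sort key: a new item.
put_item("u2", "g3", score=77, result="win")

# Three distinct primary keys, two of which share a partition key.
print(len(table))  # 3
games_of_u2 = [key for key in table if key[0] == "u2"]
print(len(games_of_u2))  # 2
```

Because items sharing a partition key sit together, fetching "all games of user u2" only has to scan one group, which is exactly what grouping by partition key buys you.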
183 00:06:22,310 --> 00:06:23,143 And so therefore, 184 00:06:23,143 --> 00:06:24,037 you want the combination of User_ID 185 00:06:24,037 --> 00:06:26,110 and Game_ID to be unique, obviously, 186 00:06:26,110 --> 00:06:28,430 but it's fine to also have the same partition key 187 00:06:28,430 --> 00:06:30,140 with different sort keys. 188 00:06:30,140 --> 00:06:31,800 And this is why it's super important 189 00:06:31,800 --> 00:06:34,040 to choose a really good partition key, 190 00:06:34,040 --> 00:06:36,840 so data is going to be distributed enough. 191 00:06:36,840 --> 00:06:38,200 So here is an exercise, 192 00:06:38,200 --> 00:06:40,360 and this is what the exam may test you on as well. 193 00:06:40,360 --> 00:06:42,600 So you're building a movie database, 194 00:06:42,600 --> 00:06:44,990 and you want to choose what is the best partition key 195 00:06:44,990 --> 00:06:47,890 that is going to maximize data distribution. 196 00:06:47,890 --> 00:06:49,230 Is it movie_id? 197 00:06:49,230 --> 00:06:50,410 Is it producer_name? 198 00:06:50,410 --> 00:06:54,430 Is it leader_actor_name, or is it movie_language? 199 00:06:54,430 --> 00:06:56,420 Well, think about it for a second: 200 00:06:56,420 --> 00:06:57,253 what if you choose the first one, 201 00:06:57,253 --> 00:06:58,923 or the second one, and so on? 202 00:06:59,900 --> 00:07:02,570 Now the answer is you choose movie_id, 203 00:07:02,570 --> 00:07:05,960 because movie_id is going to be unique for each row. 204 00:07:05,960 --> 00:07:07,830 And therefore it's a really good candidate 205 00:07:07,830 --> 00:07:10,490 to partition your table by. 206 00:07:10,490 --> 00:07:15,490 If you have movie_language as a partition key, 207 00:07:15,880 --> 00:07:19,190 then you won't have as many distinct values as you'd want, 208 00:07:19,190 --> 00:07:20,460 and maybe most of your movies 209 00:07:20,460 --> 00:07:22,060 are going to be in English.
210 00:07:22,060 --> 00:07:23,340 So it's not gonna be a great choice 211 00:07:23,340 --> 00:07:25,090 because there's not enough diversity, 212 00:07:25,090 --> 00:07:28,680 and there's a skew of data towards one specific value. 213 00:07:28,680 --> 00:07:30,510 And so the exam will ask you to choose 214 00:07:30,510 --> 00:07:32,460 the best partition key for some tables 215 00:07:32,460 --> 00:07:33,380 based on what each key means. 216 00:07:33,380 --> 00:07:36,210 So always choose the one with the highest cardinality, 217 00:07:36,210 --> 00:07:38,800 the one that can take on the largest number of values. 218 00:07:38,800 --> 00:07:41,040 So that's it for a short overview of DynamoDB. 219 00:07:41,040 --> 00:07:42,570 We have a long section on it, 220 00:07:42,570 --> 00:07:44,540 but let's go over the hands-on 221 00:07:44,540 --> 00:07:46,540 to practice a little bit using DynamoDB.
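To see why high cardinality matters, here is a rough Python sketch of hash-based partitioning. DynamoDB hashes the partition key to decide which physical partition stores an item; the md5-mod-8 scheme below is a stand-in for illustration, not DynamoDB's actual algorithm, and the 90/10 language split is made-up sample data:

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(key):
    # A stable hash of the partition key picks the partition
    # (a simplification of what DynamoDB does internally).
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Sample data skewed towards English, as in the exercise.
movies = [
    {"movie_id": f"m{i}", "movie_language": "English" if i < 90 else "French"}
    for i in range(100)
]

def load_per_partition(attr):
    counts = [0] * NUM_PARTITIONS
    for movie in movies:
        counts[partition_for(movie[attr])] += 1
    return counts

# movie_id: 100 distinct values -> items spread across the partitions.
print(load_per_partition("movie_id"))
# movie_language: only 2 distinct values -> at most 2 "hot" partitions,
# with one of them holding 90% of the data.
print(load_per_partition("movie_language"))
```

The skewed key leaves most partitions empty and concentrates the traffic on one or two, which is exactly the hot-partition problem a high-cardinality key avoids.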