1 00:00:00,180 --> 00:00:02,020 Now let's have a look at DynamoDB 2 00:00:02,020 --> 00:00:04,540 which is a NoSQL serverless database. 3 00:00:04,540 --> 00:00:06,290 So if you consider the traditional architecture 4 00:00:06,290 --> 00:00:07,730 that we've seen over and over in this course, 5 00:00:07,730 --> 00:00:10,880 we have clients and they connect to an application layer 6 00:00:10,880 --> 00:00:13,010 that could be made of an Elastic Load Balancer 7 00:00:13,010 --> 00:00:15,120 and EC2 instances that are grouped 8 00:00:15,120 --> 00:00:17,090 and scaling with an Auto Scaling group. 9 00:00:17,090 --> 00:00:18,870 And then the data has to be stored somewhere, 10 00:00:18,870 --> 00:00:20,380 so we have a database layer, 11 00:00:20,380 --> 00:00:22,550 and it could be using Amazon RDS, 12 00:00:22,550 --> 00:00:25,090 which is backed by MySQL or PostgreSQL, 13 00:00:25,090 --> 00:00:27,100 or these kinds of technologies. 14 00:00:27,100 --> 00:00:28,530 Now these traditional applications 15 00:00:28,530 --> 00:00:30,880 will leverage RDBMS databases, 16 00:00:30,880 --> 00:00:33,450 and we do so because we have the SQL query language. 17 00:00:33,450 --> 00:00:34,283 It's really good. 18 00:00:34,283 --> 00:00:36,390 And then we can define strong requirements 19 00:00:36,390 --> 00:00:37,610 about how data should be modeled 20 00:00:37,610 --> 00:00:39,800 because we have tables, we have schemas, and so on. 21 00:00:39,800 --> 00:00:43,080 We can do joins, aggregations, complex computations, 22 00:00:43,080 --> 00:00:45,300 and all of that works very well. 23 00:00:45,300 --> 00:00:46,350 What we get out of it though 24 00:00:46,350 --> 00:00:48,790 in terms of scaling is mostly vertical scaling. 25 00:00:48,790 --> 00:00:50,740 So in case you want a better database, 26 00:00:50,740 --> 00:00:52,740 and I'm talking just about the database layer right now.
27 00:00:52,740 --> 00:00:54,470 If you want to scale it vertically, 28 00:00:54,470 --> 00:00:56,120 you need to replace the database 29 00:00:56,120 --> 00:00:58,660 and get a more powerful CPU, more RAM, 30 00:00:58,660 --> 00:01:00,280 or a disk with better I/O. 31 00:01:00,280 --> 00:01:02,710 And you can do some sort of horizontal scaling, 32 00:01:02,710 --> 00:01:05,260 but only by increasing the read capability. 33 00:01:05,260 --> 00:01:07,390 So either by adding EC2 instances 34 00:01:07,390 --> 00:01:08,870 at the application layer, 35 00:01:08,870 --> 00:01:11,650 or by adding RDS Read Replicas at the database layer. 36 00:01:11,650 --> 00:01:13,450 But if you add Read Replicas, 37 00:01:13,450 --> 00:01:14,420 you're going to be limited 38 00:01:14,420 --> 00:01:16,280 by the number of replicas you can have, 39 00:01:16,280 --> 00:01:19,550 and therefore limited in your horizontal read scaling, 40 00:01:19,550 --> 00:01:21,600 not to mention horizontal write scaling, 41 00:01:21,600 --> 00:01:23,800 because you don't have that with RDS. 42 00:01:23,800 --> 00:01:26,230 So introducing to you NoSQL databases, 43 00:01:26,230 --> 00:01:28,790 which means "not only SQL" or "non-SQL" databases, 44 00:01:28,790 --> 00:01:30,560 depending on the definition. 45 00:01:30,560 --> 00:01:32,610 So the idea is that these are non-relational databases, 46 00:01:32,610 --> 00:01:33,840 and they're going to be distributed, 47 00:01:33,840 --> 00:01:36,210 which gives us some horizontal scalability. 48 00:01:36,210 --> 00:01:37,690 And some very famous technologies 49 00:01:37,690 --> 00:01:40,700 that are NoSQL databases are MongoDB 50 00:01:40,700 --> 00:01:42,800 and, of course, DynamoDB. 51 00:01:42,800 --> 00:01:45,300 Now these databases do not support query joins, 52 00:01:45,300 --> 00:01:46,510 or have very limited support.
53 00:01:46,510 --> 00:01:47,761 And so, for simplicity, 54 00:01:47,761 --> 00:01:50,610 just assume that they don't have query joins. 55 00:01:50,610 --> 00:01:52,040 Now all the data that is needed 56 00:01:52,040 --> 00:01:55,430 therefore must be present in one row in your database. 57 00:01:55,430 --> 00:01:58,720 And again, to simplify things, I know they're evolving, 58 00:01:58,720 --> 00:02:00,610 but let's just assume that NoSQL databases 59 00:02:00,610 --> 00:02:03,080 also do not perform aggregation computations 60 00:02:03,080 --> 00:02:05,630 such as SUM, or AVG, and so on. 61 00:02:05,630 --> 00:02:08,580 But the good thing out of it is that, thanks to this design, 62 00:02:08,580 --> 00:02:11,180 NoSQL databases will scale horizontally. 63 00:02:11,180 --> 00:02:13,940 That means that if you need more write or read capacity, 64 00:02:13,940 --> 00:02:17,390 you can, behind the scenes, have more instances, 65 00:02:17,390 --> 00:02:19,210 and it will scale really well. 66 00:02:19,210 --> 00:02:21,650 So there is no right or wrong for NoSQL vs SQL, 67 00:02:21,650 --> 00:02:23,430 it just depends on how you think 68 00:02:23,430 --> 00:02:26,220 about your modeling of data, about your application, 69 00:02:26,220 --> 00:02:29,230 about your user queries, and about your scaling needs. 70 00:02:29,230 --> 00:02:30,930 So let's talk about DynamoDB now. 71 00:02:30,930 --> 00:02:33,950 So DynamoDB is a fully managed NoSQL database, 72 00:02:33,950 --> 00:02:34,960 and it's highly available, 73 00:02:34,960 --> 00:02:37,850 and has replication across multiple AZs out of the box. 74 00:02:37,850 --> 00:02:38,890 So it's a NoSQL database. 75 00:02:38,890 --> 00:02:40,210 It's not a relational database. 76 00:02:40,210 --> 00:02:43,140 So it's different from RDS. 77 00:02:43,140 --> 00:02:46,390 It scales to massive workloads and it's fully distributed.
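To make the "all data in one row" idea concrete, here is a minimal Python sketch contrasting a relational join with a denormalized NoSQL-style item. Plain dicts stand in for tables and items, and all the names (users, orders, and so on) are invented for illustration:

```python
# Relational modeling: two tables, combined with a JOIN at query time.
users = {"u1": {"first_name": "John", "last_name": "Doe"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 50},
]

# To answer "orders with the user's name", SQL would JOIN users and orders.
joined = [{**order, **users[order["user_id"]]} for order in orders]

# NoSQL modeling: no joins, so everything the query needs
# is denormalized into a single self-contained item (one "row").
order_item = {
    "order_id": "o1",
    "total": 50,
    "user": {"first_name": "John", "last_name": "Doe"},  # nested, duplicated data
}

print(joined[0]["first_name"])            # John
print(order_item["user"]["first_name"])   # John, no join needed
```

The trade-off is duplication: the user's name lives inside every order item, which is exactly why choosing how to model data up front matters more in NoSQL.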
78 00:02:46,390 --> 00:02:47,223 That means that you can scale 79 00:02:47,223 --> 00:02:49,090 to millions of requests per second, 80 00:02:49,090 --> 00:02:51,920 trillions of rows, and hundreds of terabytes of storage, 81 00:02:51,920 --> 00:02:54,240 regardless of your workload. 82 00:02:54,240 --> 00:02:55,610 It has fast and consistent performance, 83 00:02:55,610 --> 00:02:58,330 so that means you get really low latency on retrieval. 84 00:02:58,330 --> 00:02:59,900 And it's an AWS service, 85 00:02:59,900 --> 00:03:02,840 so it's going to be fully integrated with IAM for security, 86 00:03:02,840 --> 00:03:05,010 authorization, and administration. 87 00:03:05,010 --> 00:03:06,784 You can enable event-driven programming 88 00:03:06,784 --> 00:03:09,400 with DynamoDB Streams, as we'll see in this section. 89 00:03:09,400 --> 00:03:12,260 It's low cost and has auto scaling capability. 90 00:03:12,260 --> 00:03:13,280 And you have the Standard 91 00:03:13,280 --> 00:03:15,530 and Infrequent Access (IA) table classes 92 00:03:15,530 --> 00:03:17,390 for different storage tiers. 93 00:03:17,390 --> 00:03:19,930 So let's have a look at the basics of DynamoDB. 94 00:03:19,930 --> 00:03:22,500 So DynamoDB is made out of tables, 95 00:03:22,500 --> 00:03:24,330 and each table will have a primary key. 96 00:03:24,330 --> 00:03:26,290 And we'll see what the primary key can be 97 00:03:26,290 --> 00:03:27,420 in the next slides. 98 00:03:27,420 --> 00:03:29,040 You must decide what the primary key is 99 00:03:29,040 --> 00:03:31,200 before you create your table. 100 00:03:31,200 --> 00:03:33,790 Now, each table can have an infinite number of rows, 101 00:03:33,790 --> 00:03:34,700 also called items. 102 00:03:34,700 --> 00:03:37,630 So I will use the rows and items terms interchangeably 103 00:03:37,630 --> 00:03:38,640 in this course, 104 00:03:38,640 --> 00:03:40,770 and each item will have attributes.
105 00:03:40,770 --> 00:03:42,330 Now these attributes could be similar 106 00:03:42,330 --> 00:03:44,340 to the columns in your table, okay? 107 00:03:44,340 --> 00:03:45,850 But these attributes can also be nested, 108 00:03:45,850 --> 00:03:47,830 so it's a bit more powerful than columns, 109 00:03:47,830 --> 00:03:48,930 and they can be added over time; 110 00:03:48,930 --> 00:03:50,090 you don't need to define them all 111 00:03:50,090 --> 00:03:51,850 at creation time of your table, 112 00:03:51,850 --> 00:03:53,400 and some of them can be null, 113 00:03:53,400 --> 00:03:55,070 so it's completely fine for an attribute 114 00:03:55,070 --> 00:03:57,240 to be missing in some data. 115 00:03:57,240 --> 00:03:58,540 Now, each item, 116 00:03:58,540 --> 00:04:01,450 or each row, can have up to 400 kilobytes of data, 117 00:04:01,450 --> 00:04:02,970 so that's a limitation. 118 00:04:02,970 --> 00:04:05,730 And the data types supported are going to be scalar types, 119 00:04:05,730 --> 00:04:08,700 such as string, number, binary, boolean, and null. 120 00:04:08,700 --> 00:04:10,840 There are going to be document types such as lists and maps, 121 00:04:10,840 --> 00:04:13,240 so it gives you some kind of nesting capability. 122 00:04:13,240 --> 00:04:15,520 And set types such as string sets, 123 00:04:15,520 --> 00:04:17,890 number sets, and binary sets. 124 00:04:17,890 --> 00:04:19,930 Now a very important point to understand 125 00:04:19,930 --> 00:04:22,750 is how to choose a primary key for DynamoDB. 126 00:04:22,750 --> 00:04:24,190 And the exam will definitely test you 127 00:04:24,190 --> 00:04:26,070 on the knowledge of this. 128 00:04:26,070 --> 00:04:27,690 So you have two options for primary keys. 129 00:04:27,690 --> 00:04:29,940 And the first one is called a partition key, 130 00:04:29,940 --> 00:04:31,890 also called a hash key.
131 00:04:31,890 --> 00:04:33,860 So the partition key in this case must be unique 132 00:04:33,860 --> 00:04:36,720 for each item, which is very similar to a normal database. 133 00:04:36,720 --> 00:04:37,553 And therefore, 134 00:04:37,553 --> 00:04:39,310 the partition key must be diverse enough, 135 00:04:39,310 --> 00:04:41,730 so that your data is going to be distributed. 136 00:04:41,730 --> 00:04:42,563 For example, 137 00:04:42,563 --> 00:04:45,020 if you consider a users table, 138 00:04:45,020 --> 00:04:47,267 then we can have the partition key as being User_ID, 139 00:04:47,267 --> 00:04:50,170 and the attributes being First_Name, Last_Name, and Age. 140 00:04:50,170 --> 00:04:52,250 And then you have your first User_ID 141 00:04:52,250 --> 00:04:54,260 with some attributes being filled out. 142 00:04:54,260 --> 00:04:55,890 Your second User_ID, as you can see, 143 00:04:55,890 --> 00:04:56,990 does not have a last name, 144 00:04:56,990 --> 00:04:59,080 but this is fine in DynamoDB. 145 00:04:59,080 --> 00:05:00,780 And the third partition key 146 00:05:00,780 --> 00:05:04,650 yet again has three attributes attached to it. 147 00:05:04,650 --> 00:05:06,070 So this is what DynamoDB looks like. 148 00:05:06,070 --> 00:05:07,560 It looks like a database for now. 149 00:05:07,560 --> 00:05:09,040 It's pretty easy. 150 00:05:09,040 --> 00:05:11,240 But you could have a second option 151 00:05:11,240 --> 00:05:13,735 of partition key and sort key, 152 00:05:13,735 --> 00:05:16,040 also called hash plus range. 153 00:05:16,040 --> 00:05:18,950 And now the combination of these two keys 154 00:05:18,950 --> 00:05:20,350 must be unique for each item. 155 00:05:20,350 --> 00:05:22,757 So the data is going to be grouped by partition key. 156 00:05:22,757 --> 00:05:23,850 And this is why it's very important 157 00:05:23,850 --> 00:05:25,320 to choose a good partition key.
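As a sketch, the users table above could be created and filled through DynamoDB's low-level API, where every attribute carries a type descriptor (S for string, N for number, sent as a string). The request bodies below are built as plain dicts so the example runs without an AWS account; the actual boto3 calls are shown commented out, and the table and attribute names are just the ones from the slide:

```python
# Table definition: only the primary key is declared up front;
# other attributes (First_Name, Last_Name, Age) are schemaless.
create_table_params = {
    "TableName": "users",
    "KeySchema": [{"AttributeName": "User_ID", "KeyType": "HASH"}],  # partition key
    "AttributeDefinitions": [{"AttributeName": "User_ID", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}

# Items in the low-level format: S = string, N = number (as a string).
item_1 = {
    "User_ID": {"S": "u1"},
    "First_Name": {"S": "John"},
    "Last_Name": {"S": "Doe"},
    "Age": {"N": "30"},
}
# The second user simply has no Last_Name attribute -- perfectly valid.
item_2 = {
    "User_ID": {"S": "u2"},
    "First_Name": {"S": "Alice"},
    "Age": {"N": "25"},
}

# With AWS credentials configured, you would then run, for example:
# import boto3
# client = boto3.client("dynamodb")
# client.create_table(**create_table_params)
# client.put_item(TableName="users", Item=item_1)
# client.put_item(TableName="users", Item=item_2)

print(sorted(item_2))  # ['Age', 'First_Name', 'User_ID'] -- no Last_Name
```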
158 00:05:25,320 --> 00:05:28,070 So if you consider a users-game table, 159 00:05:28,070 --> 00:05:29,937 then User_ID is the partition key 160 00:05:29,937 --> 00:05:31,400 and Game_ID is the sort key. 161 00:05:31,400 --> 00:05:32,420 Let's see what that means. 162 00:05:32,420 --> 00:05:35,050 That means that users can attend multiple games. 163 00:05:35,050 --> 00:05:38,860 So we have these four columns, or attributes, 164 00:05:38,860 --> 00:05:40,960 but the first one is going to be our partition key. 165 00:05:40,960 --> 00:05:42,420 We want data to be grouped by User_ID, 166 00:05:42,420 --> 00:05:44,750 and the second one is going to be the sort key. 167 00:05:44,750 --> 00:05:46,500 This is going to give us the uniqueness 168 00:05:46,500 --> 00:05:49,880 of the combination of partition key and sort key. 169 00:05:49,880 --> 00:05:52,520 So both of them together are going to make up the primary key, 170 00:05:52,520 --> 00:05:54,640 and the rest are going to be attributes. 171 00:05:54,640 --> 00:05:56,730 So if you consider a User_ID, 172 00:05:56,730 --> 00:05:59,400 then it has a sort key, which is the Game_ID, 173 00:05:59,400 --> 00:06:02,520 and then we have the attributes Score 92 and Result win. 174 00:06:02,520 --> 00:06:04,500 Again, another different User_ID 175 00:06:04,500 --> 00:06:05,790 and another different Game_ID, 176 00:06:05,790 --> 00:06:06,710 so this works as well. 177 00:06:06,710 --> 00:06:09,180 You have a lost game with a score of 14. 178 00:06:09,180 --> 00:06:11,870 And more interestingly, in this third row, 179 00:06:11,870 --> 00:06:14,200 what we have is the same partition key. 180 00:06:14,200 --> 00:06:17,600 So rows two and three have the same partition key 181 00:06:17,600 --> 00:06:19,200 but a different sort key. 182 00:06:19,200 --> 00:06:22,310 Of course, it is fine for a user to attend multiple games.
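The composite-key behavior just described can be mimicked with a tiny in-memory model: a dict keyed by the (partition key, sort key) pair. This is only an illustration of the uniqueness rule, not how DynamoDB stores data, and the helper name put_item is just borrowed from the API's vocabulary:

```python
# Toy model of a users-game table with a composite primary key:
# the dict key is the (partition_key, sort_key) pair.
table = {}

def put_item(user_id, game_id, **attributes):
    # Writing the same (User_ID, Game_ID) pair again would overwrite
    # the existing item, mirroring a PutItem on an existing primary key.
    table[(user_id, game_id)] = attributes

put_item("u1", "g1", score=92, result="win")
put_item("u2", "g2", score=14, result="lose")
# Same partition key as the row above, different sort key: a new item.
put_item("u2", "g3", score=77, result="win")

# Three distinct primary keys, two of which share a partition key.
print(len(table))  # 3
games_of_u2 = [key for key in table if key[0] == "u2"]
print(len(games_of_u2))  # 2
```

Because items sharing a partition key sit together, fetching "all games of user u2" only has to scan one group, which is exactly what grouping by partition key buys you.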
183 00:06:22,310 --> 00:06:23,143 And so therefore, 184 00:06:23,143 --> 00:06:24,037 you want the combination of User_ID 185 00:06:24,037 --> 00:06:26,110 and Game_ID to be unique, obviously, 186 00:06:26,110 --> 00:06:28,430 but it's fine to also have the same partition key 187 00:06:28,430 --> 00:06:30,140 with different sort keys. 188 00:06:30,140 --> 00:06:31,800 And this is why it's super important 189 00:06:31,800 --> 00:06:34,040 to choose a really good partition key, 190 00:06:34,040 --> 00:06:36,840 so data is going to be distributed enough. 191 00:06:36,840 --> 00:06:38,200 So here is an exercise, 192 00:06:38,200 --> 00:06:40,360 and this is what the exam may test you on as well. 193 00:06:40,360 --> 00:06:42,600 So you're building a movie database, 194 00:06:42,600 --> 00:06:44,990 and you want to choose what is the best partition key 195 00:06:44,990 --> 00:06:47,890 that is going to maximize data distribution. 196 00:06:47,890 --> 00:06:49,230 Is it movie_id? 197 00:06:49,230 --> 00:06:50,410 Is it producer_name? 198 00:06:50,410 --> 00:06:54,430 Is it leader_actor_name, or is it movie_language? 199 00:06:54,430 --> 00:06:56,420 Well, think about it for a second: 200 00:06:56,420 --> 00:06:57,253 what if you choose the first one, 201 00:06:57,253 --> 00:06:58,923 or the second one, and so on? 202 00:06:59,900 --> 00:07:02,570 Now the answer is you choose movie_id, 203 00:07:02,570 --> 00:07:05,960 because movie_id is going to be unique for each row. 204 00:07:05,960 --> 00:07:07,830 And therefore it's a really good candidate 205 00:07:07,830 --> 00:07:10,490 to partition your table by. 206 00:07:10,490 --> 00:07:15,490 If you have movie_language as a partition key, 207 00:07:15,880 --> 00:07:19,190 then you won't have as many distinct values as you'd want, 208 00:07:19,190 --> 00:07:20,460 and maybe most of your movies 209 00:07:20,460 --> 00:07:22,060 are going to be in English.
210 00:07:22,060 --> 00:07:23,340 So it's not gonna be a great choice 211 00:07:23,340 --> 00:07:25,090 because there's not enough diversity, 212 00:07:25,090 --> 00:07:28,680 and there's a skew of data towards one specific value. 213 00:07:28,680 --> 00:07:30,510 And so the exam will ask you to choose 214 00:07:30,510 --> 00:07:32,460 the best partition key for some tables 215 00:07:32,460 --> 00:07:33,380 based on what each key means. 216 00:07:33,380 --> 00:07:36,210 So always choose the one with the highest cardinality, 217 00:07:36,210 --> 00:07:38,800 the one that can take on the largest number of values. 218 00:07:38,800 --> 00:07:41,040 So that's it for a short overview of DynamoDB. 219 00:07:41,040 --> 00:07:42,570 We have a long section on it, 220 00:07:42,570 --> 00:07:44,540 but let's go over the hands-on 221 00:07:44,540 --> 00:07:46,540 to practice a little bit using DynamoDB.
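To see why high cardinality matters, here is a rough Python sketch of hash-based partitioning. DynamoDB hashes the partition key to decide which physical partition stores an item; the md5-mod-8 scheme below is a stand-in for illustration, not DynamoDB's actual algorithm, and the 90/10 language split is made-up sample data:

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(key):
    # A stable hash of the partition key picks the partition
    # (a simplification of what DynamoDB does internally).
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Sample data skewed towards English, as in the exercise.
movies = [
    {"movie_id": f"m{i}", "movie_language": "English" if i < 90 else "French"}
    for i in range(100)
]

def load_per_partition(attr):
    counts = [0] * NUM_PARTITIONS
    for movie in movies:
        counts[partition_for(movie[attr])] += 1
    return counts

# movie_id: 100 distinct values -> items spread across the partitions.
print(load_per_partition("movie_id"))
# movie_language: only 2 distinct values -> at most 2 "hot" partitions,
# with one of them holding 90% of the data.
print(load_per_partition("movie_language"))
```

The skewed key leaves most partitions empty and concentrates the traffic on one or two, which is exactly the hot-partition problem a high-cardinality key avoids.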