1 00:00:00,090 --> 00:00:00,420 Okay. 2 00:00:00,420 --> 00:00:01,180 Welcome back. 3 00:00:01,200 --> 00:00:02,480 So we're finally here. 4 00:00:02,490 --> 00:00:04,500 This is a bit of a turning point in the course. 5 00:00:04,950 --> 00:00:12,180 Really important section not to scare you too much, but really fundamental to how SQL and my SQL are 6 00:00:12,180 --> 00:00:13,470 used in the real world. 7 00:00:14,200 --> 00:00:19,930 So this is a section all about relationships and also something called joints, which we'll get to in 8 00:00:19,930 --> 00:00:21,730 the second part of this section. 9 00:00:21,880 --> 00:00:24,460 So let's focus on the relationships part first. 10 00:00:24,610 --> 00:00:31,270 So far in this course, we've been working with very simple data, things like books or users where 11 00:00:31,270 --> 00:00:36,430 we only have a couple of columns per table and that's been by design. 12 00:00:36,430 --> 00:00:42,820 We need that simplicity in order to focus on what's important, to isolate the different features that 13 00:00:42,820 --> 00:00:48,130 we're talking about, whether it's functions or logical operators, it's much easier to teach those, 14 00:00:48,130 --> 00:00:54,070 and we don't have 20 different tables to worry about, and we can just basically drill down into one 15 00:00:54,070 --> 00:00:55,780 table and focus on that. 16 00:00:56,140 --> 00:01:02,890 But that's typically not how the real world works, where, yeah, we might have a user's table, but 17 00:01:02,890 --> 00:01:08,430 there's no site that really can just get by on having users unless it's featureless. 18 00:01:08,440 --> 00:01:14,590 There is plenty of other things that need to go along with users, whether it's comments or likes or 19 00:01:14,590 --> 00:01:19,300 tweets or posts or photos or any sort of, I don't know, order transaction. 20 00:01:19,300 --> 00:01:22,360 There are all these different things that we could have alongside users. 21 00:01:22,360 --> 00:01:27,340 So we've been working with simple data, but that's about to change. 22 00:01:29,330 --> 00:01:35,000 Real world data, unlike what we've been working with, is really messy, typically and interrelated. 23 00:01:35,000 --> 00:01:37,220 There's a lot of connections between data. 24 00:01:37,220 --> 00:01:42,440 So some of those things I talked about with, let's say, users, if we're working with just a simple 25 00:01:42,440 --> 00:01:47,840 blog site, so nothing like a Facebook or a complex social network, just a simple blog, we still need 26 00:01:47,840 --> 00:01:48,950 to have user data. 27 00:01:48,980 --> 00:01:52,610 We need to have information about posts, blog posts. 28 00:01:52,610 --> 00:01:56,630 We also need to somehow keep track of comments and then tags. 29 00:01:56,990 --> 00:01:59,120 And that can get pretty complicated already. 30 00:01:59,120 --> 00:02:05,060 Not to mention things like advertising and tracking not only ads on their own, but how ads are related 31 00:02:05,060 --> 00:02:07,940 to users and what users are clicking on and not clicking on. 32 00:02:07,940 --> 00:02:13,670 There's so much data that just from a simple blog site, something that may not seem that complex. 33 00:02:13,670 --> 00:02:16,130 There's so much stuff that we could store. 34 00:02:16,250 --> 00:02:22,430 So we're going to focus on how we work with interconnected data in the next couple of sections. 35 00:02:22,430 --> 00:02:24,620 So we're no longer focusing on one table. 36 00:02:24,650 --> 00:02:28,940 We're going to see how do we work with two tables or three or four that are related. 37 00:02:28,940 --> 00:02:34,880 So not just tables that exist independently, but tables that reference one another and are interconnected. 38 00:02:34,880 --> 00:02:35,960 So let's get going. 39 00:02:35,960 --> 00:02:39,830 And the first thing that I want to do is go back to our books data. 40 00:02:40,100 --> 00:02:42,110 So remember our books table. 41 00:02:42,140 --> 00:02:43,220 Don't worry, we're done with it. 42 00:02:43,220 --> 00:02:47,210 We're not going to be typing any books code, but on its own it was really simple. 43 00:02:47,210 --> 00:02:54,110 We had a books table with author, first name and last name, a book title released here, a page count 44 00:02:54,110 --> 00:02:58,790 in a stock inventory, I believe a stock quantity, and I think that was it. 45 00:02:59,150 --> 00:03:04,100 So again, on its own, that just represents a single book, but you can't really do much with that. 46 00:03:04,100 --> 00:03:09,440 But if we are running a sort of book site where we sold books, at a minimum, we would need to keep 47 00:03:09,440 --> 00:03:11,510 track of things like versions. 48 00:03:11,630 --> 00:03:13,130 So we didn't even talk about this. 49 00:03:13,130 --> 00:03:16,010 But there's often multiple versions of a single book. 50 00:03:16,010 --> 00:03:22,400 If you go on Amazon and you look at a book, let's say Harry Potter, the second book, Chamber of Secrets, 51 00:03:22,400 --> 00:03:27,650 well, there's going to be the initial British release, the UK version as a hardcover, and then there's 52 00:03:27,650 --> 00:03:31,220 a paperback, and then there's the American version, and then there's a paperback. 53 00:03:31,220 --> 00:03:36,050 There is a full color illustrated version that came out relatively, relatively recently. 54 00:03:36,050 --> 00:03:39,200 It looks pretty cool that has these giant illustrations. 55 00:03:39,200 --> 00:03:40,490 It's the same title. 56 00:03:41,240 --> 00:03:42,740 How do you keep track of that? 57 00:03:42,740 --> 00:03:46,940 Is that a different table or do you just make a separate book for every one of those? 58 00:03:46,940 --> 00:03:49,790 But if you do that, then how do you make sure that they're connected? 59 00:03:49,790 --> 00:03:54,890 Because when I go on Amazon and I look at the hardcover, it also will show me below. 60 00:03:55,160 --> 00:04:02,540 Some people also buy paperback version, or it's also available in, I don't know, 25, 40 different 61 00:04:02,540 --> 00:04:03,320 languages. 62 00:04:03,320 --> 00:04:06,710 So there's all sorts of things to worry about just with versions. 63 00:04:07,700 --> 00:04:09,050 But then we have authors. 64 00:04:09,350 --> 00:04:11,120 Well, that might seem straightforward. 65 00:04:11,120 --> 00:04:17,360 We have an author, first name and last name, two columns on our books table, but that's not sufficient 66 00:04:17,360 --> 00:04:18,980 when we have multiple authors. 67 00:04:19,519 --> 00:04:22,310 So some books, a lot of books have multiple authors. 68 00:04:22,310 --> 00:04:28,370 What about if it's a research paper that we're selling or a journal that has dozens of authors potentially? 69 00:04:28,910 --> 00:04:32,750 So there's a lot of things to keep track of there that we can't really do right now. 70 00:04:32,750 --> 00:04:35,510 We only have room for one author in our columns. 71 00:04:36,630 --> 00:04:39,270 Then a whole big thing is customers. 72 00:04:39,690 --> 00:04:44,910 If we're doing a book shop online, books on their own is great, but we also need ways to keep track 73 00:04:44,910 --> 00:04:48,060 of users or customers is what I'm calling them in this case. 74 00:04:48,240 --> 00:04:55,830 So customers on their own also can't do that much, but we might keep track of their email log in information. 75 00:04:56,280 --> 00:05:00,420 Of course, we wouldn't just store their password directly, but for simplicity's sake, let's say we're 76 00:05:00,420 --> 00:05:03,780 keeping track of password and email so that they can log in. 77 00:05:04,140 --> 00:05:06,460 And then there's a whole bunch of other things. 78 00:05:06,480 --> 00:05:07,230 Orders. 79 00:05:07,230 --> 00:05:11,050 So if customers want to order books, how do we store that? 80 00:05:11,130 --> 00:05:16,530 We need to create a new order and it needs to be associated with a customer who created it. 81 00:05:16,920 --> 00:05:18,750 It needs to have address information. 82 00:05:18,750 --> 00:05:21,630 Maybe it needs to have the contents of the order. 83 00:05:21,870 --> 00:05:22,920 Are there books? 84 00:05:22,920 --> 00:05:23,920 One book? 85 00:05:23,940 --> 00:05:24,830 How do we support that? 86 00:05:24,840 --> 00:05:25,890 How do we handle that? 87 00:05:26,340 --> 00:05:33,480 What about if we're working with dates and times and we need to keep track of when a return period expires? 88 00:05:33,480 --> 00:05:39,840 So we need to have a an order date and then keep track of 30 days or 60 days after that or whatever 89 00:05:39,840 --> 00:05:40,380 it is. 90 00:05:40,500 --> 00:05:47,550 Then we also need to store or somehow keep track of the transaction information addresses for billing 91 00:05:47,550 --> 00:05:51,520 address, not only shipping address, the type of shipping, the billing information. 92 00:05:51,540 --> 00:05:53,040 There's so much to store. 93 00:05:53,910 --> 00:05:56,270 And then a whole other thing is reviews. 94 00:05:56,280 --> 00:06:02,700 So if we go on Amazon or pretty much any book site, any book that you look at will have dozens or hundreds, 95 00:06:02,700 --> 00:06:04,410 sometimes thousands of reviews. 96 00:06:04,650 --> 00:06:09,090 So we've got reviews, their ratings, that's the number of stars. 97 00:06:09,090 --> 00:06:15,840 But then also a title for the review and then a caption and then the actual content of the review, 98 00:06:15,840 --> 00:06:18,060 which may be ten characters. 99 00:06:18,240 --> 00:06:20,850 I hate this book, however many characters that is. 100 00:06:20,850 --> 00:06:24,950 Or it could be a giant paragraph, someone's love letter that they're writing to a book. 101 00:06:24,960 --> 00:06:26,730 So how do we account for all of that? 102 00:06:26,730 --> 00:06:29,700 And then the last thing that we'll talk about are genres. 103 00:06:30,060 --> 00:06:34,990 So you may think that we could just get away with storing this in a book, which we absolutely could. 104 00:06:35,010 --> 00:06:36,570 We could just have one genre. 105 00:06:36,570 --> 00:06:39,900 But many books don't cleanly fit into one genre. 106 00:06:40,230 --> 00:06:46,170 We might have a vampire novel that's also historical fiction because it's set in Victorian period. 107 00:06:46,170 --> 00:06:54,300 England Or we might have a sci fi book that has elements of fantasy or who knows, you can have all 108 00:06:54,300 --> 00:06:59,460 sorts of cross-pollinated genres, so only assigning one genre might not be that useful. 109 00:06:59,460 --> 00:07:01,110 We may want to have a bunch more. 110 00:07:01,110 --> 00:07:05,940 And in fact, on Amazon you can see there are books that will have two or three different genres or 111 00:07:05,940 --> 00:07:07,830 different tags associated with them. 112 00:07:08,070 --> 00:07:12,600 So what I'm trying to show you here is that we were working with books on their own, but in a typical 113 00:07:12,600 --> 00:07:18,930 app or a typical website that has to do with books and a bookstore, in our case, there's a lot more 114 00:07:19,080 --> 00:07:23,280 that goes along with books to make anything functional, to store any important information. 115 00:07:23,280 --> 00:07:28,500 And this is, I don't know, maybe half of the tables that you would need to get away with kind of the 116 00:07:28,560 --> 00:07:30,240 the MVP or the bare minimum. 117 00:07:30,240 --> 00:07:32,670 There's still many other things. 118 00:07:32,670 --> 00:07:36,870 Something like orders, for instance, might actually need to be split into a couple of tables. 119 00:07:36,870 --> 00:07:41,070 We might need to have an address section or an address table. 120 00:07:41,070 --> 00:07:44,460 We might need to then have a separate billing table. 121 00:07:44,520 --> 00:07:46,950 We might need to have shipping. 122 00:07:46,950 --> 00:07:49,560 There could be all different tables that we could split it up into. 123 00:07:49,590 --> 00:07:52,920 So this isn't to say this is the end all, be all this is just that. 124 00:07:52,920 --> 00:07:56,730 There's a couple of other tables that we need to consider at the bare minimum. 125 00:07:56,880 --> 00:07:57,290 Okay. 126 00:07:57,330 --> 00:07:59,550 So we'll stop here in the next video. 127 00:07:59,550 --> 00:08:04,620 We're going to identify some of the different types of relationships between data at a high level, 128 00:08:04,620 --> 00:08:07,080 what are the different types of associations? 129 00:08:07,440 --> 00:08:08,520 And I'm back. 130 00:08:08,700 --> 00:08:10,500 I totally forgot to show you this. 131 00:08:10,500 --> 00:08:13,770 Even though I wrote a note to myself, I totally skipped this. 132 00:08:13,770 --> 00:08:17,130 So I'm back after recording this, adding this to the end. 133 00:08:17,880 --> 00:08:23,430 I want to show the schema diagram that I really think highlights how messy data can be. 134 00:08:23,850 --> 00:08:27,000 So here's I think I showed this early on in the course. 135 00:08:27,090 --> 00:08:31,770 This is just a potential schema for Facebook and it's definitely incomplete. 136 00:08:31,770 --> 00:08:35,039 It's maybe a 10th of the number of tables that Facebook has. 137 00:08:35,039 --> 00:08:42,600 But you can see just to store basic things like events and users, we have an event, Facebook event, 138 00:08:42,600 --> 00:08:49,800 we have event membership which is related to RSVP states, which is also related to venues, which is 139 00:08:49,800 --> 00:08:53,670 also related to groups and to workplaces and to Facebook profiles. 140 00:08:53,670 --> 00:08:55,500 And it just goes on and on and on. 141 00:08:55,890 --> 00:08:58,710 So what I'm trying to show you is that data can get messy. 142 00:08:58,710 --> 00:09:02,640 And in fact, here's another super intimidating schema. 143 00:09:02,670 --> 00:09:08,460 This one is a little bit crazy because it has to do with storing protein database. 144 00:09:08,610 --> 00:09:14,590 I think it was I saw some article on IBM and I believe MIT, and fortunately it's very low quality image. 145 00:09:14,590 --> 00:09:16,080 So you can actually see the tables. 146 00:09:16,080 --> 00:09:22,950 But each one of these little rectangles is a separate table and every line is a connection between the 147 00:09:22,950 --> 00:09:23,520 tables. 148 00:09:23,790 --> 00:09:25,980 Now, this is the worst I've ever seen, the craziest. 149 00:09:25,980 --> 00:09:30,300 And I think that was the point, is that they're trying to just show you they're never going to actually 150 00:09:30,300 --> 00:09:32,670 look at this to try and understand anything. 151 00:09:32,670 --> 00:09:36,210 But they're trying to show you how complex protein information is when you have. 152 00:09:36,320 --> 00:09:39,560 All these different combinations and permutations for a given protein. 153 00:09:39,680 --> 00:09:44,060 Without going into too much detail here, this isn't a bio, of course, but the way that a protein 154 00:09:44,060 --> 00:09:49,910 is folded, the same structure, basically the same amino acids, depending on how it's folded, can 155 00:09:49,910 --> 00:09:53,390 drastically change the features, if you will. 156 00:09:53,420 --> 00:09:55,820 Basically, what the protein does, how it behaves. 157 00:09:55,820 --> 00:10:05,030 And so this is a way of documenting using SQL in some way the structure of a protein using a database. 158 00:10:05,570 --> 00:10:07,100 So not to scare you there. 159 00:10:07,100 --> 00:10:08,900 I just think it's kind of a powerful image. 160 00:10:09,350 --> 00:10:17,510 Although grainy, it is powerful to see basically some really messy, ugly gross data that can be supported 161 00:10:17,750 --> 00:10:18,590 using tables. 162 00:10:18,590 --> 00:10:20,270 It's just doesn't look pretty. 163 00:10:20,420 --> 00:10:21,380 But that's the important part. 164 00:10:21,380 --> 00:10:26,240 And kind of the point of this video is I wanted to get you prepared mentally for working with some more 165 00:10:26,240 --> 00:10:28,130 complex data, although nothing that looks like this.