1 00:00:00,180 --> 00:00:00,690 All right. 2 00:00:00,690 --> 00:00:01,980 Welcome to the next section. 3 00:00:02,100 --> 00:00:08,010 So what we're going to do here is basically take the schema design we came up with in the last section 4 00:00:08,010 --> 00:00:15,030 for a roughly Instagram ish application with photos and users and comments and tags and follows and 5 00:00:15,030 --> 00:00:15,840 all of that. 6 00:00:16,079 --> 00:00:21,210 We're going to take that schema and implement it, and I'm going to give you a bunch of data to put 7 00:00:21,210 --> 00:00:24,360 in your database a couple of thousand of things to enter. 8 00:00:24,570 --> 00:00:30,270 I wrote a giant file and what we'll do is put that in there so we all have the same starting place. 9 00:00:30,270 --> 00:00:34,020 And then we're going to pretend that we're the company and we're trying to figure out certain things 10 00:00:34,020 --> 00:00:37,080 like do we have too many bots on our application? 11 00:00:37,080 --> 00:00:40,560 Are there user accounts that are inactive? 12 00:00:41,070 --> 00:00:43,920 Are there user accounts that have never posted a photo? 13 00:00:43,950 --> 00:00:50,310 Maybe we could send them an email campaign encouraging them to post a photo or what hashtags tend to 14 00:00:50,310 --> 00:00:52,510 be most successful, that kind of stuff. 15 00:00:52,530 --> 00:00:59,130 So all of this section, every video will be asking a question, which is basically a nice way of saying 16 00:00:59,130 --> 00:01:05,069 it'll be an exercise, but we're now posing it rather than me just giving you a table, which is what 17 00:01:05,069 --> 00:01:06,990 we've been doing, where it's a table. 18 00:01:06,990 --> 00:01:08,960 And I just ask you to recreate this. 19 00:01:08,970 --> 00:01:14,070 I'm now going to pose the question as if we work at a fictional company and we're trying to find something 20 00:01:14,070 --> 00:01:14,500 out. 21 00:01:14,520 --> 00:01:19,950 For example, it might be something like we're running a new ad campaign and we need to figure out which 22 00:01:19,950 --> 00:01:23,610 five hashtags get the most likes or something like that. 23 00:01:24,120 --> 00:01:25,680 So that's how this will work. 24 00:01:25,680 --> 00:01:31,320 Every video in the section will be a question, and rather than having all of the solutions at the end, 25 00:01:31,320 --> 00:01:36,030 making a giant video that takes like half an hour to go through, I'm just going to break them up. 26 00:01:36,030 --> 00:01:40,470 So every video will have the question at the beginning and then a following solution. 27 00:01:40,680 --> 00:01:43,890 But the first thing we need to do is insert all of our data. 28 00:01:44,220 --> 00:01:50,010 So included in this section is this file here, starter data, SQL. 29 00:01:50,400 --> 00:01:52,230 And there's two components. 30 00:01:52,230 --> 00:01:56,310 The first one is setting up our schema and there's nothing new here. 31 00:01:56,310 --> 00:02:02,040 We talked about all of this in the last section, but if you were following along and doing it yourself, 32 00:02:02,040 --> 00:02:04,610 you may have named something slightly differently. 33 00:02:04,620 --> 00:02:10,530 Like instead of photo tags, you may have called it tagging, or instead of, you know, I don't know. 34 00:02:10,680 --> 00:02:17,100 Instead of follow me and follower and follows, maybe you called it a following or relationship or something. 35 00:02:17,100 --> 00:02:23,700 So in order to give you this data here to insert, I needed to make sure that we all have the same schema, 36 00:02:23,700 --> 00:02:26,460 even if just one thing is named differently, it's a problem. 37 00:02:27,450 --> 00:02:30,860 So the first part of this is just a bunch of create tables. 38 00:02:30,870 --> 00:02:36,540 All the tables we discussed, as well as dropping the existing Instagram clone database, creating a 39 00:02:36,540 --> 00:02:39,540 new database, and then using it if you wanted to. 40 00:02:39,540 --> 00:02:44,010 If you want to keep your current Instagram clone database, if you've been working with that, just 41 00:02:44,010 --> 00:02:46,350 give this a different name, whatever you want. 42 00:02:46,650 --> 00:02:49,770 Well, but you need to do it to all three of these. 43 00:02:50,370 --> 00:02:55,920 So we set up the schema, but then the fun part, which is all these insert statements and it doesn't 44 00:02:55,920 --> 00:03:01,140 look that impressive when you look at it in cloud nine because it doesn't wrap them on to new lines. 45 00:03:01,230 --> 00:03:04,770 And if I start scrolling, you'll see I'm scrolling really fast. 46 00:03:04,770 --> 00:03:07,230 But look down here at the progress bar. 47 00:03:07,230 --> 00:03:11,730 I've barely made a dent and this is not a great way to visualize the data. 48 00:03:11,790 --> 00:03:16,920 So instead, here I am in Sublime text and here's the file that I generated. 49 00:03:16,920 --> 00:03:20,400 It's the same exact file it's just displayed all here. 50 00:03:20,400 --> 00:03:26,760 Rather than having to scroll across, I have to scroll down so we won't do any sort of grand tour of 51 00:03:26,760 --> 00:03:27,120 the data. 52 00:03:27,120 --> 00:03:32,940 But I will say this I spent a lot of time trying to figure out how I wanted this to work. 53 00:03:32,940 --> 00:03:34,680 Obviously I didn't type this myself. 54 00:03:34,680 --> 00:03:37,590 I would have taken like days probably. 55 00:03:38,310 --> 00:03:45,450 Instead I wrote some code which still took probably an entire day and a half to get this how I wanted, 56 00:03:45,660 --> 00:03:48,450 because I didn't want just randomize data. 57 00:03:48,450 --> 00:03:54,630 If I just wanted to insert a bunch of random users and posts and comments that have no meaning to it, 58 00:03:54,630 --> 00:04:00,900 I could have done that pretty quickly, but instead I tried to design this so that there is certain 59 00:04:00,900 --> 00:04:03,570 personas to our to our users. 60 00:04:03,570 --> 00:04:05,220 So some users post a lot. 61 00:04:05,220 --> 00:04:07,260 So they have a lot of comments, a lot of posts. 62 00:04:07,260 --> 00:04:13,500 Some users are lurkers, meaning that they don't really do anything except look at people, maybe like 63 00:04:13,500 --> 00:04:16,529 certain things at best, but they don't comment and they don't post. 64 00:04:16,529 --> 00:04:22,290 Then we have things like bots, which on Instagram are a real problem, where bots are accounts that 65 00:04:22,290 --> 00:04:29,220 exist basically to go out and like a bunch of things, comment on things, usually the same exact comment, 66 00:04:29,550 --> 00:04:30,900 but they don't post things. 67 00:04:30,900 --> 00:04:36,210 And then we also have like celebrity accounts who have very few people that they actually follow, but 68 00:04:36,210 --> 00:04:42,090 they have tons of followers and they're not very active with commenting or liking with they're very 69 00:04:42,090 --> 00:04:43,170 active with posting. 70 00:04:44,170 --> 00:04:48,490 So then we've got things like hashtags and I didn't want to just do a bunch of random hashtags. 71 00:04:48,490 --> 00:04:52,990 I tried to actually come up with combinations of hashtags that would be common. 72 00:04:52,990 --> 00:05:01,660 So things like smile and style and fashion might be together often, but then we might also have smile 73 00:05:01,660 --> 00:05:04,330 and party or concert. 74 00:05:04,330 --> 00:05:08,890 Those might go together, or we might have landscape. 75 00:05:08,890 --> 00:05:13,660 And Sunrise and Sunset and Beach could be a combination for a beach photo at sunset. 76 00:05:13,660 --> 00:05:19,960 But then we could also have beach with party or beach with beauty or something like that. 77 00:05:19,960 --> 00:05:22,600 So that's just kind of scratching the surface. 78 00:05:22,600 --> 00:05:28,720 But I tried to design a lot of that in there so that we had somewhat realistic data with that said things 79 00:05:28,720 --> 00:05:29,770 like comments. 80 00:05:29,770 --> 00:05:33,820 The text of a comment is just random lorem ipsum text. 81 00:05:34,060 --> 00:05:38,980 So there's nothing there that's going to be meaningful about the comment itself. 82 00:05:39,340 --> 00:05:44,050 What's more meaningful is the relationship between the comment and who posted it and if they have a 83 00:05:44,050 --> 00:05:47,260 lot of comments or they don't have many comments and that sort of thing. 84 00:05:47,920 --> 00:05:49,030 So that's enough of that. 85 00:05:49,360 --> 00:05:55,870 So even with all the effort that I've put in here, there's still only 100 or so or exactly 100 users, 86 00:05:56,380 --> 00:06:00,130 about 250 something photos. 87 00:06:00,130 --> 00:06:06,820 But then when we start talking about likes, there's thousands and comments, there's tons and following 88 00:06:06,880 --> 00:06:07,570 or follows. 89 00:06:07,570 --> 00:06:11,560 Whatever we called it, there's tons and tags, there's only maybe 20 hashtags. 90 00:06:11,560 --> 00:06:17,380 But then when you apply them to all of those photos, 200 photos, we get thousands to your thousands 91 00:06:17,380 --> 00:06:18,430 of instances. 92 00:06:18,430 --> 00:06:23,320 So all you have to do is get that into a file and then execute that file. 93 00:06:24,340 --> 00:06:25,260 Scroll back over. 94 00:06:25,270 --> 00:06:26,230 It'll take too long. 95 00:06:26,260 --> 00:06:26,800 Oh, boy. 96 00:06:27,760 --> 00:06:28,480 Almost there. 97 00:06:28,480 --> 00:06:29,170 Here we go. 98 00:06:29,260 --> 00:06:39,130 So execute this file and I'll just do that now with source Instagram slash starter data. 99 00:06:39,700 --> 00:06:45,730 And that should just be the path that corresponds to wherever that file is and whatever you named it. 100 00:06:48,160 --> 00:06:50,830 So it takes a little bit compared to what we've seen so far. 101 00:06:50,860 --> 00:06:56,530 So if you take a look, we've got a bunch of tables being created, seven, I think, and then a bunch 102 00:06:56,530 --> 00:06:57,550 of things being inserted. 103 00:06:57,550 --> 00:07:01,630 100 257 7623. 104 00:07:01,630 --> 00:07:07,250 7488 8780 221 and then 501. 105 00:07:07,270 --> 00:07:13,930 So this is still a very small data set compared to, I mean, tiny, minuscule compared to Instagram. 106 00:07:14,320 --> 00:07:20,590 This is maybe like a minute of Instagram activity or probably less than that, a couple of seconds as 107 00:07:20,590 --> 00:07:24,700 far as the number of photos and stuff, but it's still an order of magnitude greater than what we've 108 00:07:24,700 --> 00:07:25,600 been working with. 109 00:07:26,080 --> 00:07:31,660 One other thing that I'll show you is that we haven't really seen giant amounts of data like this. 110 00:07:31,660 --> 00:07:39,730 And if you try and do, let's do a select star from likes, which I think we have a lot of you actually, 111 00:07:39,730 --> 00:07:45,400 if you if you start to scroll through it, you'll see that we're not going to be able to see all of 112 00:07:45,400 --> 00:07:45,850 them. 113 00:07:46,540 --> 00:07:53,770 There is a number that we that the terminal or that the CLI will print out for us and then it just gives 114 00:07:53,770 --> 00:07:55,970 up or decides not to display all of them. 115 00:07:55,990 --> 00:07:57,610 That doesn't mean they're not there. 116 00:07:57,790 --> 00:08:00,160 So if you did a count, for example. 117 00:08:03,870 --> 00:08:11,730 Count start from like we get 8000 and if you join them with another table, it will join all of them. 118 00:08:11,730 --> 00:08:12,990 It just won't display everything. 119 00:08:12,990 --> 00:08:15,330 So for some reason you need it to display every single. 120 00:08:15,330 --> 00:08:22,230 Like it would be better not to do it using the CLI, but to do it from another file or some code somewhere 121 00:08:22,650 --> 00:08:23,340 anyways. 122 00:08:23,490 --> 00:08:24,150 So that's it. 123 00:08:24,150 --> 00:08:26,100 We've got the data set up if you'd like. 124 00:08:26,100 --> 00:08:30,510 Play around with it, explore it a bit, and then we'll start with our fun challenges in the next video.