Hey. Hey, welcome to the next section. So this section, as you can probably gather, we're still working with our books data. I hope you're not tired of it yet; if you are, we're almost done. But it's important that we keep working with one data set for a little bit at least, so that you can get familiar with it. And the reason that matters is that as we do some of the more advanced things, we want you to be able to check your work.

So in this section we're going to learn a lot about different ways of performing analysis on data: things like finding averages, summing a bunch of data together, or grouping things by author and calculating average quantities or page counts per author. These are operations where, if we had 10,000 books or 10,000 of something else, it's really hard to know if you're doing it right or wrong. You get an answer, just a number, let's say 67.5, and how could you know if that's right or wrong? But if we're working with books and we have 20 of them, and you're familiar with the page counts and the authors, you'll know, okay, this author has three books, that seems right. Versus, okay, why are we saying that author only has two books when we know they have three? Something like that. Terrible example, but the same idea holds. We know our data at this point.

Along the way, and toward the end of this course, we're going to keep upgrading our data to more complex structures: more tables, more rows, more complex stuff. At the very end, the capstone case study will be working with Instagram-esque data, fake data for Instagram, and we'll have thousands and thousands of rows. You won't actually know if you're doing things right or wrong based off the number you're getting, unless you've manually checked your work by doing a thousand additions or something. So all that's to say, stick with the books if you can.

In this section we're focusing a lot on these new aggregate functions. Those are things like finding averages, counting, and summing things together based off of grouping data. It's a bit hard to explain in a headshot without showing you code, so I'll let the code do that in just a few videos from now. But the rough idea is that we take our data and there are all sorts of insights we can gain. Rather than just working with an individual row or a group of rows, we can combine things into, like, mega rows. There's a little sketch of what I mean just below, if you want a preview.
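Just to make that a bit more concrete, here's a minimal sketch of the kind of query this section builds toward. The table and column names here (books, author_lname, pages) are placeholders I'm assuming for illustration; we'll write the real versions together in a few videos.

```sql
-- A rough sketch: group every book by its author,
-- then compute per-author stats on each group.
SELECT
    author_lname,
    COUNT(*)   AS total_books,  -- how many books each author wrote
    AVG(pages) AS avg_pages     -- average page count per author
FROM books
GROUP BY author_lname;
```

Each row of the result is one of those "mega rows": one line per author, with the count and average calculated across all of that author's books.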
I can combine all of our authors and group books based off of who wrote them, or group books based off of what year they were written in, and then perform operations on those groups. So it allows me to do things like find the average sales we've had per year, or find the average page count for books per genre, and then we could expand that, obviously, to more complex stuff. If you're working with advertising data, or let's say our Instagram data, we'll be able, at the end of the course, to do things like find out which one of our users is a power user, or influencer I guess they're called, meaning the one who gets the most comments and the most likes on each of their posts on average. So who in our database is getting, on average, the most likes and comments? Or we could ask which hashtag generates the most traction. To do that, we would need to take all of our hashtags, gather all of the photos that have those hashtags, group them together, and figure out which hashtag generates the most likes; there's a rough sketch of that kind of query at the end of this intro, if you're curious.

So there's a lot of stuff that we can do with these aggregate functions. They form the backbone of a lot of the questions and analysis that we'll do throughout the course. All right, I'm rambling now. I'm going to go away. Hopefully you enjoy this section. It's important. I say that a lot, but it really is. And of course, we'll have a bunch of exercises throughout the section, and especially at the end, and I'm trying to keep those interesting. Look forward to those. If you don't, just remember: it's a database course. I'm trying my best. It's... it's databases. All right, I'm done.
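For the curious, here's the rough sketch of that hashtag question I mentioned above. The table and column names (tags, photo_tags, likes) are invented purely for illustration; the real Instagram-style schema comes much later in the course, so treat this as a preview of the shape of the query, not the final answer.

```sql
-- A hypothetical sketch of "which hashtag gets the most likes?"
-- (table and column names are made up for illustration only).
SELECT
    tags.tag_name,
    COUNT(likes.photo_id) AS total_likes  -- likes across every photo carrying this tag
FROM tags
JOIN photo_tags ON photo_tags.tag_id = tags.id
JOIN likes      ON likes.photo_id = photo_tags.photo_id
GROUP BY tags.tag_name
ORDER BY total_likes DESC;  -- most "traction" first
```

The pattern is the same as with the books: group the rows that belong together (here, every like on every photo sharing a tag), then let an aggregate function like COUNT summarize each group.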