1 00:00:00,240 --> 00:00:05,850 In this video, we're going to talk about the distinct clause which we can use to eliminate duplicate 2 00:00:05,850 --> 00:00:09,900 results, only getting distinct results in a query. 3 00:00:10,560 --> 00:00:17,160 So it looks like this We might want to select the distinct author last names from books. 4 00:00:17,520 --> 00:00:23,700 The most important thing to remember is that distinct goes after select before a column name. 5 00:00:24,510 --> 00:00:30,150 So if we come back here, let's do a select just author, last name from books. 6 00:00:30,510 --> 00:00:34,500 There's quite a few of them in here, but there's also quite a few duplicates. 7 00:00:34,530 --> 00:00:35,520 Neil Gaiman. 8 00:00:35,520 --> 00:00:37,160 Well, Gaiman is in here twice. 9 00:00:37,170 --> 00:00:38,670 Eggers is in here twice. 10 00:00:38,760 --> 00:00:40,050 Gaiman again, actually. 11 00:00:40,050 --> 00:00:40,980 So three times. 12 00:00:40,980 --> 00:00:42,900 Carver is in there twice. 13 00:00:42,900 --> 00:00:44,520 Foster Wallace is in there twice. 14 00:00:44,520 --> 00:00:45,900 And we have two Harrises. 15 00:00:45,900 --> 00:00:47,930 But they're actually not the same first name. 16 00:00:47,940 --> 00:00:52,130 They're not the same author, but we don't care about that or the query doesn't care about that. 17 00:00:52,140 --> 00:00:54,840 We're just getting all author last names. 18 00:00:55,320 --> 00:01:05,310 So if we do a distinct clause before author name, same exact query, otherwise select distinct author 19 00:01:05,700 --> 00:01:06,150 name. 20 00:01:07,960 --> 00:01:09,070 From books. 21 00:01:09,190 --> 00:01:13,990 We had 19 rows here in the first query of all last names. 22 00:01:14,200 --> 00:01:16,270 Here we only have 11. 23 00:01:16,750 --> 00:01:18,340 So we have the here we once. 24 00:01:18,370 --> 00:01:19,210 GAIMAN Once. 25 00:01:19,210 --> 00:01:20,230 EGGERS Once. 26 00:01:20,320 --> 00:01:24,670 I think the hero is actually only once in the original, but Gaiman was three times. 27 00:01:24,760 --> 00:01:26,020 Now we only have one. 28 00:01:26,020 --> 00:01:29,470 GAIMAN One Harris, one Foster Wallace, one Carver. 29 00:01:30,040 --> 00:01:30,500 Okay. 30 00:01:30,700 --> 00:01:39,910 So that's one example of using distinct let's try another let's do a select released year from books. 31 00:01:40,650 --> 00:01:44,760 This is every book's year it was released and I'm sure there's duplicates. 32 00:01:44,760 --> 00:01:47,310 2003 2003. 33 00:01:48,540 --> 00:01:49,920 Is there another example? 34 00:01:49,920 --> 00:01:52,640 I'm sure there is 2001, 2001. 35 00:01:52,650 --> 00:01:56,460 So we have 19 rose once again, we're getting every row in the data set. 36 00:01:56,490 --> 00:02:07,200 Now, if I instead select distinct released year from books, well, we get 16 only distinct results. 37 00:02:07,500 --> 00:02:13,710 So next up, I have a question, which is what about getting distinct author full names? 38 00:02:13,890 --> 00:02:16,500 Because at the moment we're only doing last names. 39 00:02:16,500 --> 00:02:26,040 If I recall this, let's actually do select start from books or maybe select author F name comma, author, 40 00:02:26,040 --> 00:02:28,650 L name from books. 41 00:02:29,520 --> 00:02:34,860 And we have at least one duplicated last name where it's not the same person. 42 00:02:35,130 --> 00:02:36,020 Dan Harris. 43 00:02:36,030 --> 00:02:39,180 Frieda Harris Well, they're being condensed together. 44 00:02:39,180 --> 00:02:40,560 They're treated the same. 45 00:02:40,560 --> 00:02:46,890 When we simply get unique or distinct last names, we lose Frieda Harris. 46 00:02:47,100 --> 00:02:50,520 So how do I get the distinct combination of the two? 47 00:02:51,000 --> 00:02:53,730 There's two solutions, at least that I can think of. 48 00:02:53,730 --> 00:02:55,860 One of which you should already know. 49 00:02:56,250 --> 00:02:57,780 Well, you might be able to think of. 50 00:02:57,810 --> 00:02:59,340 It involves a string function. 51 00:02:59,340 --> 00:03:04,920 We've already learned it's a little longer, but that option would be to use Kan cat. 52 00:03:04,920 --> 00:03:10,740 So instead of finding the distinct last names, we will concatenate first name and last name, maybe 53 00:03:10,740 --> 00:03:14,670 with a space and find the distinct results of that. 54 00:03:14,910 --> 00:03:16,530 So it looks something like that. 55 00:03:16,530 --> 00:03:25,830 Or like this select and then can cat author f name and then maybe a space and then author l name. 56 00:03:26,740 --> 00:03:28,090 From books. 57 00:03:28,270 --> 00:03:31,360 And then I want to find the distinct ones of these. 58 00:03:31,360 --> 00:03:35,260 So Neil Gaiman and Neil Gaiman, Well, those are not unique. 59 00:03:35,260 --> 00:03:36,430 So we only get one of them. 60 00:03:36,430 --> 00:03:42,170 But then Dan Harris and Frieda Harris, they are now unique, so they're considered distinct. 61 00:03:42,190 --> 00:03:44,110 Both of them would stay in our dataset. 62 00:03:44,110 --> 00:03:50,830 So let's do select distinct can cat author F name author ll name distinct. 63 00:03:51,430 --> 00:03:52,420 And there we are. 64 00:03:52,450 --> 00:03:58,870 We only get one Neil Gaiman or Dave Eggers or Raymond Carver, but we still have Dan Harris and Frieda 65 00:03:58,870 --> 00:03:59,290 Harris. 66 00:03:59,290 --> 00:04:05,020 So there's 12 distinct authors, at least based off of first name and last name with a space between 67 00:04:05,020 --> 00:04:05,440 them. 68 00:04:05,590 --> 00:04:09,910 I guess, you know, hypothetically, we could have authors who have the same name. 69 00:04:09,910 --> 00:04:13,210 I'm sure there are, you know, like an Amazon data set. 70 00:04:13,210 --> 00:04:16,120 There's thousands of authors, I'm sure, that have shared names. 71 00:04:16,120 --> 00:04:19,360 So they're all going to have IDs and something assigned to them. 72 00:04:19,360 --> 00:04:21,760 But for this, it's working. 73 00:04:21,760 --> 00:04:29,230 But there's a second way that's easier and cleaner to select distinct full names, which is just to 74 00:04:29,230 --> 00:04:35,230 use select distinct author F name. 75 00:04:36,250 --> 00:04:37,300 Comma. 76 00:04:37,420 --> 00:04:40,210 Author l name from books. 77 00:04:40,600 --> 00:04:46,990 And what we're saying here is not select the distinct, distinct first names and then also later the 78 00:04:46,990 --> 00:04:47,650 last names. 79 00:04:47,650 --> 00:04:52,930 Instead, it's saying select the distinct first name and last name combinations. 80 00:04:53,470 --> 00:04:54,520 And there we are. 81 00:04:54,580 --> 00:04:55,780 Jhumpa Lahiri. 82 00:04:55,780 --> 00:04:56,440 Neil Gaiman. 83 00:04:56,440 --> 00:04:57,070 Dave Eggers. 84 00:04:57,070 --> 00:05:00,040 We get 12 rows, the same results we had earlier. 85 00:05:00,070 --> 00:05:05,920 So if I were to change this and add a third column in, you'll see how it works. 86 00:05:05,920 --> 00:05:07,390 Let's do released here. 87 00:05:09,070 --> 00:05:14,350 Now we're saying find the distinct combinations of author, full name, first name, author, last name 88 00:05:14,350 --> 00:05:15,820 and released year. 89 00:05:15,820 --> 00:05:22,060 Well, unless we have Dan Harris and Frieda Harris with the same released year for their books. 90 00:05:23,000 --> 00:05:26,420 Which we don't write when it's 2014, when it's 2001. 91 00:05:26,960 --> 00:05:31,950 We basically just get every row in our data set because all of these combinations are distinct. 92 00:05:31,970 --> 00:05:37,010 We have Neil Gaiman, 2016, Neil Gaiman, two, one. 93 00:05:37,400 --> 00:05:39,110 All of these would have to be the same. 94 00:05:39,110 --> 00:05:44,720 All three columns would have to be the same and duplicated in order for distinct to do anything. 95 00:05:44,900 --> 00:05:46,800 But that's not what we want. 96 00:05:46,820 --> 00:05:51,980 The first example that we cared about was just the distinct combo of first name and last name. 97 00:05:52,070 --> 00:05:53,380 So that's the simpler way. 98 00:05:53,390 --> 00:05:54,580 You don't have to use Kit-Kat. 99 00:05:54,620 --> 00:05:57,090 You can just do this, separate them by commas. 100 00:05:57,110 --> 00:05:59,770 You don't even need parentheses distinct. 101 00:05:59,780 --> 00:06:05,510 We'll just figure out that you're working with this combination of columns if you separate them by commas. 102 00:06:06,260 --> 00:06:07,050 So that's distinct. 103 00:06:07,070 --> 00:06:07,930 We use that a lot. 104 00:06:07,940 --> 00:06:12,170 Make sure it comes after select before one or more column names.