1 00:00:00,120 --> 00:00:06,390 And finally we've arrived at our last entity to store, which is hash tags or tags, and we save this 2 00:00:06,390 --> 00:00:12,150 one for last because as I mentioned in the intro video, it's the hardest, I guess. 3 00:00:12,600 --> 00:00:16,560 Well, it doesn't have to be, but there's a couple of different solutions we could go about, and I 4 00:00:16,560 --> 00:00:20,430 wanted to have a discussion about which is best if there is one answer. 5 00:00:20,430 --> 00:00:24,960 So let's step through three potential structures that we could use. 6 00:00:24,960 --> 00:00:31,260 And remember, the way that this works is that one photo can have ten, 20, 50 hashtags. 7 00:00:31,260 --> 00:00:33,300 I do think there is a limit on Instagram. 8 00:00:33,570 --> 00:00:39,060 I mean, you can keep adding them as many as you want, but only 30, I believe, actually will show 9 00:00:39,060 --> 00:00:39,570 up. 10 00:00:40,410 --> 00:00:45,840 Or it's not that they won't show up, but it's that when you search for a hashtag, your photo and they 11 00:00:45,840 --> 00:00:52,410 show up if you use less than 30 anyways, we're getting off track here, but the idea is you can add 12 00:00:52,440 --> 00:00:58,530 hashtags to a photo, but then you can also go click on a hashtag and see all the associated photos 13 00:00:58,530 --> 00:00:59,100 with it. 14 00:00:59,250 --> 00:01:06,210 So here's one way we could structure it, which might seem maybe the least obvious to you, which would 15 00:01:06,210 --> 00:01:13,770 be to do it like this, where we actually just have our photos table, but we add a column called tags 16 00:01:13,770 --> 00:01:20,910 and it's a string, just a bar chart that just has our tags and we could separate them by the hash sign, 17 00:01:20,910 --> 00:01:23,700 the octo thorp or by spaces or dashes or something. 18 00:01:23,700 --> 00:01:28,740 But we're just storing them together and this could work and places do use this. 19 00:01:28,740 --> 00:01:33,750 I've seen it suggested at least I haven't seen it in practice, but you can imagine how it would work 20 00:01:34,410 --> 00:01:36,240 to add a new tag into something. 21 00:01:36,240 --> 00:01:41,610 You'd basically just take all the other tags and then concatenate in at the end a new tag or at the 22 00:01:41,610 --> 00:01:48,000 beginning, and then you could have an order to figure out which ones were posted first, I guess based 23 00:01:48,000 --> 00:01:49,410 off of that order that they're in. 24 00:01:50,400 --> 00:01:51,600 And it's pretty simple. 25 00:01:51,840 --> 00:01:58,470 If you wanted to find all photos that are tagged with cute, you just basically have to use like. 26 00:01:59,460 --> 00:02:01,620 Where tags like cute. 27 00:02:01,620 --> 00:02:04,700 But then as you can see down here, there are a couple problems. 28 00:02:04,710 --> 00:02:07,740 Let's start with an advantage, which is that it's very easy to implement. 29 00:02:07,740 --> 00:02:11,630 You don't need any other tables, but there are a couple of disadvantages. 30 00:02:11,640 --> 00:02:17,550 One is that there is a limit to how many tags you can have in there, which is the length of whatever 31 00:02:17,550 --> 00:02:22,470 the column is of a bar char, whatever the maximum length is, you can only put so many in there. 32 00:02:23,340 --> 00:02:28,500 The other problem or any other problem, actually, is that you can't store additional information. 33 00:02:28,500 --> 00:02:32,460 Like if we wanted to store the first time a hash tag was used. 34 00:02:33,030 --> 00:02:39,720 If I wanted to be able to see the first person to use the hashtag whatever cat or something, I'm not 35 00:02:39,720 --> 00:02:44,460 storing the time that something was tagged because, you know, on Instagram you can add hash tags after 36 00:02:44,460 --> 00:02:45,330 the fact. 37 00:02:45,750 --> 00:02:50,400 I'm sure they're storing that when they're tagged so that they can track trends as hash tags come and 38 00:02:50,400 --> 00:02:51,030 go. 39 00:02:51,240 --> 00:02:54,000 So it doesn't allow us to store additional information like that. 40 00:02:54,270 --> 00:02:59,940 And then the other thing that's kind of important is that you have to be careful when you're searching 41 00:02:59,940 --> 00:03:00,620 for stuff. 42 00:03:00,630 --> 00:03:05,970 So if I was trying to find the hash tag food and I wanted to see nice food, beautiful food. 43 00:03:05,970 --> 00:03:07,470 So the hash tag is just food. 44 00:03:07,770 --> 00:03:10,860 Well, it will return this, which has sad food. 45 00:03:11,460 --> 00:03:13,590 If I'm just doing like food. 46 00:03:13,590 --> 00:03:20,550 So I'd have to be careful and I need to add spaces maybe to make it easier to separate things out. 47 00:03:21,030 --> 00:03:23,480 Basically, you just have to be careful those ways around it. 48 00:03:23,490 --> 00:03:26,670 We've seen them before how you use search with the like. 49 00:03:26,910 --> 00:03:32,340 But the point is this is an easy solution, but it's not going to be the best and it's not the best 50 00:03:32,340 --> 00:03:33,450 for us right now. 51 00:03:33,990 --> 00:03:36,840 So another solution is to use two tables. 52 00:03:37,200 --> 00:03:40,680 So we have our photos as our photos table currently is now. 53 00:03:40,680 --> 00:03:47,970 But then we have a tags table where we have our tag name in this case cute and then a corresponding 54 00:03:47,970 --> 00:03:48,870 photo ID. 55 00:03:48,900 --> 00:03:56,790 So if this photo with ID of one is tagged cute, we put it there and then a photo with ID three which 56 00:03:56,790 --> 00:03:59,850 is the selfie here is also tags cute. 57 00:04:00,420 --> 00:04:05,790 And then photo ID two is tagged with microwave and three with ego, three with smile, two with gross. 58 00:04:05,790 --> 00:04:12,720 So as you can see, we can have a single photo with multiple tags like here where we have this selfie 59 00:04:12,720 --> 00:04:20,279 ID three is tagged with smile, ego and cute, and we can also have tags that are applied to multiple 60 00:04:20,279 --> 00:04:21,510 photos like cute. 61 00:04:21,540 --> 00:04:24,030 It's applied to the first photo and the third photo. 62 00:04:24,420 --> 00:04:25,770 So this works. 63 00:04:26,460 --> 00:04:28,440 There's unlimited number of tags. 64 00:04:28,680 --> 00:04:29,880 That's an advantage. 65 00:04:29,880 --> 00:04:35,970 We don't have to worry about how many can fit into the string, but a disadvantage is that for one thing, 66 00:04:35,970 --> 00:04:42,360 we are storing these tag names over and over and over and these are strings and that it's maybe not 67 00:04:42,360 --> 00:04:46,950 ideal to store all these different strings and have duplicated data, but it's not a big deal. 68 00:04:46,950 --> 00:04:48,510 The duplication isn't the problem. 69 00:04:48,510 --> 00:04:48,930 Really. 70 00:04:48,960 --> 00:04:54,240 What it comes down to is that this is actually slower than the previous solution when it comes to things 71 00:04:54,240 --> 00:04:57,300 like inserting or updating or deleting. 72 00:04:57,300 --> 00:05:01,230 But then another problem is that it's actually slower than the previous solution. 73 00:05:01,230 --> 00:05:06,270 So in just a moment, I can show you a comparison that I didn't do, but there's a great article that 74 00:05:06,270 --> 00:05:12,630 I found where somebody does a speed test between these the three solutions to figure out when they perform 75 00:05:12,630 --> 00:05:13,170 best. 76 00:05:13,290 --> 00:05:16,290 So this is not one we're going to use either. 77 00:05:17,580 --> 00:05:20,670 But so there's been three issues which maybe you got. 78 00:05:20,910 --> 00:05:22,860 It involves three tables. 79 00:05:23,100 --> 00:05:25,950 So we have our photos table unchanged. 80 00:05:26,070 --> 00:05:28,650 We have a different tags table. 81 00:05:28,650 --> 00:05:32,460 So this tags table is just a tag name and an ID. 82 00:05:33,150 --> 00:05:40,230 And then we have a middle table, a join table called photo tags or tag photos or tags or something, 83 00:05:40,230 --> 00:05:44,070 whatever you want to call it, which is an instance of a hash tag being applied to a photo. 84 00:05:44,220 --> 00:05:49,710 And all that it is is a photo ID and a tag ID, and that's it. 85 00:05:50,310 --> 00:05:57,180 So in this case, photo ID one is my cat is being tagged with cute. 86 00:05:57,570 --> 00:06:01,320 And then the same photo, my cat is being tagged with pets. 87 00:06:01,650 --> 00:06:03,860 And then we have we can keep going. 88 00:06:03,870 --> 00:06:10,260 My meal is being tagged with microwave and my meal is being tagged with groups and so on. 89 00:06:10,260 --> 00:06:13,380 So we have these two columns and that's pretty much it. 90 00:06:14,040 --> 00:06:18,900 The advantage, again, just like the previous one, there's an unlimited number of tags and we can 91 00:06:18,900 --> 00:06:20,430 also add additional information. 92 00:06:20,430 --> 00:06:26,880 So if I wanted to win, a tag is created the first time over here I could store the first time it was 93 00:06:26,880 --> 00:06:27,570 created. 94 00:06:27,720 --> 00:06:34,620 And then over here, every time it's subsequently used, I could store the time it was used or something. 95 00:06:35,010 --> 00:06:41,070 And so I'm sure there's some interesting data you could garner from that or the location of where, 96 00:06:41,070 --> 00:06:46,320 you know, where in the world it was used or something like that at the time of the day, I don't know, 97 00:06:46,320 --> 00:06:47,820 day of the week, whatever. 98 00:06:47,820 --> 00:06:50,610 And we also have less duplication of the tags over here. 99 00:06:51,120 --> 00:06:56,070 But then there are a couple of problems and they really come down to just more work. 100 00:06:56,070 --> 00:07:01,080 So when we're inserting something, if it's a first time a hash tag has been used and it's not in the 101 00:07:01,080 --> 00:07:07,620 database yet, we have to create it and then we have to associate it with a photo using photo tags. 102 00:07:07,830 --> 00:07:10,470 And the same thing when we're updating, we have to. 103 00:07:12,980 --> 00:07:13,580 What? 104 00:07:14,370 --> 00:07:16,830 She just vomited anyway. 105 00:07:16,890 --> 00:07:19,080 And then we also have to worry about orphans. 106 00:07:19,080 --> 00:07:21,420 We should always all be worried about orphans. 107 00:07:21,420 --> 00:07:27,510 But in particular, when we're talking about these tables, when we're deleting something, if we delete 108 00:07:27,510 --> 00:07:30,420 a tag for some reason, which Instagram does. 109 00:07:30,750 --> 00:07:36,900 If there's a problem, if something is being misused or somehow being, I don't know, controversial, 110 00:07:37,800 --> 00:07:39,240 they can be removed. 111 00:07:39,240 --> 00:07:43,680 And so then we just have to make sure when we're removing it from here, we also need to remove it from 112 00:07:43,680 --> 00:07:47,730 all associated photo tags so it can be a bit more complicated. 113 00:07:47,730 --> 00:07:54,960 But basically it comes down to at this the cost of having this nicer structure is that it's just a little 114 00:07:54,960 --> 00:07:55,740 more work. 115 00:07:55,740 --> 00:08:02,250 And just as an end note here, I did mention earlier that there is this great article online, this 116 00:08:02,250 --> 00:08:11,070 guy who does speed tests of different ways of doing tagging and it's written up on the link along with 117 00:08:11,070 --> 00:08:11,490 this video. 118 00:08:11,490 --> 00:08:14,280 If you want to read it, I'll just spend 10 seconds talking about it. 119 00:08:14,280 --> 00:08:21,780 But basically he does some test here to show that depending on one the size of your data, set this 120 00:08:21,780 --> 00:08:23,490 example with a smaller data set. 121 00:08:23,580 --> 00:08:27,930 The second and third approaches I showed you with the multiple tables are actually faster. 122 00:08:28,140 --> 00:08:33,390 But then as we get a larger data set with thousands of tags, it actually changes. 123 00:08:33,780 --> 00:08:39,419 So it's just interesting to note that and to read, but also it talks about how much time it takes to 124 00:08:39,419 --> 00:08:43,020 insert compared to compare it across the three different versions. 125 00:08:43,650 --> 00:08:48,720 And it's just an interesting article, but if you actually look at its conclusion section, which is 126 00:08:48,720 --> 00:08:51,720 probably what most of you care about, where's that? 127 00:08:52,910 --> 00:08:58,580 Is that he actually thinks that it would be best to have if you're working on a huge app to have two 128 00:08:58,580 --> 00:08:59,390 versions. 129 00:08:59,900 --> 00:09:04,130 The the first version I showed you and the third version I showed you. 130 00:09:04,130 --> 00:09:08,660 And basically they perform best at in different situations. 131 00:09:08,660 --> 00:09:10,880 And this is getting into the woods a little bit. 132 00:09:10,880 --> 00:09:16,130 But basically the version that we're going to go with, the third one with three different tables is 133 00:09:16,130 --> 00:09:20,780 faster if you're working with common tags, things that are used often. 134 00:09:20,780 --> 00:09:28,460 But if you're searching or you're trying to do joint with like rare or the wrong word, but uncommon 135 00:09:28,460 --> 00:09:34,190 tags that just aren't used very often, then it's actually slower and it would be faster to do the first 136 00:09:34,190 --> 00:09:36,710 version I showed you where you store them as text. 137 00:09:37,370 --> 00:09:43,520 So this guy is suggesting to use a combination of the two, which we're not going to get into. 138 00:09:43,520 --> 00:09:50,030 But just interesting to note that there isn't one best way and that it really comes down to your needs, 139 00:09:50,270 --> 00:09:53,270 what your app needs, the product needs dictate. 140 00:09:53,270 --> 00:09:59,960 And in all likelihood, as an app like Instagram grows, or especially when it started out to where 141 00:09:59,960 --> 00:10:06,680 it is now, it's schema, it's database has undergone multiple big overhauls and changes, probably 142 00:10:06,680 --> 00:10:13,340 not just the schema itself, but the database, whether it's going from something like MySQL to another 143 00:10:13,340 --> 00:10:21,140 type database or a bigger change going from a relational database like MySQL to a non relational database. 144 00:10:21,140 --> 00:10:26,510 So it's really hard to say what's best, but it's interesting to take a look at somebody who's actually 145 00:10:26,510 --> 00:10:32,840 crunched the numbers and even after doing that, can't necessarily give a definitive answer because 146 00:10:32,840 --> 00:10:34,310 the answer is it depends.