1 00:00:05,480 --> 00:00:09,330 Hello, welcome to the third challenge for Section 20 of the course. 2 00:00:09,950 --> 00:00:13,470 I’m in the Section 20 workspace, and I've got two projects, 3 00:00:13,470 --> 00:00:17,900 the challenge_3 project and the challenge_3_solution project. 4 00:00:18,160 --> 00:00:21,920 So, as usual, the solution is my solution, and the challenge_3 5 00:00:21,920 --> 00:00:25,220 project is the shell that I'm giving you, that has the beginnings of 6 00:00:25,230 --> 00:00:26,549 the project that you can work on. 7 00:00:26,889 --> 00:00:28,349 So, let's talk about this project. 8 00:00:28,679 --> 00:00:29,680 This is a fun challenge. 9 00:00:30,020 --> 00:00:33,320 This challenge has two parts, and it's all about understanding 10 00:00:33,420 --> 00:00:34,760 std::set and std::map. 11 00:00:34,760 --> 00:00:36,699 We're going to use the sets and the maps together. 12 00:00:37,240 --> 00:00:39,730 We're going to be reading words from a text file; 13 00:00:39,830 --> 00:00:41,130 I'm providing that for you. 14 00:00:41,560 --> 00:00:45,330 The text file is called words.txt and I'll show that to you in just a moment. 15 00:00:45,980 --> 00:00:48,810 And it contains the first few paragraphs from the book, 16 00:00:48,820 --> 00:00:50,139 The Wonderful Wizard of Oz. 17 00:00:50,809 --> 00:00:54,600 So, the idea here is, the first part of the challenge is you're going to 18 00:00:54,620 --> 00:00:59,540 read each word from that file and count how many times it occurs in the file. 19 00:00:59,540 --> 00:01:00,860 And then display that information. 20 00:01:01,470 --> 00:01:03,909 So, what we get is a listing similar to this. 21 00:01:03,909 --> 00:01:05,280 This is just the first few words. 22 00:01:05,280 --> 00:01:08,129 You can see that the word Aunt appears five times in the text, 23 00:01:08,650 --> 00:01:14,530 Dorothy eight, Dorothy's once, Em five, Even once, and from once.
24 00:01:14,850 --> 00:01:18,390 Now you'll also notice that these words are displayed just 25 00:01:18,390 --> 00:01:22,339 like this: A, D, E, F. They're displayed in ascending order. 26 00:01:22,900 --> 00:01:24,590 And then right next to the word, we have the number 27 00:01:24,590 --> 00:01:25,719 of times that it occurs. 28 00:01:26,539 --> 00:01:27,059 Simple! 29 00:01:27,400 --> 00:01:32,630 So, please use a map for this, and make sure that your key-value pair 30 00:01:32,630 --> 00:01:34,720 is a string, int. That makes sense. 31 00:01:34,760 --> 00:01:36,270 This is your key right here. 32 00:01:36,440 --> 00:01:39,309 The word, and the value is the count. 33 00:01:39,960 --> 00:01:42,460 And the functions that I've written for you assume this, so 34 00:01:42,460 --> 00:01:43,890 make sure you use the string, int. 35 00:01:44,460 --> 00:01:45,369 So, that's part one. 36 00:01:45,690 --> 00:01:48,289 And let me show you the file real quick before we talk about part two. 37 00:01:48,300 --> 00:01:50,460 Here's the file, you can see it right here in words.txt. 38 00:01:52,800 --> 00:01:56,270 There's the file and it's the first, I think five, or six, or seven paragraphs 39 00:01:56,270 --> 00:01:58,300 from The Wonderful Wizard of Oz. 40 00:01:58,490 --> 00:02:01,310 So, you can see Dorothy appears here, and it's really easy to 41 00:02:01,310 --> 00:02:02,929 check your work in CodeLite. 42 00:02:03,339 --> 00:02:06,690 For example, if I just double click Dorothy, you can see that Dorothy 43 00:02:06,690 --> 00:02:11,850 appears once, twice, three times, four, five, six, seven, eight, nine times. 44 00:02:11,860 --> 00:02:14,959 So, it's really easy to tell just because it highlights it in yellow, and 45 00:02:14,970 --> 00:02:18,850 it also gives you the line number that it appears on, which is really cool. 46 00:02:18,860 --> 00:02:20,919 Because that's what part two of the challenge is all about.
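The part one idea described above (a map whose key is the word and whose value is the count) can be sketched roughly as follows. This is only an illustration, not the course's solution: the function name count_words is hypothetical, it reads from any input stream rather than from words.txt, and it does no punctuation cleanup.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts how often each whitespace-separated word occurs.
// std::map keeps its keys sorted, so iterating the result visits the words
// in ascending order without any extra sorting step.
std::map<std::string, int> count_words(std::istream &in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        // operator[] creates the entry with a count of 0 the first time
        // a word is seen, then the increment makes it 1.
        ++counts[word];
    }
    return counts;
}
```

For a quick check without a file, a std::istringstream holding a sentence or two can be passed in place of the file stream.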
47 00:02:21,400 --> 00:02:23,120 Alright, so let me get back to the description. 48 00:02:23,799 --> 00:02:28,080 So, that's part one, and then in part two what I want is, same 49 00:02:28,080 --> 00:02:32,310 idea, I want to read all the words from the file and I don't really care how many 50 00:02:32,310 --> 00:02:34,600 times they occur. I've already got that up here. 51 00:02:34,800 --> 00:02:37,239 What I want to know is what line numbers they occur on. 52 00:02:37,250 --> 00:02:38,909 I want to see all the line numbers. 53 00:02:38,909 --> 00:02:43,380 So, in this case, we can do this one, Aunt appears on line 54 00:02:43,380 --> 00:02:46,420 numbers 2, 7, 25, 29 and 48. 55 00:02:46,700 --> 00:02:49,140 Now, it might appear more than once on a line; I don't want 56 00:02:49,140 --> 00:02:51,380 you to say 2, 2, 7, 7, 7 and so forth. 57 00:02:51,679 --> 00:02:53,710 Each line number should appear only once. 58 00:02:53,920 --> 00:02:59,289 So, again the words should be in ascending order and the line numbers should 59 00:02:59,299 --> 00:03:00,870 be in ascending order in here as well. 60 00:03:01,360 --> 00:03:06,560 So, one more time, Aunt appears on lines 2, 7, 25, 29 and 48. 61 00:03:06,920 --> 00:03:11,580 So, we can come over here to words.txt, we can find Aunt which is right there. 62 00:03:12,110 --> 00:03:17,039 So, there it is: 2, 7, 25, 29 and 48. 63 00:03:17,719 --> 00:03:20,899 It's a nice easy way to test your output. 64 00:03:21,660 --> 00:03:22,840 That's the challenge. 65 00:03:23,140 --> 00:03:25,660 So, let me go back to the description and we'll talk a 66 00:03:25,660 --> 00:03:26,740 little bit more about this. 67 00:03:26,840 --> 00:03:31,470 For this part, for part two, please use a map of string, set of 68 00:03:31,599 --> 00:03:33,329 int key-value pairs, right. 69 00:03:33,330 --> 00:03:38,510 So, the key is the word, the string, and the value is a set of integers.
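To see why a set of integers fits the part two requirements, here is a small sketch of just the data structure. The alias LineIndex and the function record_occurrence are made-up names for illustration and are not part of the provided project. The point is that std::set silently ignores a line number it already holds and always iterates in ascending order.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical alias for the part two container: word -> line numbers.
using LineIndex = std::map<std::string, std::set<int>>;

// Hypothetical helper: notes that `word` was seen on `line_number`.
// Inserting the same line number twice has no effect, so a word that
// appears several times on one line is still listed once for that line.
void record_occurrence(LineIndex &index, const std::string &word, int line_number) {
    index[word].insert(line_number);
}
```

Recording Aunt on lines 7, 2 and 7 again would leave the set for Aunt holding 2 and 7, in that order.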
70 00:03:39,210 --> 00:03:42,090 We did this in the description for the map video. 71 00:03:42,560 --> 00:03:44,019 This is just going to apply it now. 72 00:03:44,630 --> 00:03:47,720 For this one as well, for part two, consider using a 73 00:03:47,720 --> 00:03:49,600 stringstream to process your words. 74 00:03:49,600 --> 00:03:52,410 So, in other words, read the line from the file into a string. 75 00:03:52,719 --> 00:03:56,870 And then use a stringstream to process that string, so you can extract the 76 00:03:56,870 --> 00:03:58,399 words from it really, really easily. 77 00:03:58,920 --> 00:04:01,440 I've also provided the basic shell for both of these. 78 00:04:01,440 --> 00:04:02,599 And I'll show you that in a second. 79 00:04:02,800 --> 00:04:05,470 And I've provided a function called clean string. 80 00:04:06,080 --> 00:04:08,640 And you can pass a word into that function, and it's going to 81 00:04:08,640 --> 00:04:10,160 return the clean version of it. 82 00:04:10,400 --> 00:04:13,480 So, it's going to remove any trailing periods, or semicolons, 83 00:04:13,480 --> 00:04:14,750 or colons, and things like that. 84 00:04:14,750 --> 00:04:16,810 And you could tweak that function however you like. 85 00:04:16,810 --> 00:04:18,000 I'll show you the code in a second. 86 00:04:18,250 --> 00:04:21,279 That's it. Let me run this program, so you can see the output. 87 00:04:24,630 --> 00:04:26,902 And you can see all the display. I'm going to scroll all the way up 88 00:04:26,969 --> 00:04:29,690 to the top. It's running part one and then it's running part two. 89 00:04:29,840 --> 00:04:34,070 Here's part one, Aunt appears five times, Dorothy appears eight times, 90 00:04:34,200 --> 00:04:37,270 Em appears five times, and you go all the way through. Aunt and lowercase 91 00:04:37,790 --> 00:04:39,010 aunt would be different words.
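The stringstream suggestion from a moment ago (read a line into a string, then pull the words out of it) might look something like this. It is a minimal sketch under stated assumptions: split_words is a hypothetical name, and the words come back uncleaned, so trailing punctuation is still attached.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line of text into whitespace-separated words.
std::vector<std::string> split_words(const std::string &line) {
    std::stringstream ss(line);   // wrap the line so it can be read like a stream
    std::vector<std::string> words;
    std::string word;
    while (ss >> word) {          // operator>> skips any run of whitespace
        words.push_back(word);
    }
    return words;
}
```

In the part two loop, the line itself would typically come from std::getline on the file, with a counter incremented once per line to supply the line number.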
92 00:04:39,010 --> 00:04:41,690 You can choose to make them all upper, or lower, or do whatever you 93 00:04:41,690 --> 00:04:43,170 like with that just to have fun. 94 00:04:43,690 --> 00:04:45,890 So, in this case, let's look at some of the common words in 95 00:04:45,890 --> 00:04:52,900 English: ‘a’ appears 20 times, ‘and’ appears 32 times, ‘the’ is 96 00:04:53,180 --> 00:04:58,140 a very popular word, and there is ‘the’, it appears 43 times. 97 00:04:58,580 --> 00:04:59,650 So, that's part one. 98 00:05:00,590 --> 00:05:03,450 Again, notice that this is sorted in ascending order. 99 00:05:04,639 --> 00:05:08,370 Now here's part two, we have the word and the occurrences 100 00:05:08,370 --> 00:05:09,710 of that word with line numbers. 101 00:05:09,710 --> 00:05:14,710 So, Aunt appears on lines 2, 7, 25, 29 and 48, here you can see 102 00:05:14,710 --> 00:05:20,349 that Henry appears on lines 2, 6, 35, 45, and 50, ‘There’ with 103 00:05:20,349 --> 00:05:21,899 the capital T appears on line 2. 104 00:05:22,350 --> 00:05:23,820 And you can see the rest. 105 00:05:23,969 --> 00:05:24,710 They're all here. 106 00:05:25,270 --> 00:05:26,420 So, that's the challenge. 107 00:05:26,860 --> 00:05:28,289 It's not that hard. 108 00:05:28,510 --> 00:05:30,020 Don't overthink this challenge. 109 00:05:30,190 --> 00:05:32,150 Let the STL do the work for you. 110 00:05:32,610 --> 00:05:35,530 Don't start messing around, and trying to get into the details of this, 111 00:05:35,530 --> 00:05:37,469 just let the STL do the work for you. 112 00:05:37,910 --> 00:05:38,930 Think abstractly. 113 00:05:39,160 --> 00:05:43,090 So, that's a sample run, and let me show you the code that I'm giving you. 114 00:05:43,100 --> 00:05:46,590 It's right in here in the challenge_3 project, main.cpp. 115 00:05:47,160 --> 00:05:51,040 So, I'll double click there, and let me scroll down to the main real quick. 116 00:05:52,349 --> 00:05:53,630 There's your main, right here.
117 00:05:53,630 --> 00:05:56,269 It's running part one and then part two. 118 00:05:56,780 --> 00:06:00,980 And for each part, here's part two for example, there's the file. 119 00:06:00,980 --> 00:06:02,570 I'm opening the file. I'm closing the file. 120 00:06:02,610 --> 00:06:07,049 So, think of part one and part two as separate little functions: 121 00:06:07,080 --> 00:06:09,650 each one opens the file, each one closes the file. 122 00:06:10,080 --> 00:06:14,220 So, you can see in part one right here, my map is called words. 123 00:06:14,599 --> 00:06:17,619 And it's a map of string, int key-value pairs. 124 00:06:17,969 --> 00:06:19,750 I've got some variables here. 125 00:06:19,810 --> 00:06:21,600 I've got my file which I'm opening. 126 00:06:21,980 --> 00:06:24,440 I'm checking to see if it was opened successfully. 127 00:06:24,440 --> 00:06:26,979 And if it was, then you write this code, and then when you're 128 00:06:26,980 --> 00:06:29,399 done with your code, you call that function right there. 129 00:06:29,400 --> 00:06:32,479 Display words: pass in the map, and I've written the code 130 00:06:32,480 --> 00:06:34,650 that's going to display that all in a nice little format. 131 00:06:35,020 --> 00:06:40,569 But this is the code you need to implement here. For part two, same idea. 132 00:06:40,959 --> 00:06:42,089 Here's my map words. 133 00:06:42,169 --> 00:06:45,700 This time, it's a map of key-value pairs where the key is a string 134 00:06:45,700 --> 00:06:49,060 and the value is a set of integers as we discussed. 135 00:06:49,460 --> 00:06:51,060 I'm opening the file. 136 00:06:51,660 --> 00:06:54,000 I'm closing the file here if it was opened successfully. 137 00:06:54,000 --> 00:06:58,000 You implement this code. When you're done processing the map, pass the 138 00:06:58,000 --> 00:07:01,890 map into display words, and it'll display all that information for you.
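For anyone who wants to write the display step on their own, here is one possible shape for the two display functions, one per map type. This is a guess at the general idea, not the code provided in the project: the name display_words, the extra std::ostream parameter, and the column width of 12 are all assumptions made for this sketch.

```cpp
#include <iomanip>
#include <map>
#include <ostream>
#include <set>
#include <string>

// Part one style: each word followed by its count.
void display_words(std::ostream &os, const std::map<std::string, int> &words) {
    os << std::left << std::setw(12) << "Word" << "Count\n";
    for (const auto &entry : words) {
        os << std::left << std::setw(12) << entry.first << entry.second << '\n';
    }
}

// Part two style: each word followed by the line numbers it occurs on.
void display_words(std::ostream &os,
                   const std::map<std::string, std::set<int>> &words) {
    os << std::left << std::setw(12) << "Word" << "Occurrences\n";
    for (const auto &entry : words) {
        os << std::left << std::setw(12) << entry.first << "[ ";
        for (int line : entry.second) {   // std::set iterates in ascending order
            os << line << ' ';
        }
        os << "]\n";
    }
}
```

Because the two overloads differ only in the map type, the compiler picks the right one from the argument; passing std::cout as the stream prints to the console.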
139 00:07:02,400 --> 00:07:06,550 Now, those two functions I wrote are up here: this is display 140 00:07:06,550 --> 00:07:09,770 words for part one, and this is display words for part two. 141 00:07:10,119 --> 00:07:14,260 The only difference is this one expects string, int key-value pairs. 142 00:07:14,490 --> 00:07:16,560 This one expects obviously a map of those. 143 00:07:16,750 --> 00:07:20,690 This expects a map of string and set of int key-value pairs. 144 00:07:21,250 --> 00:07:23,270 You can totally erase these, and do them yourself. 145 00:07:23,270 --> 00:07:25,140 I would actually encourage you to do that but they're 146 00:07:25,140 --> 00:07:26,860 there for your benefit. 147 00:07:27,410 --> 00:07:29,309 Get rid of them, and do them on your own if you like. 148 00:07:29,600 --> 00:07:31,720 And then here's the clean string function. 149 00:07:32,210 --> 00:07:36,370 It expects a string, and you can see the code is dead simple: for each 150 00:07:36,370 --> 00:07:40,120 character in the string, if it's a period, or a comma, or a semicolon, or 151 00:07:40,120 --> 00:07:42,300 a colon, I'm just not processing it. 152 00:07:42,670 --> 00:07:45,589 Otherwise, I'm just appending the character to the new string and 153 00:07:45,589 --> 00:07:49,280 returning it. So that's it, I'm just filtering on those characters. 154 00:07:49,280 --> 00:07:51,680 You can add some, remove some, or do whatever you like. 155 00:07:52,530 --> 00:07:52,919 That's it. 156 00:07:52,940 --> 00:07:53,840 That's the challenge. 157 00:07:54,230 --> 00:07:55,409 Again, have fun. 158 00:07:55,570 --> 00:07:58,360 This is actually a fun challenge, and it's really easy to do. 159 00:07:58,360 --> 00:08:00,460 You'll be surprised how easy this is to do. 160 00:08:00,910 --> 00:08:05,459 It doesn't seem so easy, but it really is, and as I said the best advice I 161 00:08:05,459 --> 00:08:07,530 can give you is don't overthink this.
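The character filter described above for the clean string function can be reconstructed roughly like this. The identifier clean_string and the exact signature are assumptions, since only the behavior is described: periods, commas, semicolons and colons are dropped, and every other character is kept.

```cpp
#include <string>

// Sketch of the described filter: returns a copy of `s` without
// periods, commas, semicolons, or colons.
std::string clean_string(const std::string &s) {
    std::string result;
    for (char c : s) {
        if (c == '.' || c == ',' || c == ';' || c == ':') {
            continue;   // skip the punctuation characters listed above
        }
        result += c;    // keep everything else, including apostrophes
    }
    return result;
}
```

With this filter, "Kansas," and "Kansas." both become "Kansas", while "Dorothy's" is left unchanged, so it stays a separate word from "Dorothy".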
162 00:08:07,570 --> 00:08:10,060 Let the STL do its job; that's what it's good at. 163 00:08:10,830 --> 00:08:14,440 Okay, so I will see you in the next video, with the solution. 164 00:08:14,639 --> 00:08:15,630 So, have fun!