1
00:00:05,480 --> 00:00:09,330
Hello, welcome to the third challenge
for section 20 of the course.

2
00:00:09,950 --> 00:00:13,470
I’m in the section 20 workspace,
and I've got two projects,

3
00:00:13,470 --> 00:00:17,900
the challenge_3 project and the
challenge_3_solution project.

4
00:00:18,160 --> 00:00:21,920
So, as usual the solution is my
solution, and the challenge_3

5
00:00:21,920 --> 00:00:25,220
project is the shell that I'm giving
you, that has the beginnings of

6
00:00:25,230 --> 00:00:26,549
the project that you can work on.

7
00:00:26,889 --> 00:00:28,349
So, let's talk about this project.

8
00:00:28,679 --> 00:00:29,680
This is a fun challenge.

9
00:00:30,020 --> 00:00:33,320
This challenge has two parts,
and it's all about understanding

10
00:00:33,420 --> 00:00:34,760
std::set and std::map.

11
00:00:34,760 --> 00:00:36,699
We're going to use the
sets and the maps together.

12
00:00:37,240 --> 00:00:39,730
We're going to be reading words
from a text file provided,

13
00:00:39,830 --> 00:00:41,130
I'm providing that for you.

14
00:00:41,560 --> 00:00:45,330
The text file is called words.txt and
I'll show that to you in just a moment.

15
00:00:45,980 --> 00:00:48,810
And it contains the first
few paragraphs from the book,

16
00:00:48,820 --> 00:00:50,139
The Wonderful Wizard of Oz.

17
00:00:50,809 --> 00:00:54,600
So, the idea here is, the first part
of the challenge is you're going to

18
00:00:54,620 --> 00:00:59,540
read each word from that file and count
how many times it occurs in the file.

19
00:00:59,540 --> 00:01:00,860
And then display that information.

20
00:01:01,470 --> 00:01:03,909
So, what we get is a
listing similar to this.

21
00:01:03,909 --> 00:01:05,280
This is just the first few words.

22
00:01:05,280 --> 00:01:08,129
You can see that the word Aunt
appears five times in the text,

23
00:01:08,650 --> 00:01:14,530
Dorothy eight, Dorothy's once,
Em five, Even once, and From once.

24
00:01:14,850 --> 00:01:18,390
Now you'll also notice that
these words are displayed just

25
00:01:18,390 --> 00:01:22,339
like this A, D, E, F, they're
displayed in ascending order.

26
00:01:22,900 --> 00:01:24,590
And then right next to the
word, we have the number

27
00:01:24,590 --> 00:01:25,719
of times that it occurs.

28
00:01:26,539 --> 00:01:27,059
Simple!

29
00:01:27,400 --> 00:01:32,630
So, please use a map for this, and
make sure that your key value pair

30
00:01:32,630 --> 00:01:34,720
is a string, int; that makes sense.

31
00:01:34,760 --> 00:01:36,270
This is your key right here.

32
00:01:36,440 --> 00:01:39,309
The word is the key, and the value is the count.

33
00:01:39,960 --> 00:01:42,460
And the functions that I've
written for you assume this, so

34
00:01:42,460 --> 00:01:43,890
make sure you use the string, int.

35
00:01:44,460 --> 00:01:45,369
So, that's part one.

36
00:01:45,690 --> 00:01:48,289
And let me show you the file real
quick before we talk about part two.

37
00:01:48,300 --> 00:01:50,460
Here's the file, you can see
it right here in words.txt.

38
00:01:52,800 --> 00:01:56,270
There's the file and it's the first, I
think five, or six, or seven paragraphs

39
00:01:56,270 --> 00:01:58,300
from The Wonderful Wizard of Oz.

40
00:01:58,490 --> 00:02:01,310
So, you can see Dorothy appears
here, and it's really easy to

41
00:02:01,310 --> 00:02:02,929
check your work in CodeLite.

42
00:02:03,339 --> 00:02:06,690
For example, if I just double click
Dorothy, you can see that Dorothy

43
00:02:06,690 --> 00:02:11,850
appears once, twice, three times, four,
five, six, seven, eight, nine times.

44
00:02:11,860 --> 00:02:14,959
So, it's really easy to tell just
because it highlights it in yellow, and

45
00:02:14,970 --> 00:02:18,850
it also gives you the line number that
it appears on, which is really cool.

46
00:02:18,860 --> 00:02:20,919
Because that's what part two
of the challenge is all about.

47
00:02:21,400 --> 00:02:23,120
Alright, so let me get
back to the description.

48
00:02:23,799 --> 00:02:28,080
So, that's part one, and then
in part two what I want is, same

49
00:02:28,080 --> 00:02:32,310
idea, I want to read all the words
from the file and I don't really care how many

50
00:02:32,310 --> 00:02:34,600
times they occur. I've already got that up here.

51
00:02:34,800 --> 00:02:37,239
What I want to know is what
line numbers do they occur on.

52
00:02:37,250 --> 00:02:38,909
I want to see all the line numbers.

53
00:02:38,909 --> 00:02:43,380
So, in this case, we can do
this one, Aunt appears on line

54
00:02:43,380 --> 00:02:46,420
number 2, 7, 25, 29 and 48.

55
00:02:46,700 --> 00:02:49,140
Now, it might appear more than
once on each line, I don't want

56
00:02:49,140 --> 00:02:51,380
you to say 2-2, 7-7-7 and so forth.

57
00:02:51,679 --> 00:02:53,710
Just the line number
should appear only once.

58
00:02:53,920 --> 00:02:59,289
So, again the words should be in ascending
order and the line numbers should

59
00:02:59,299 --> 00:03:00,870
be in ascending order in here as well.

60
00:03:01,360 --> 00:03:06,560
So, one more time, Aunt appears
on line 2, 7, 25, 29 and 48.

61
00:03:06,920 --> 00:03:11,580
So, we can come over here to words.txt,
we can find Aunt which is right there.

62
00:03:12,110 --> 00:03:17,039
So, there it is 2, 7, 25, 29 and 48.

63
00:03:17,719 --> 00:03:20,899
It's a nice easy way
to test your output.

64
00:03:21,660 --> 00:03:22,840
That's the challenge.

65
00:03:23,140 --> 00:03:25,660
So, let me go back to the
description and we'll talk a

66
00:03:25,660 --> 00:03:26,740
little bit more about this.

67
00:03:26,840 --> 00:03:31,470
For this part, for part two,
please use a map of string, set of

68
00:03:31,599 --> 00:03:33,329
int, key-value pairs, right.

69
00:03:33,330 --> 00:03:38,510
So, the key is the word, the string,
and the value is a set of integers.

70
00:03:39,210 --> 00:03:42,090
We did this in the
description for the map video.

71
00:03:42,560 --> 00:03:44,019
This is just going to apply it now.

72
00:03:44,630 --> 00:03:47,720
For this one as well, for part
two, consider using a

73
00:03:47,720 --> 00:03:49,600
stringstream to process your words.

74
00:03:49,600 --> 00:03:52,410
So, in other words, read the
line from the file into a string.

75
00:03:52,719 --> 00:03:56,870
And then use a stringstream to process
that string, so you can extract the

76
00:03:56,870 --> 00:03:58,399
words from it really, really, easily.

77
00:03:58,920 --> 00:04:01,440
I've also provided the basic
shell for both of these.

78
00:04:01,440 --> 00:04:02,599
And I'll show you that in a second.

79
00:04:02,800 --> 00:04:05,470
And I've provided a function
called clean_string.

80
00:04:06,080 --> 00:04:08,640
And you can pass a word into
that function, and it's going to

81
00:04:08,640 --> 00:04:10,160
return the clean version of it.

82
00:04:10,400 --> 00:04:13,480
So, it's going to remove any
trailing periods, or semicolons,

83
00:04:13,480 --> 00:04:14,750
or colons, and things like that.

84
00:04:14,750 --> 00:04:16,810
And you could tweak that
function however you like.

85
00:04:16,810 --> 00:04:18,000
I'll show you the code in a second.

86
00:04:18,250 --> 00:04:21,279
That's it. Let me run this program,
so you can see the output.

87
00:04:24,630 --> 00:04:26,902
And you can see all the display.
I'm going to scroll all the way up

88
00:04:26,969 --> 00:04:29,690
to the top, it's running part one
and then it's running part two.

89
00:04:29,840 --> 00:04:34,070
Here's part one, Aunt appears five
times, Dorothy appears eight times,

90
00:04:34,200 --> 00:04:37,270
Em appears five times, and you go all
the way through. Aunt and lowercase

91
00:04:37,790 --> 00:04:39,010
aunt would be different words.

92
00:04:39,010 --> 00:04:41,690
You can choose to make them all
upper, or lower, or do whatever you

93
00:04:41,690 --> 00:04:43,170
like with that just to have fun.

94
00:04:43,690 --> 00:04:45,890
So, in this case, let's look
at some of the common words in

95
00:04:45,890 --> 00:04:52,900
English: ‘a’ appears 20 times,
‘and’ appears 32 times, ‘the’ is

96
00:04:53,180 --> 00:04:58,140
a very popular word, and there
is ‘the’, it appears 43 times.

97
00:04:58,580 --> 00:04:59,650
So, that's part one.

98
00:05:00,590 --> 00:05:03,450
Again notice that this is
sorted in ascending order.

99
00:05:04,639 --> 00:05:08,370
Now here's part two, we have
the word and the occurrences

100
00:05:08,370 --> 00:05:09,710
of that word with line numbers.

101
00:05:09,710 --> 00:05:14,710
So, Aunt appears on lines 2, 7,
25, 29 and 48, here you can see

102
00:05:14,710 --> 00:05:20,349
that Henry appears on lines 2,
6, 35, 45, and 50, ‘There’ with

103
00:05:20,349 --> 00:05:21,899
the capital T appears on line 2.

104
00:05:22,350 --> 00:05:23,820
And you can see the rest.

105
00:05:23,969 --> 00:05:24,710
They're all here.

106
00:05:25,270 --> 00:05:26,420
So, that's the challenge.

107
00:05:26,860 --> 00:05:28,289
It's not that hard.

108
00:05:28,510 --> 00:05:30,020
Don't overthink this challenge.

109
00:05:30,190 --> 00:05:32,150
Let the STL do the work for you.

110
00:05:32,610 --> 00:05:35,530
Don't start messing around, and trying
to get into the details of this,

111
00:05:35,530 --> 00:05:37,469
just let the STL do the work for you.

112
00:05:37,910 --> 00:05:38,930
Think abstractly.

113
00:05:39,160 --> 00:05:43,090
So, that's a sample run, and let me
show you the code that I'm giving you.

114
00:05:43,100 --> 00:05:46,590
It's right in here in
challenge_3 project, main.cpp.

115
00:05:47,160 --> 00:05:51,040
So, I'll double click there, and let
me scroll down to the main real quick.

116
00:05:52,349 --> 00:05:53,630
There's your main, right here.

117
00:05:53,630 --> 00:05:56,269
It's running part one
and then part two.

118
00:05:56,780 --> 00:06:00,980
And for each part, here's part
two for example, there's the file.

119
00:06:00,980 --> 00:06:02,570
I'm opening the file. I'm closing the file.

120
00:06:02,610 --> 00:06:07,049
So, think of part one and part
two as unique little functions

121
00:06:07,080 --> 00:06:09,650
that each one opens the file,
each one closes the file.

122
00:06:10,080 --> 00:06:14,220
So, you can see in part one right
here, my map is called words.

123
00:06:14,599 --> 00:06:17,619
And it's a map of string,
int key-value pairs.

124
00:06:17,969 --> 00:06:19,750
I've got some variables here.

125
00:06:19,810 --> 00:06:21,600
I've got my file which I'm opening.

126
00:06:21,980 --> 00:06:24,440
I'm checking to see if
it was opened successfully.

127
00:06:24,440 --> 00:06:26,979
And if it is then you write
this code, and then when you're

128
00:06:26,980 --> 00:06:29,399
done with your code, you call
that function right there.

129
00:06:29,400 --> 00:06:32,479
display_words, pass in the
map, and I've written the code

130
00:06:32,480 --> 00:06:34,650
that's going to display that
all in a nice little format.

131
00:06:35,020 --> 00:06:40,569
But this is the code you need to
implement here. For part two, same idea.

132
00:06:40,959 --> 00:06:42,089
Here's my map words.

133
00:06:42,169 --> 00:06:45,700
This time, it's a map of key
value pairs where the key is a string

134
00:06:45,700 --> 00:06:49,060
and the value is a set
of integers as we discussed.

135
00:06:49,460 --> 00:06:51,060
I'm opening the file.

136
00:06:51,660 --> 00:06:54,000
I'm closing the file here
if it was opened successfully.

137
00:06:54,000 --> 00:06:58,000
You implement this code. When you're
done processing the map, pass the

138
00:06:58,000 --> 00:07:01,890
map into display_words, and it'll
display all that information for you.

139
00:07:02,400 --> 00:07:06,550
Now those two functions I
wrote up here, this is

140
00:07:06,550 --> 00:07:09,770
display_words for part one, and this
is display_words for part two.

141
00:07:10,119 --> 00:07:14,260
The only difference is this one
expects string, int key-value pairs.

142
00:07:14,490 --> 00:07:16,560
This one expects
obviously a map of those.

143
00:07:16,750 --> 00:07:20,690
This expects a map of string and
a set of int key-value pairs.

144
00:07:21,250 --> 00:07:23,270
You can totally erase
these, and do them yourself.

145
00:07:23,270 --> 00:07:25,140
I would actually encourage
you to do that but they're

146
00:07:25,140 --> 00:07:26,860
there for your benefit.

147
00:07:27,410 --> 00:07:29,309
Get rid of them, and do them
on your own if you like.

148
00:07:29,600 --> 00:07:31,720
And then here's the
clean_string function.

149
00:07:32,210 --> 00:07:36,370
It expects a string, and you can
see the code is dead simple: for each

150
00:07:36,370 --> 00:07:40,120
character in the string, if it's a
period, or a comma, or a semicolon, or

151
00:07:40,120 --> 00:07:42,300
a colon, I'm just not processing it.

152
00:07:42,670 --> 00:07:45,589
Otherwise, I'm just appending the
character to the new string and

153
00:07:45,589 --> 00:07:49,280
returning it, so that's it. I'm
just filtering on those characters.

154
00:07:49,280 --> 00:07:51,680
You can add some, remove
some, or do whatever you like.

155
00:07:52,530 --> 00:07:52,919
That's it.

156
00:07:52,940 --> 00:07:53,840
That's the challenge.

157
00:07:54,230 --> 00:07:55,409
Again, have fun.

158
00:07:55,570 --> 00:07:58,360
This is actually a fun challenge,
and it's really easy to do.

159
00:07:58,360 --> 00:08:00,460
You'll be surprised
how easy this is to do.

160
00:08:00,910 --> 00:08:05,459
It doesn't seem so easy, but it really
is, and as I said the best advice I

161
00:08:05,459 --> 00:08:07,530
can give you is don't overthink this.

162
00:08:07,570 --> 00:08:10,060
Let the STL do its job;
that's what it's good at.

163
00:08:10,830 --> 00:08:14,440
Okay, so I will see you in the
next video in the solution.

164
00:08:14,639 --> 00:08:15,630
So, have fun!