1
00:00:00,350 --> 00:00:02,320
So there is this concept in AWS

2
00:00:02,320 --> 00:00:03,960
that's becoming more and more common

3
00:00:03,960 --> 00:00:05,990
and something that the exam will test you on

4
00:00:05,990 --> 00:00:08,810
which is called High Performance Computing or HPC.

5
00:00:08,810 --> 00:00:10,830
So the cloud is the perfect place

6
00:00:10,830 --> 00:00:12,810
to perform high performance computing.

7
00:00:12,810 --> 00:00:15,340
Why? Because you can create a very high number

8
00:00:15,340 --> 00:00:17,270
of resources in no time,

9
00:00:17,270 --> 00:00:19,470
and you can speed up the time to results

10
00:00:19,470 --> 00:00:20,740
by adding more resources

11
00:00:20,740 --> 00:00:23,200
and you only pay for what you've used.

12
00:00:23,200 --> 00:00:25,790
Once you're done, you can destroy the entire infrastructure

13
00:00:25,790 --> 00:00:28,050
and not be billed a single dime.

14
00:00:28,050 --> 00:00:29,610
So the idea here is that

15
00:00:29,610 --> 00:00:32,030
we can have an extremely high number

16
00:00:32,030 --> 00:00:34,260
of instances performing computation for us,

17
00:00:34,260 --> 00:00:36,480
and then be done with it and just pay for what we used.
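
To make the pay-per-use argument concrete, here is a small sketch with assumed, illustrative numbers. It models a perfectly parallel job that needs a fixed number of instance-hours at a hypothetical hourly price. It ignores startup time and billing minimums. Adding instances shortens the wall-clock time while the total bill stays the same.

```python
# Illustrative only: assumes a perfectly parallel job and a hypothetical
# flat price per instance-hour (no startup time, no billing minimums).
def run_cost_and_time(total_instance_hours: float, instances: int,
                      price_per_hour: float) -> tuple[float, float]:
    """Return (wall-clock hours, total cost) for a perfectly parallel job."""
    wall_clock_hours = total_instance_hours / instances
    total_cost = instances * wall_clock_hours * price_per_hour
    return wall_clock_hours, total_cost


if __name__ == "__main__":
    for n in (10, 100, 1000):
        hours, cost = run_cost_and_time(1000, n, 0.50)
        print(f"{n:>5} instances: {hours:6.1f} h wall clock, ${cost:,.2f}")
```

Real workloads rarely scale perfectly, so treat this as the best case.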

18
00:00:36,480 --> 00:00:37,380
This is perfect.

19
00:00:37,380 --> 00:00:39,260
This is a great use case for the cloud,

20
00:00:39,260 --> 00:00:41,640
and something that AWS is encouraging you

21
00:00:41,640 --> 00:00:43,160
to do more and more.

22
00:00:43,160 --> 00:00:44,800
So where do we use HPC?

23
00:00:44,800 --> 00:00:46,270
Well, to perform genomics,

24
00:00:46,270 --> 00:00:49,308
computational chemistry, financial risk modeling,

25
00:00:49,308 --> 00:00:51,100
weather prediction, machine learning,

26
00:00:51,100 --> 00:00:53,470
deep learning, autonomous driving and so on.

27
00:00:53,470 --> 00:00:54,380
So the question is,

28
00:00:54,380 --> 00:00:58,910
what services in AWS will help us perform HPC?

29
00:00:58,910 --> 00:01:00,430
Let's have a look.

30
00:01:00,430 --> 00:01:03,010
So the first category is: how do we manage the data

31
00:01:03,010 --> 00:01:05,880
and how do we transfer the data into AWS?

32
00:01:05,880 --> 00:01:07,870
The first one is going to be Direct Connect

33
00:01:07,870 --> 00:01:11,670
to move gigabytes per second of data into the cloud

34
00:01:11,670 --> 00:01:13,330
over a private secure network.

35
00:01:13,330 --> 00:01:15,070
So we've seen this in detail.

36
00:01:15,070 --> 00:01:17,841
Then we have Snowball and Snowmobile to move petabytes

37
00:01:17,841 --> 00:01:21,170
of data to the cloud through a physical route,

38
00:01:21,170 --> 00:01:25,040
and they're usually for big transfers or one-off transfers.
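
A quick back-of-the-envelope calculation shows why petabyte-scale, one-off transfers often go on Snowball rather than over the network. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a sustained link utilization that you supply. Real throughput is usually below line rate, and protocol overhead is ignored.

```python
def transfer_days(data_bytes: float, link_gbps: float,
                  utilization: float = 1.0) -> float:
    """Days needed to move data_bytes over a link of link_gbps gigabits/s.

    utilization is an assumed fraction of line rate actually sustained.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400  # seconds per day


PETABYTE = 1e15  # decimal petabyte

if __name__ == "__main__":
    print(round(transfer_days(PETABYTE, 10), 2))   # 9.26 (10 Gbps link)
    print(round(transfer_days(PETABYTE, 100), 2))  # 0.93 (100 Gbps link)
```

Even at full line rate, a 10 Gbps connection needs more than nine days per petabyte.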

39
00:01:25,040 --> 00:01:26,880
And then we have DataSync,

40
00:01:26,880 --> 00:01:28,910
where we have to install the DataSync agent,

41
00:01:28,910 --> 00:01:31,350
and it will help us move large amounts of data

42
00:01:31,350 --> 00:01:33,990
from on-premises NFS or SMB file systems

43
00:01:33,990 --> 00:01:37,270
into S3, EFS, or FSx for Windows.
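
The DataSync flow described above can be sketched as a sequence of boto3 `datasync` calls. The agent is registered as the source location, the bucket as the destination, and a task links the two. The function below only builds the keyword arguments so it runs offline. The ARNs, hostname, path, and task name are hypothetical placeholders, and the real calls are shown as comments.

```python
def datasync_plan(agent_arn: str, nfs_host: str, nfs_path: str,
                  bucket_arn: str, bucket_role_arn: str,
                  task_name: str) -> dict:
    """Keyword arguments for the boto3 DataSync calls, in call order.

    All values passed in are hypothetical placeholders.
    """
    return {
        # 1. Source: an on-premises NFS export reached through the agent.
        "create_location_nfs": {
            "ServerHostname": nfs_host,
            "Subdirectory": nfs_path,
            "OnPremConfig": {"AgentArns": [agent_arn]},
        },
        # 2. Destination: an S3 bucket accessed through an IAM role.
        "create_location_s3": {
            "S3BucketArn": bucket_arn,
            "S3Config": {"BucketAccessRoleArn": bucket_role_arn},
        },
        # 3. The task also needs the two LocationArns returned above.
        "create_task": {"Name": task_name},
    }

# Sketch of the real calls (requires boto3 and AWS credentials):
# ds = boto3.client("datasync")
# plan = datasync_plan(...)
# src = ds.create_location_nfs(**plan["create_location_nfs"])["LocationArn"]
# dst = ds.create_location_s3(**plan["create_location_s3"])["LocationArn"]
# task = ds.create_task(SourceLocationArn=src, DestinationLocationArn=dst,
#                       **plan["create_task"])
# ds.start_task_execution(TaskArn=task["TaskArn"])
```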

44
00:01:37,270 --> 00:01:38,830
Okay. This makes sense.

45
00:01:38,830 --> 00:01:40,580
Now what about Compute and Networking?

46
00:01:40,580 --> 00:01:41,810
Very important.

47
00:01:41,810 --> 00:01:44,270
The first one is obviously EC2 instances.

48
00:01:44,270 --> 00:01:47,190
We have CPU optimized or GPU optimized instances

49
00:01:47,190 --> 00:01:49,860
based on the type of computations we're trying to do.

50
00:01:49,860 --> 00:01:51,980
We can also leverage Spot instances

51
00:01:51,980 --> 00:01:54,580
or Spot Fleets for huge cost savings

52
00:01:54,580 --> 00:01:57,820
and Auto Scaling to automatically scale our fleets

53
00:01:57,820 --> 00:02:00,120
based on the computation we're doing.
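
As a minimal sketch, this is how a batch of Spot compute instances can be requested through the EC2 `run_instances` API, here as a plain keyword-argument dict. The AMI ID and instance type are hypothetical. Auto Scaling groups and Spot Fleets manage capacity over time and are not shown here.

```python
def spot_compute_request(ami_id: str, instance_type: str, count: int) -> dict:
    """Keyword arguments for ec2.run_instances asking for Spot capacity.

    ami_id and instance_type are hypothetical; pick a compute optimized
    or GPU instance type depending on the workload.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # all-or-nothing: fail if we cannot get `count`
        "MaxCount": count,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    }

# Real call (requires boto3 and credentials):
# boto3.client("ec2").run_instances(**spot_compute_request("ami-...", "c5n.18xlarge", 8))
```

Setting `MinCount` equal to `MaxCount` is a design choice for tightly coupled jobs, where a partial fleet is not useful.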

54
00:02:00,120 --> 00:02:03,340
Finally, if our EC2 instances need to talk to one another

55
00:02:03,340 --> 00:02:06,040
and perform some computation in a distributed fashion,

56
00:02:06,040 --> 00:02:09,430
then using an EC2 placement group of type cluster

57
00:02:09,430 --> 00:02:12,930
is great to get the best network performance.

58
00:02:12,930 --> 00:02:15,760
In that case, we get a low-latency,

59
00:02:15,760 --> 00:02:18,210
10 gigabits per second network in this example,

60
00:02:18,210 --> 00:02:20,060
and for the cluster placement group,

61
00:02:20,060 --> 00:02:21,310
everything is on the same rack,

62
00:02:21,310 --> 00:02:22,760
everything is on the same AZ.
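
A cluster placement group is created first and then referenced when launching instances. The sketch below builds the two request dicts for `create_placement_group` and `run_instances`. The group name, AMI, and instance type are hypothetical. Launching all instances in one request with a single instance type is the usual advice for reducing capacity errors in a cluster group.

```python
def cluster_placement_requests(group_name: str, ami_id: str,
                               instance_type: str, count: int):
    """Return (create_placement_group kwargs, run_instances kwargs).

    All names/IDs are hypothetical placeholders.
    """
    create_group = {"GroupName": group_name, "Strategy": "cluster"}
    launch = {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # same type for every node
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }
    return create_group, launch

# Real calls (requires boto3 and credentials):
# ec2 = boto3.client("ec2")
# group, launch = cluster_placement_requests("hpc-cluster", "ami-...", "c5n.18xlarge", 4)
# ec2.create_placement_group(**group)
# ec2.run_instances(**launch)
```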

63
00:02:23,900 --> 00:02:25,860
Okay, next, how can we go even further

64
00:02:25,860 --> 00:02:28,510
to improve the performance of our EC2 instances?

65
00:02:28,510 --> 00:02:31,610
The first one is EC2 Enhanced Networking,

66
00:02:31,610 --> 00:02:33,310
also called SR-IOV.

67
00:02:34,914 --> 00:02:36,810
And this gives you higher bandwidth,

68
00:02:36,810 --> 00:02:39,170
higher PPS, which is packets per second,

69
00:02:39,170 --> 00:02:40,950
and lower latency.

70
00:02:40,950 --> 00:02:44,120
And how do we get this EC2 Enhanced Networking?

71
00:02:44,120 --> 00:02:46,810
Option 1, which is the newest and most popular,

72
00:02:46,810 --> 00:02:49,720
is called an Elastic Network Adapter.

73
00:02:49,720 --> 00:02:51,830
And this delivers you a network speed

74
00:02:51,830 --> 00:02:54,200
of up to 100 gigabits per second.

75
00:02:54,200 --> 00:02:56,190
And this is something you have to know going into the exam.

76
00:02:56,190 --> 00:02:59,280
So, ENA is for EC2 Enhanced Networking,

77
00:02:59,280 --> 00:03:00,440
and gives you higher bandwidth,

78
00:03:00,440 --> 00:03:01,340
higher packets per second,

79
00:03:01,340 --> 00:03:02,920
and lower latency.

80
00:03:02,920 --> 00:03:03,753
Option number two,

81
00:03:03,753 --> 00:03:06,290
is to use this complicated-sounding thing from Intel,

82
00:03:06,290 --> 00:03:08,680
called 82599 VF,

83
00:03:08,680 --> 00:03:11,070
and that gives you up to 10 gigabits per second,

84
00:03:11,070 --> 00:03:13,400
and that was the predecessor of ENA,

85
00:03:13,400 --> 00:03:14,233
so it's legacy,

86
00:03:14,233 --> 00:03:15,480
but I'm still including it here,

87
00:03:15,480 --> 00:03:16,890
just in case this comes up in the exam

88
00:03:16,890 --> 00:03:18,420
and if you see it, you know what it is.

89
00:03:18,420 --> 00:03:19,270
So, both these things,

90
00:03:19,270 --> 00:03:21,061
the ENA and the Intel VF,

91
00:03:21,061 --> 00:03:23,960
allow you to get EC2 Enhanced Networking

92
00:03:23,960 --> 00:03:25,330
on your instance.
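
On an instance that is not already set up for it, enhanced networking is switched on per instance through `modify_instance_attribute`. `EnaSupport` covers ENA, and `SriovNetSupport` with the value `"simple"` covers the Intel 82599 VF. The sketch builds those request dicts. The `mechanism` labels are hypothetical names chosen for this example. The instance must be stopped first, and the AMI must ship the matching driver (`ena` or `ixgbevf`). Many current AMIs already include it.

```python
def enhanced_networking_attribute(instance_id: str, mechanism: str) -> dict:
    """Keyword arguments for ec2.modify_instance_attribute.

    mechanism is a hypothetical label: "ena" or "intel_82599_vf".
    """
    if mechanism == "ena":
        # Elastic Network Adapter: up to 100 Gbps on supported types.
        return {"InstanceId": instance_id, "EnaSupport": {"Value": True}}
    if mechanism == "intel_82599_vf":
        # Legacy Intel 82599 Virtual Function: up to 10 Gbps.
        return {"InstanceId": instance_id, "SriovNetSupport": {"Value": "simple"}}
    raise ValueError(f"unknown mechanism: {mechanism!r}")

# Real sequence (requires boto3 and credentials):
# ec2 = boto3.client("ec2")
# ec2.stop_instances(InstanceIds=[iid])
# ec2.get_waiter("instance_stopped").wait(InstanceIds=[iid])
# ec2.modify_instance_attribute(**enhanced_networking_attribute(iid, "ena"))
```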

93
00:03:25,330 --> 00:03:27,310
But you can push this a step further,

94
00:03:27,310 --> 00:03:30,650
by using the Elastic Fabric Adapter, or EFA.

95
00:03:30,650 --> 00:03:32,800
And this is an improved ENA,

96
00:03:32,800 --> 00:03:35,510
dedicated to HPC, High Performance Computing,

97
00:03:35,510 --> 00:03:37,320
and it only works for Linux,

98
00:03:37,320 --> 00:03:39,980
and it's great when you have inter-node communication

99
00:03:39,980 --> 00:03:41,720
or tightly coupled workloads.

100
00:03:41,720 --> 00:03:43,860
So think about distributed computation.

101
00:03:43,860 --> 00:03:46,810
Why? Because it's going to leverage something called MPI,

102
00:03:46,810 --> 00:03:48,660
the Message Passing Interface standard.

103
00:03:48,660 --> 00:03:52,400
And with EFA, that MPI traffic can bypass the underlying Linux OS

104
00:03:52,400 --> 00:03:56,420
to provide even lower latency and more reliable transport.

105
00:03:56,420 --> 00:03:58,520
So think of it as like, if you have a Linux instance,

106
00:03:58,520 --> 00:04:01,000
and you're running tightly coupled workloads,

107
00:04:01,000 --> 00:04:02,360
then using an EFA

108
00:04:02,360 --> 00:04:03,450
will bypass the OS

109
00:04:03,450 --> 00:04:06,332
and provide you even higher network performance.
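
A minimal sketch of attaching an EFA at launch: the network interface is declared inside `run_instances` with `InterfaceType` set to `"efa"`. It is usually combined with a cluster placement group. All IDs and names are hypothetical. The instance type must support EFA. The EFA security group generally has to allow all inbound and outbound traffic to and from itself so that nodes can reach one another.

```python
def efa_launch_request(ami_id: str, instance_type: str, subnet_id: str,
                       security_group_id: str, group_name: str,
                       count: int) -> dict:
    """Keyword arguments for ec2.run_instances with an EFA interface.

    All IDs/names are hypothetical placeholders.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # must be an EFA-capable type
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": group_name},  # cluster placement group
        # With NetworkInterfaces, subnet and security groups are set per
        # interface instead of at the top level of the request.
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "Groups": [security_group_id],
            "InterfaceType": "efa",
        }],
    }

# Real call (requires boto3 and credentials):
# boto3.client("ec2").run_instances(**efa_launch_request(...))
```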

110
00:04:06,332 --> 00:04:08,660
So it is quite common in the exam,

111
00:04:08,660 --> 00:04:10,420
that you will be asked to differentiate

112
00:04:10,420 --> 00:04:14,850
between an ENA, an EFA, an ENI, or something else.
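
One way to keep the three terms apart: an ENI is the virtual network card attached to an instance. ENA and EFA describe the adapter capabilities behind it, and an EFA is an ENA with extra OS-bypass support for HPC. The EC2 `describe_instance_types` response reports these capabilities per type under `NetworkInfo` (`EnaSupport`, `EfaSupported`). The sketch classifies a response of that shape. The sample is hand-written with hypothetical type names and is not real AWS data.

```python
def classify_network(instance_types_response: dict) -> dict:
    """Map instance type -> networking capability from a
    describe_instance_types-shaped response."""
    result = {}
    for it in instance_types_response.get("InstanceTypes", []):
        net = it.get("NetworkInfo", {})
        ena = net.get("EnaSupport") in ("supported", "required")
        efa = bool(net.get("EfaSupported", False))
        if efa:
            label = "EFA (ENA + OS-bypass)"
        elif ena:
            label = "ENA"
        else:
            label = "no ENA"
        result[it["InstanceType"]] = label
    return result


# Hypothetical sample shaped like the API response (not real data).
sample = {"InstanceTypes": [
    {"InstanceType": "example.efa-type",
     "NetworkInfo": {"EnaSupport": "required", "EfaSupported": True}},
    {"InstanceType": "example.ena-type",
     "NetworkInfo": {"EnaSupport": "supported", "EfaSupported": False}},
    {"InstanceType": "example.old-type",
     "NetworkInfo": {"EnaSupport": "unsupported", "EfaSupported": False}},
]}

if __name__ == "__main__":
    print(classify_network(sample))
```

Against a real account, the input would come from `boto3.client("ec2").describe_instance_types(InstanceTypes=[...])`.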

113
00:04:14,850 --> 00:04:16,810
And so this is good that we are seeing this right now.

114
00:04:16,810 --> 00:04:18,459
You need to make sure to understand these concepts

115
00:04:18,459 --> 00:04:20,200
very, very clearly.

116
00:04:20,200 --> 00:04:22,480
Okay. So we have transferred the data,

117
00:04:22,480 --> 00:04:23,620
we're computing over the data,

118
00:04:23,620 --> 00:04:24,740
and we've configured our network,

119
00:04:24,740 --> 00:04:26,700
but how do we store the data?

120
00:04:26,700 --> 00:04:28,130
So we have multiple choices.

121
00:04:28,130 --> 00:04:30,110
we can use the instance-attached storage.

122
00:04:30,110 --> 00:04:31,800
It could be EBS,

123
00:04:31,800 --> 00:04:34,620
and this can scale up to 256,000 IOPS

124
00:04:34,620 --> 00:04:36,160
with io2 Block Express.
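
The 256,000 IOPS ceiling is per volume, and io2 IOPS are also capped relative to volume size. The sketch assumes a 1,000 IOPS-per-GiB ratio and a 4 GiB minimum size, and builds an EC2 `create_volume` request from them. Check these limits against the current EBS documentation. Reaching the top figure also requires an instance type that supports it.

```python
IO2_MAX_IOPS = 256_000     # assumed per-volume ceiling (io2 Block Express)
IO2_IOPS_PER_GIB = 1_000   # assumed maximum IOPS-to-size ratio
IO2_MIN_SIZE_GIB = 4       # assumed minimum io2 volume size
IO2_MIN_IOPS = 100         # assumed minimum provisioned IOPS


def min_io2_size_gib(target_iops: int) -> int:
    """Smallest volume size (GiB) allowed to provision target_iops."""
    if not IO2_MIN_IOPS <= target_iops <= IO2_MAX_IOPS:
        raise ValueError("target_iops outside the assumed io2 range")
    return -(-target_iops // IO2_IOPS_PER_GIB)  # ceiling division


def io2_volume_request(availability_zone: str, target_iops: int) -> dict:
    """Keyword arguments for ec2.create_volume."""
    size = max(IO2_MIN_SIZE_GIB, min_io2_size_gib(target_iops))
    return {
        "AvailabilityZone": availability_zone,
        "VolumeType": "io2",
        "Size": size,
        "Iops": target_iops,
    }
```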

125
00:04:36,160 --> 00:04:37,430
It could be an instance store,

126
00:04:37,430 --> 00:04:39,920
and we've seen this can scale to millions of IOPS.

127
00:04:39,920 --> 00:04:41,450
And it's linked to the EC2 instance.

128
00:04:41,450 --> 00:04:42,570
So it's on the physical host's hardware.

129
00:04:42,570 --> 00:04:44,000
It's going to be lower latency,

130
00:04:44,000 --> 00:04:46,790
but we can lose it if we lose our instance.

131
00:04:46,790 --> 00:04:48,660
Then we can use network storage,

132
00:04:48,660 --> 00:04:52,070
such as Amazon S3 to store large blobs of data.

133
00:04:52,070 --> 00:04:52,910
It's not a file system,

134
00:04:52,910 --> 00:04:55,180
it's to store large objects.

135
00:04:55,180 --> 00:04:58,480
Or EFS, where the throughput is going to scale

136
00:04:58,480 --> 00:05:01,300
based on the total size of your file system.

137
00:05:01,300 --> 00:05:04,530
Or we can use Provisioned Throughput mode on EFS

138
00:05:04,530 --> 00:05:06,140
to get higher throughput.
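
For EFS in Bursting mode, baseline throughput grows with the amount of data stored. The sketch assumes the commonly documented baseline rate of 50 KiB/s per GiB stored. It also builds an `efs.create_file_system` request for Provisioned Throughput mode, which decouples throughput from size. The creation token and throughput figure are hypothetical. Verify current rates, and the newer Elastic mode, in the EFS documentation.

```python
BASELINE_KIB_PER_S_PER_GIB = 50  # assumed Bursting-mode baseline rate


def efs_bursting_baseline_mib_s(stored_gib: float) -> float:
    """Baseline throughput (MiB/s) for a Bursting-mode file system."""
    return stored_gib * BASELINE_KIB_PER_S_PER_GIB / 1024


def efs_provisioned_request(creation_token: str, mib_per_s: float) -> dict:
    """Keyword arguments for efs.create_file_system in Provisioned mode."""
    return {
        "CreationToken": creation_token,  # hypothetical idempotency token
        "PerformanceMode": "generalPurpose",
        "ThroughputMode": "provisioned",
        "ProvisionedThroughputInMibps": float(mib_per_s),
    }

# 1 TiB (1024 GiB) stored gives a 50 MiB/s baseline in Bursting mode.
```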

139
00:05:06,140 --> 00:05:07,640
But we've seen there is a file system

140
00:05:07,640 --> 00:05:09,490
that's dedicated to HPC,

141
00:05:09,490 --> 00:05:11,380
which is called FSx for Lustre.

142
00:05:11,380 --> 00:05:13,990
And the name Lustre comes from Linux and cluster.

143
00:05:13,990 --> 00:05:16,060
And it's gonna be HPC optimized,

144
00:05:16,060 --> 00:05:17,492
gives you millions of IOPS,

145
00:05:17,492 --> 00:05:20,930
and it can be linked to S3 as its backing data repository.
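
A sketch of an `fsx.create_file_system` request for a Lustre file system linked to an S3 bucket through `ImportPath`. The subnet, bucket path, deployment type (`SCRATCH_2`), and 1,200 GiB capacity are illustrative assumptions. Allowed capacities depend on the deployment type. Newer persistent deployments use data repository associations instead of `ImportPath`.

```python
def lustre_request(subnet_id: str, import_path: str,
                   storage_gib: int = 1200) -> dict:
    """Keyword arguments for fsx.create_file_system (Lustre, S3-linked).

    subnet_id and import_path are hypothetical placeholders.
    """
    if not import_path.startswith("s3://"):
        raise ValueError("import_path must be an s3:// URI")
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,  # GiB; valid sizes vary by type
        "SubnetIds": [subnet_id],
        "LustreConfiguration": {
            "DeploymentType": "SCRATCH_2",  # assumed short-lived HPC scratch
            "ImportPath": import_path,      # S3 data shows up as files
        },
    }

# Real call (requires boto3 and credentials):
# boto3.client("fsx").create_file_system(
#     **lustre_request("subnet-...", "s3://example-hpc-input/run-42"))
```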

146
00:05:20,930 --> 00:05:23,130
So lots of options again.

147
00:05:23,130 --> 00:05:26,280
Finally, how about Automation and Orchestration?

148
00:05:26,280 --> 00:05:27,640
The first thing will be to use AWS Batch,

149
00:05:27,640 --> 00:05:29,040
which is (indistinct) indicates,

150
00:05:29,040 --> 00:05:32,420
a service that supports multi-node parallel jobs

151
00:05:32,420 --> 00:05:34,060
and enables you to run jobs

152
00:05:34,060 --> 00:05:36,470
that span multiple EC2 instances.

153
00:05:36,470 --> 00:05:37,350
These are Batch jobs,

154
00:05:37,350 --> 00:05:39,040
and it's very easy to schedule these jobs

155
00:05:39,040 --> 00:05:41,390
and launch the EC2 instances accordingly.

156
00:05:41,390 --> 00:05:43,380
They will be managed by the Batch service.

157
00:05:43,380 --> 00:05:46,173
So Batch is a very popular choice for HPC.
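
A multi-node parallel job in AWS Batch is declared with a job definition of type `multinode`. The sketch builds a `batch.register_job_definition` request for one. The definition name, container image, node count, resources, and command are hypothetical. The `"0:"` node range applies the same container to every node, and node 0 acts as the main node.

```python
def multinode_job_definition(name: str, image: str, num_nodes: int,
                             vcpus: int, memory_mib: int,
                             command: list[str]) -> dict:
    """Keyword arguments for batch.register_job_definition (multi-node).

    All values are hypothetical placeholders.
    """
    return {
        "jobDefinitionName": name,
        "type": "multinode",
        "nodeProperties": {
            "numNodes": num_nodes,
            "mainNode": 0,  # index of the node that coordinates the job
            "nodeRangeProperties": [{
                "targetNodes": "0:",  # node 0 through the last node
                "container": {
                    "image": image,
                    "command": command,
                    # Batch expects these values as strings.
                    "resourceRequirements": [
                        {"type": "VCPU", "value": str(vcpus)},
                        {"type": "MEMORY", "value": str(memory_mib)},
                    ],
                },
            }],
        },
    }

# Real calls (requires boto3 and credentials):
# batch = boto3.client("batch")
# batch.register_job_definition(**multinode_job_definition(...))
# batch.submit_job(jobName="run-1", jobQueue="hpc-queue", jobDefinition="mpi-sim")
```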

158
00:05:47,030 --> 00:05:49,710
And we have AWS ParallelCluster,

159
00:05:49,710 --> 00:05:51,810
which is an open source cluster management tool

160
00:05:51,810 --> 00:05:55,300
to deploy high performance computing on AWS.

161
00:05:55,300 --> 00:05:57,010
So you configure it using text files,

162
00:05:57,010 --> 00:05:59,300
and then you would deploy it on AWS.

163
00:05:59,300 --> 00:06:02,200
And it's going to automate the creation of the VPC,

164
00:06:02,200 --> 00:06:05,080
subnet, cluster type, and instance types.

165
00:06:05,080 --> 00:06:06,430
And it can come up in the exam

166
00:06:06,430 --> 00:06:10,060
that you must use ParallelCluster alongside EFA,

167
00:06:10,060 --> 00:06:12,570
because there is a parameter in the text files

168
00:06:12,570 --> 00:06:15,130
to enable elastic fabric adapters.

169
00:06:15,130 --> 00:06:17,090
So EFA on the cluster,

170
00:06:17,090 --> 00:06:20,850
and the impact of that is to improve the network performance

171
00:06:20,850 --> 00:06:25,703
and therefore have a higher performance HPC cluster.
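
The EFA switch mentioned above lives in the cluster configuration file. The sketch follows the AWS ParallelCluster 3 layout (`HeadNode`, `Scheduling.SlurmQueues`, `ComputeResources[].Efa.Enabled`, and a queue-level `PlacementGroup`) and builds it as a Python dict. The region, subnets, key name, and instance types are hypothetical. Serializing with `json` works because JSON is valid YAML, so the output can be saved as the configuration file.

```python
import json


def parallelcluster_config(region: str, head_subnet: str, compute_subnet: str,
                           key_name: str, max_nodes: int) -> dict:
    """ParallelCluster 3-style config with EFA enabled on the compute nodes.

    All identifiers are hypothetical placeholders.
    """
    return {
        "Region": region,
        "Image": {"Os": "alinux2"},
        "HeadNode": {
            "InstanceType": "c5.xlarge",
            "Networking": {"SubnetId": head_subnet},
            "Ssh": {"KeyName": key_name},
        },
        "Scheduling": {
            "Scheduler": "slurm",
            "SlurmQueues": [{
                "Name": "hpc",
                "ComputeResources": [{
                    "Name": "c5n",
                    "InstanceType": "c5n.18xlarge",  # an EFA-capable type
                    "MinCount": 0,                   # scale to zero when idle
                    "MaxCount": max_nodes,
                    "Efa": {"Enabled": True},        # the EFA parameter
                }],
                "Networking": {
                    "SubnetIds": [compute_subnet],
                    "PlacementGroup": {"Enabled": True},  # cluster placement
                },
            }],
        },
    }


if __name__ == "__main__":
    cfg = parallelcluster_config("us-east-1", "subnet-head", "subnet-compute",
                                 "my-key", 16)
    print(json.dumps(cfg, indent=2))
```

Saved to a file, this could be passed to `pcluster create-cluster --cluster-name <name> --cluster-configuration <file>`.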

172
00:06:26,710 --> 00:06:27,543
So to summarize,

173
00:06:27,543 --> 00:06:30,500
HPC is something that comes up more and more in the exam,

174
00:06:30,500 --> 00:06:31,860
and it's not a service,

175
00:06:31,860 --> 00:06:34,430
it's a combination of services and different options,

176
00:06:34,430 --> 00:06:35,960
and you need to make sure you understand all

177
00:06:35,960 --> 00:06:38,520
of those to maximize the potential

178
00:06:38,520 --> 00:06:40,700
of computation within AWS.

179
00:06:40,700 --> 00:06:43,350
And I hope this lecture was helpful.

180
00:06:43,350 --> 00:06:44,183
All right, that's it.

181
00:06:44,183 --> 00:06:45,860
I will see you in the next lecture.