1 00:00:01,130 --> 00:00:02,020 Hey, Cloud Gurus. 2 00:00:02,020 --> 00:00:03,727 Welcome back to our series, 3 00:00:03,727 --> 00:00:06,340 Introducing the Tools of the Trade. 4 00:00:06,340 --> 00:00:08,403 In this video, we're talking about Scala. 5 00:00:10,570 --> 00:00:12,770 So we'll start with an overview. 6 00:00:12,770 --> 00:00:14,920 We'll then look at some Scala code 7 00:00:14,920 --> 00:00:17,193 and then wrap everything up with a review. 8 00:00:18,060 --> 00:00:21,200 For the exam, you don't have to be a Scala expert. 9 00:00:21,200 --> 00:00:23,240 You more or less need to be able to recognize 10 00:00:23,240 --> 00:00:25,930 that it is Scala syntax you are looking at 11 00:00:25,930 --> 00:00:30,140 and be able to fill in some holes using dropdown boxes. 12 00:00:30,140 --> 00:00:32,430 So for example, you'll have four options 13 00:00:32,430 --> 00:00:34,220 and you'll have to decide which of those 14 00:00:34,220 --> 00:00:38,150 slots into the Scala syntax you're looking at. 15 00:00:38,150 --> 00:00:40,530 For that reason, we're going to go over some examples 16 00:00:40,530 --> 00:00:42,300 so you recognize it on sight, 17 00:00:42,300 --> 00:00:45,533 but we're not going in depth on how to be a Scala expert. 18 00:00:47,100 --> 00:00:49,270 So what even is Scala? 19 00:00:49,270 --> 00:00:52,400 Well, the full definition taken from their website is: 20 00:00:52,400 --> 00:00:55,850 "Scala combines object-oriented and functional programming 21 00:00:55,850 --> 00:00:58,770 in one concise, high-level language. Scala's 22 00:00:58,770 --> 00:01:02,970 static types help avoid bugs in complex applications, 23 00:01:02,970 --> 00:01:06,070 and its JVM and JavaScript runtimes 24 00:01:06,070 --> 00:01:08,320 let you build high-performance systems 25 00:01:08,320 --> 00:01:11,960 with easy access to huge ecosystems of libraries." 26 00:01:11,960 --> 00:01:15,203 And that's from www.scala-lang.org. 
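To make that definition a little more concrete, here is a minimal sketch in plain Scala. It is not taken from the video: the `Listener` case class and its sample values are hypothetical, and the only goal is to show what the object-oriented, functional, and statically typed pieces tend to look like on screen. It should run as-is in a notebook cell or the Scala REPL.

```scala
// Hypothetical example data; none of these names come from the lesson.
// Object-oriented: a case class bundles typed fields into one object.
case class Listener(firstName: String, level: String)

// `val` declares an immutable value; the type is inferred as List[Listener].
val listeners = List(
  Listener("Ada", "paid"),
  Listener("Grace", "free"),
  Listener("Linus", "paid")
)

// Functional: filter and map take small anonymous functions as arguments.
val paidNames: List[String] =
  listeners.filter(l => l.level == "paid").map(l => l.firstName)

// Static types: the compiler would reject the next line if it were uncommented,
// because an Int cannot be assigned to a String value.
// val wrong: String = 42

println(paidNames) // prints List(Ada, Linus)
```

The things worth recognizing are the `val` keyword, the `name: Type` annotations, and the `x => ...` arrow for anonymous functions.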
27 00:01:16,300 --> 00:01:18,730 For our purposes, it's a programming language 28 00:01:18,730 --> 00:01:20,540 leveraged in Azure Databricks 29 00:01:20,540 --> 00:01:23,253 for ETL and data analysis operations. 30 00:01:24,240 --> 00:01:25,820 It's not the only type of code you'll 31 00:01:25,820 --> 00:01:27,210 come across on the exam. 32 00:01:27,210 --> 00:01:28,920 So let's take a look at some examples, 33 00:01:28,920 --> 00:01:30,720 so that you can easily recognize it. 34 00:01:32,980 --> 00:01:35,830 All of these examples can be executed within a notebook 35 00:01:35,830 --> 00:01:38,440 in your Azure Databricks workspace. 36 00:01:38,440 --> 00:01:40,170 And for our first example, 37 00:01:40,170 --> 00:01:44,310 this is loading a sample JSON file as a data frame. 38 00:01:44,310 --> 00:01:48,020 You'll see our val df for data frame. 39 00:01:48,020 --> 00:01:52,390 And we're reading in a JSON file at the path listed. 40 00:01:52,390 --> 00:01:55,680 In order to then display the contents of that data frame, 41 00:01:55,680 --> 00:01:58,283 we could use df.show. 42 00:01:59,290 --> 00:02:01,800 And we now have the contents of that JSON file 43 00:02:01,800 --> 00:02:04,300 in a Spark data frame so that we can work with it. 44 00:02:06,150 --> 00:02:09,300 If we wanted to move from there and transform the data, 45 00:02:09,300 --> 00:02:11,710 we could use a query like this. 46 00:02:11,710 --> 00:02:16,130 We're specifying a new value called specificColumnsDf 47 00:02:16,130 --> 00:02:18,920 and we're taking that same data frame, df, 48 00:02:18,920 --> 00:02:20,230 and selecting from it, 49 00:02:20,230 --> 00:02:23,440 df.select and pulling the columns 50 00:02:23,440 --> 00:02:28,200 first name, last name, gender, location, and level. 51 00:02:28,200 --> 00:02:29,900 Again, to output the results of that, 52 00:02:29,900 --> 00:02:32,920 we can use the .show mechanism. 53 00:02:32,920 --> 00:02:36,023 And this time we say specificColumnsDf.show. 
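The load-and-select pattern just described can be sketched roughly like this. Treat it as an approximation of what is on screen, not a copy of it: the JSON records are hypothetical stand-ins for the sample file (whose path isn't reproduced here), and the camel-case column names `firstName` and `lastName` are an assumption based on how they are spoken. In a Databricks notebook `spark` already exists, so the builder line simply hands back the existing session.

```scala
import org.apache.spark.sql.SparkSession

// In Databricks, getOrCreate() returns the notebook's existing session.
// The local master is an assumption that only matters outside Databricks.
val spark = SparkSession.builder()
  .appName("scala-load-select-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical records standing in for the lesson's sample JSON file.
val sampleJson = Seq(
  """{"firstName":"Ada","lastName":"Lovelace","gender":"F","location":"London","level":"paid","song":"Song A"}""",
  """{"firstName":"Alan","lastName":"Turing","gender":"M","location":"Wilmslow","level":"free","song":"Song B"}"""
).toDS()

// Load the JSON as a DataFrame. With a real file, a path string
// would be passed to spark.read.json instead of this in-memory Dataset.
val df = spark.read.json(sampleJson)
df.show()

// Keep only the columns named in the lesson.
val specificColumnsDf =
  df.select("firstName", "lastName", "gender", "location", "level")
specificColumnsDf.show()
```

Notice the shape more than the details: `val name = something.method(...)`, followed by `name.show()` to print the result.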
54 00:02:37,280 --> 00:02:39,350 And of course you can do more than just limit 55 00:02:39,350 --> 00:02:40,990 to specific columns. 56 00:02:40,990 --> 00:02:44,260 To further transform the data, we can rename a column 57 00:02:44,260 --> 00:02:48,813 such as changing the level column to subscription_type. 58 00:02:50,360 --> 00:02:52,560 By this point, you're probably noticing a pattern. 59 00:02:52,560 --> 00:02:55,760 We define a value renamedColumnsDf. 60 00:02:55,760 --> 00:02:58,600 And then we reference the last results that we had, 61 00:02:58,600 --> 00:03:03,600 specificColumnsDf, and use .withColumnRenamed. 62 00:03:04,210 --> 00:03:07,300 And again, as you might expect, to output those results, 63 00:03:07,300 --> 00:03:11,113 we can simply say renamedColumnsDf.show. 64 00:03:13,280 --> 00:03:15,933 Eventually, you'll probably need to load the data. 65 00:03:16,920 --> 00:03:19,000 There are many options for that 66 00:03:19,000 --> 00:03:21,950 and you can use Scala to set the configuration, 67 00:03:21,950 --> 00:03:25,200 such as using spark.conf.set 68 00:03:25,200 --> 00:03:29,810 to set spark.sql.parquet.writeLegacyFormat 69 00:03:29,810 --> 00:03:31,290 equal to true. 70 00:03:31,290 --> 00:03:33,900 And when set to true, this particular setting 71 00:03:33,900 --> 00:03:36,170 will cause data to be written in the manner of 72 00:03:36,170 --> 00:03:38,313 Spark 1.4 and earlier. 73 00:03:39,210 --> 00:03:42,363 To actually load the data, we can use a query such as this. 74 00:03:43,870 --> 00:03:46,310 Here, we're loading the transformed data frame 75 00:03:46,310 --> 00:03:49,800 of renamedColumnsDf and writing it as a table 76 00:03:49,800 --> 00:03:51,700 in Azure Synapse. 77 00:03:51,700 --> 00:03:54,260 So you can see we specify .write. 78 00:03:54,260 --> 00:03:56,160 There are several options passed in here, 79 00:03:56,160 --> 00:03:59,070 which you would have set as variables earlier on. 
80 00:03:59,070 --> 00:04:02,640 Such as the URL, the tempDir, and so forth. 81 00:04:02,640 --> 00:04:05,180 But we're creating a table called sample table 82 00:04:05,180 --> 00:04:07,810 to load our data frame into. 83 00:04:07,810 --> 00:04:09,606 You can see we have an option here 84 00:04:09,606 --> 00:04:14,606 forward_spark_azure_storage_credentials. 85 00:04:14,690 --> 00:04:16,490 Man, what a mouthful. 86 00:04:16,490 --> 00:04:18,180 But this causes Azure Synapse 87 00:04:18,180 --> 00:04:22,200 to access data from Blob storage using an access key. 88 00:04:22,200 --> 00:04:24,000 And this is really the only supported method 89 00:04:24,000 --> 00:04:26,623 of authentication for this type of scenario. 90 00:04:29,150 --> 00:04:33,090 By way of review, Scala is a high-level programming language 91 00:04:33,090 --> 00:04:34,933 used within Azure Databricks. 92 00:04:36,000 --> 00:04:37,260 You can use Scala notebooks 93 00:04:37,260 --> 00:04:40,710 in your Azure Databricks workspace to carry out ETL, 94 00:04:40,710 --> 00:04:43,533 configuration, and analysis operations. 95 00:04:44,600 --> 00:04:47,870 And again, this is by no means even the tip of the iceberg 96 00:04:47,870 --> 00:04:49,730 with how to use Scala. 97 00:04:49,730 --> 00:04:52,570 I just want you to be able to recognize the general syntax 98 00:04:52,570 --> 00:04:55,420 and pattern so that you can identify it on sight 99 00:04:55,420 --> 00:04:57,900 and know that it's Scala you're working with. 100 00:04:57,900 --> 00:05:00,390 I have placed a link in the lesson resources 101 00:05:00,390 --> 00:05:03,090 for further information about Scala data frames. 102 00:05:03,090 --> 00:05:04,980 So be sure to check that out. 103 00:05:04,980 --> 00:05:07,760 There's no way we could cover every facet of Scala 104 00:05:07,760 --> 00:05:10,490 without creating an entire course dedicated to it. 
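As a recap of the rename, configuration, and write steps covered earlier in the lesson, here is a rough sketch under a few assumptions. The one-row data frame is a hypothetical stand-in for the result of the select step, the table name spelling `SampleTable` and the `overwrite` mode are guesses (the lesson doesn't spell them out), and `writeToSynapse` is a helper name invented for illustration. The option is written in the camel-case form `forwardSparkAzureStorageCredentials` that the Databricks connector documentation lists, whereas the lesson reads it out with underscores. The write is wrapped in a function that is never called here, because it needs a real Azure Synapse JDBC URL and a Blob storage staging directory.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// In Databricks, getOrCreate() returns the notebook's existing session.
val spark = SparkSession.builder()
  .appName("scala-recap-sketch")
  .master("local[*]") // assumption: only relevant outside Databricks
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the DataFrame produced by the select step.
val specificColumnsDf = Seq(
  ("Ada", "Lovelace", "F", "London", "paid")
).toDF("firstName", "lastName", "gender", "location", "level")

// Transform: rename the level column to subscription_type.
val renamedColumnsDf =
  specificColumnsDf.withColumnRenamed("level", "subscription_type")
renamedColumnsDf.show()

// Configure: write Parquet in the Spark 1.4-and-earlier layout.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

// Load: defined but not invoked, since url and tempDir must point at
// real Azure resources. Both parameters are placeholders (assumptions).
def writeToSynapse(df: DataFrame, url: String, tempDir: String): Unit =
  df.write
    .format("com.databricks.spark.sqldw")
    .option("url", url)
    .option("tempDir", tempDir)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "SampleTable")
    .mode("overwrite")
    .save()
```

For the exam, the chain of `.write`, `.format`, repeated `.option` calls, and a final `.save()` is the pattern to look for when you're filling in a dropdown.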
105 00:05:10,490 --> 00:05:12,330 So I would suggest some outside reading 106 00:05:12,330 --> 00:05:15,820 until you have a basic familiarity with the syntax. 107 00:05:15,820 --> 00:05:16,960 Remember that for the exam, 108 00:05:16,960 --> 00:05:19,080 you don't have to be a Scala expert. 109 00:05:19,080 --> 00:05:20,620 You just have to be able to recognize 110 00:05:20,620 --> 00:05:22,240 the language you're working with 111 00:05:22,240 --> 00:05:24,340 and then make intelligent decisions 112 00:05:24,340 --> 00:05:27,710 about which options should go into certain portions 113 00:05:27,710 --> 00:05:29,510 of the statement. 114 00:05:29,510 --> 00:05:31,250 But that's it for this lesson. 115 00:05:31,250 --> 00:05:33,260 In the next video, we'll cover one last tool 116 00:05:33,260 --> 00:05:36,130 for you to place in your data engineering tool belt. 117 00:05:36,130 --> 00:05:38,030 When you're ready, I'll see you there.