1 00:00:00,170 --> 00:00:02,020 Moving right along through this section, 2 00:00:02,020 --> 00:00:03,000 in this lesson. 3 00:00:03,000 --> 00:00:06,000 We are going to talk about monitoring and updating 4 00:00:06,000 --> 00:00:08,720 statistics about data across the system. 5 00:00:08,720 --> 00:00:10,060 To start off with, 6 00:00:10,060 --> 00:00:12,540 we're going to talk about what that actually means. 7 00:00:12,540 --> 00:00:14,280 So we're going to take a look at that requirement, 8 00:00:14,280 --> 00:00:15,890 talk through what that means 9 00:00:15,890 --> 00:00:17,300 and we're going to use Synapse SQL 10 00:00:17,300 --> 00:00:19,920 as we move through this lesson. 11 00:00:19,920 --> 00:00:22,700 We're also going to about auto-create statistics, 12 00:00:22,700 --> 00:00:25,620 what it is and how you use it. 13 00:00:25,620 --> 00:00:27,930 Then we'll follow up by talking about your statistics 14 00:00:27,930 --> 00:00:30,940 and how to know whether or not they are up to date, 15 00:00:30,940 --> 00:00:32,960 but to get started let's jump in 16 00:00:32,960 --> 00:00:35,750 and talk about what this actually means. 17 00:00:35,750 --> 00:00:39,500 So what are statistics in Synapse SQL? 18 00:00:39,500 --> 00:00:42,670 Well, as we look at your dedicated SQL pools, 19 00:00:42,670 --> 00:00:46,530 the more that that SQL pool knows about your data 20 00:00:46,530 --> 00:00:48,770 the faster it can execute queries. 21 00:00:49,640 --> 00:00:51,660 And so what we're essentially doing, 22 00:00:51,660 --> 00:00:53,560 is we are just collecting statistics 23 00:00:53,560 --> 00:00:56,100 or information about your data, 24 00:00:56,100 --> 00:00:59,070 and leveraging a SQL pool query optimizer 25 00:00:59,070 --> 00:01:02,910 to take a look at the best way to present those queries. 26 00:01:02,910 --> 00:01:05,530 So when we look at statistics in Synapse SQL 27 00:01:05,530 --> 00:01:08,380 first, they're vital to query performance 28 00:01:08,380 --> 00:01:09,930 and then following on that 29 00:01:09,930 --> 00:01:12,940 having those statistics allow us to avoid guesses 30 00:01:12,940 --> 00:01:15,810 and lead to better data distribution. 31 00:01:15,810 --> 00:01:17,390 So why is that important? 32 00:01:17,390 --> 00:01:20,870 Well, it comes down to cost-based analysis. 33 00:01:20,870 --> 00:01:23,700 So in that SQL pool query optimizer 34 00:01:23,700 --> 00:01:26,750 one of the things it looks at is cost. 35 00:01:26,750 --> 00:01:31,480 And so it is going to examine the best way to run your query 36 00:01:31,480 --> 00:01:35,760 so that it saves the most amount of money for you. 37 00:01:35,760 --> 00:01:39,830 And with that, it's going to analyze several different plans 38 00:01:39,830 --> 00:01:40,760 or various plans, 39 00:01:40,760 --> 00:01:42,290 depending upon the query that you're running 40 00:01:42,290 --> 00:01:44,300 and choose the best one. 41 00:01:44,300 --> 00:01:47,750 Less processing time, more money in your pocket. 42 00:01:47,750 --> 00:01:48,590 All right. 43 00:01:48,590 --> 00:01:52,790 So how do we look at automatic creation of statistics? 44 00:01:52,790 --> 00:01:55,500 Is it automatically on, do we have to turn it on? 45 00:01:55,500 --> 00:01:58,295 Well, in this case, you do have to turn it on. 46 00:01:58,295 --> 00:02:01,210 And so the first question you need to ask yourself is 47 00:02:01,210 --> 00:02:05,180 is it on? So you can jump into SQL pool 48 00:02:05,180 --> 00:02:07,810 and you can run this query, 49 00:02:07,810 --> 00:02:12,810 SELECT name, is_auto_create_stats_on FROM sys.databases 50 00:02:13,050 --> 00:02:15,180 and what that'll do is that'll actually spit out 51 00:02:15,180 --> 00:02:17,000 and tell you whether it's on. 52 00:02:17,000 --> 00:02:21,270 If it's not on, you're going to need to alter the database. 53 00:02:21,270 --> 00:02:24,480 So alter the database and then your data warehouse name 54 00:02:24,480 --> 00:02:27,080 that's where the, your stuff here is. 55 00:02:27,080 --> 00:02:31,410 Then you're going to set auto-create statistics to on. 56 00:02:31,410 --> 00:02:34,520 Once you've done that, any statement in this list. 57 00:02:34,520 --> 00:02:38,150 So select, insert-select, create table as select, 58 00:02:38,150 --> 00:02:39,410 update, delete, explain. 59 00:02:39,410 --> 00:02:41,560 Any of those statements, 60 00:02:41,560 --> 00:02:45,420 will automatically trigger the creation of statistics 61 00:02:45,420 --> 00:02:46,593 in your data. 62 00:02:50,160 --> 00:02:52,330 Don't forget to update. 63 00:02:52,330 --> 00:02:56,450 So the best practice is to update your statistics 64 00:02:56,450 --> 00:03:00,660 on date columns each day as new dates are added. 65 00:03:00,660 --> 00:03:01,970 So here's the deal. 66 00:03:01,970 --> 00:03:05,960 As you update and add data to your database 67 00:03:05,960 --> 00:03:08,590 if you are not updating the statistics, 68 00:03:08,590 --> 00:03:12,240 your data distribution is going to get out of whack 69 00:03:12,240 --> 00:03:16,250 and that could potentially lead to your query optimizer 70 00:03:16,250 --> 00:03:20,310 not actually optimizing your data the way that it should. 71 00:03:20,310 --> 00:03:23,890 So it's important that you update your statistics 72 00:03:23,890 --> 00:03:24,870 as you move through 73 00:03:24,870 --> 00:03:27,320 and that you have some sort of scheme to do that. 74 00:03:28,690 --> 00:03:30,440 So how do we do this? 75 00:03:30,440 --> 00:03:32,960 Well, it's not extremely simple. 76 00:03:32,960 --> 00:03:35,560 There's quite a few different ways that you can do it. 77 00:03:35,560 --> 00:03:39,020 The good news is each way is actually pretty simple. 78 00:03:39,020 --> 00:03:41,540 So let's take a look at one of them as an example. 79 00:03:41,540 --> 00:03:43,610 So we could use create statistics. 80 00:03:43,610 --> 00:03:46,930 We could define our column and define our table, 81 00:03:46,930 --> 00:03:48,380 and then once we've done that, 82 00:03:48,380 --> 00:03:50,040 we could just simply say in this case, 83 00:03:50,040 --> 00:03:53,570 we're looking at a range of values. 84 00:03:53,570 --> 00:03:58,360 So we're looking at updating statistics on rows 43 85 00:03:58,360 --> 00:04:00,870 all the way up to 8,675,308. 86 00:04:04,040 --> 00:04:06,670 So that's what we're looking at doing in this case 87 00:04:06,670 --> 00:04:10,070 and there's actually a Microsoft doc 88 00:04:10,070 --> 00:04:13,020 that is a fantastic resource for more examples 89 00:04:13,020 --> 00:04:16,310 and I'm going to throw that in the link for this lesson. 90 00:04:16,310 --> 00:04:18,580 So take a look at that if you want to. 91 00:04:18,580 --> 00:04:22,140 I don't think that this is likely to be on the DP-203, 92 00:04:22,140 --> 00:04:23,160 of course, I don't know, 93 00:04:23,160 --> 00:04:25,520 but I would lean towards specific examples 94 00:04:25,520 --> 00:04:27,030 are not going to be, 95 00:04:27,030 --> 00:04:30,200 so I would not go out and memorize all of the examples. 96 00:04:30,200 --> 00:04:32,770 Really, more important for the DP-203 97 00:04:32,770 --> 00:04:36,940 is understanding what is actually happening here. 98 00:04:36,940 --> 00:04:39,520 So if we were going to do update statistics, 99 00:04:39,520 --> 00:04:40,390 how do we do that? 100 00:04:40,390 --> 00:04:42,110 What service are we talking about? 101 00:04:42,110 --> 00:04:45,540 What kinds of commands could you look at? 102 00:04:45,540 --> 00:04:48,800 So something that involves CREATE STATISTICS, right? 103 00:04:48,800 --> 00:04:51,173 That's the kind of thing that we would be looking at. 104 00:04:52,100 --> 00:04:53,780 Key points to remember. 105 00:04:53,780 --> 00:04:55,380 Don't forget to turn it on. 106 00:04:55,380 --> 00:04:57,950 Yes, you do have to turn on statistics 107 00:04:57,950 --> 00:05:01,093 in order for them to actually work in your SQL pools. 108 00:05:02,300 --> 00:05:05,980 Next, know what triggers the creation of statistics. 109 00:05:05,980 --> 00:05:09,940 So we talked about the six different statements 110 00:05:09,940 --> 00:05:12,110 that we have so select insert, 111 00:05:12,110 --> 00:05:13,920 select create table is select, 112 00:05:13,920 --> 00:05:16,030 so basically anything that involves select 113 00:05:16,030 --> 00:05:18,050 and then update, delete and explain 114 00:05:18,050 --> 00:05:19,680 are automatically going to trigger 115 00:05:19,680 --> 00:05:21,500 the creation of your statistics. 116 00:05:21,500 --> 00:05:23,740 Again, assuming you've actually turned them on. 117 00:05:23,740 --> 00:05:25,860 If you've got those 2 points down 118 00:05:25,860 --> 00:05:29,380 you are good to go onto the next lesson. 119 00:05:29,380 --> 00:05:30,880 All right, I'll see you there.