1 00:00:01,050 --> 00:00:03,290 All right, welcome back. 2 00:00:03,290 --> 00:00:06,210 So in this lesson, we are going to dive deep 3 00:00:06,210 --> 00:00:10,110 into windows in Azure Stream Analytics. 4 00:00:10,110 --> 00:00:11,930 I'm going to spend some time talking to you 5 00:00:11,930 --> 00:00:15,330 about what a window is and why you need it. 6 00:00:15,330 --> 00:00:17,380 We're going to dive into the 5 types 7 00:00:17,380 --> 00:00:19,640 of windows and how they work. 8 00:00:19,640 --> 00:00:21,990 And then finally, I'm going to give you a conceptual look 9 00:00:21,990 --> 00:00:24,963 at windows and the code that's used to employ them. 10 00:00:26,100 --> 00:00:28,300 So with that, let's start talking about 11 00:00:28,300 --> 00:00:30,580 what a window is and why we need it. 12 00:00:30,580 --> 00:00:33,940 So as we talk about windows, keep in mind that windows 13 00:00:33,940 --> 00:00:37,990 allow functions to be run on sets of streaming data. 14 00:00:37,990 --> 00:00:40,940 That's a more technical way of explaining what I've been 15 00:00:40,940 --> 00:00:43,730 talking about in the last couple of lessons. 16 00:00:43,730 --> 00:00:45,810 Essentially, it allows us to get a start time 17 00:00:45,810 --> 00:00:47,390 and an end time, 18 00:00:47,390 --> 00:00:50,270 and based upon that, we can do a function, 19 00:00:50,270 --> 00:00:54,520 like a summation, or an average, or whatever we want. 20 00:00:54,520 --> 00:00:59,490 All of our outputs are always produced at the end of our window. 21 00:00:59,490 --> 00:01:02,900 So a good example of a window and why we need it 22 00:01:02,900 --> 00:01:04,320 would be a call center. 
23 00:01:04,320 --> 00:01:06,570 So for instance, if we had a call center 24 00:01:06,570 --> 00:01:09,000 and we have a bunch of different people calling in, 25 00:01:09,000 --> 00:01:11,870 we could take all of that data and look at it 26 00:01:11,870 --> 00:01:14,620 over the period of an hour that's streaming through 27 00:01:14,620 --> 00:01:18,360 and we could run streaming processes on that data. 28 00:01:18,360 --> 00:01:21,040 Or maybe it's every 15 minutes, 29 00:01:21,040 --> 00:01:23,750 and we're taking a look at average call length, 30 00:01:23,750 --> 00:01:28,050 for example. That would be a good scenario for a window. 31 00:01:28,050 --> 00:01:32,100 And with that, we can create some highly complex solutions. 32 00:01:32,100 --> 00:01:34,620 So you can see here, down at the bottom in green, 33 00:01:34,620 --> 00:01:36,630 we have our event sources. 34 00:01:36,630 --> 00:01:38,210 So we talked about inputs 35 00:01:38,210 --> 00:01:40,540 and we have multiple inputs coming in 36 00:01:40,540 --> 00:01:43,050 from IoT Hubs and Event Hubs. 37 00:01:43,050 --> 00:01:45,010 And then Azure Stream Analytics, 38 00:01:45,010 --> 00:01:48,120 or ASA, there at the top in purple 39 00:01:48,120 --> 00:01:49,970 is going to be taking that data, 40 00:01:49,970 --> 00:01:52,940 as well as some data from a SQL database 41 00:01:52,940 --> 00:01:55,400 and it's going to create a window 42 00:01:55,400 --> 00:01:58,210 and run queries on all of that data. 43 00:01:58,210 --> 00:01:59,960 And then it's going to process that through in 44 00:01:59,960 --> 00:02:03,290 what we would call a hot path to Cosmos DB. 45 00:02:03,290 --> 00:02:04,320 So, basically, from this 46 00:02:04,320 --> 00:02:08,320 we can create some very complex solutions. 47 00:02:08,320 --> 00:02:10,390 So now that we've talked a little bit 48 00:02:10,390 --> 00:02:12,800 about what windows are, let's talk about 49 00:02:12,800 --> 00:02:15,820 the types of windows that we can see. 
50 00:02:15,820 --> 00:02:17,710 There are 5 types of windows 51 00:02:17,710 --> 00:02:20,970 and, yes, you need to know all of them. 52 00:02:20,970 --> 00:02:22,660 And we are specifically focusing 53 00:02:22,660 --> 00:02:25,740 on Azure Stream Analytics in this lesson. 54 00:02:25,740 --> 00:02:28,720 There are other services in Azure that use windows 55 00:02:28,720 --> 00:02:30,180 but keep in mind that right now 56 00:02:30,180 --> 00:02:32,700 we're talking about Stream Analytics. 57 00:02:32,700 --> 00:02:36,530 So the 5 types of windows are tumbling, hopping, 58 00:02:36,530 --> 00:02:40,710 sliding, session, and snapshot. 59 00:02:40,710 --> 00:02:44,080 Now with this, Microsoft says that a window 60 00:02:44,080 --> 00:02:46,510 contains event data along a timeline 61 00:02:46,510 --> 00:02:49,200 and enables you to perform various operations 62 00:02:49,200 --> 00:02:52,270 against the events within that window. 63 00:02:52,270 --> 00:02:53,690 What I'm trying to do with this 64 00:02:53,690 --> 00:02:57,180 is show you a couple of different ways to read 65 00:02:57,180 --> 00:03:00,600 and hear what a window is so that you can see we're talking 66 00:03:00,600 --> 00:03:02,190 about the same thing. 67 00:03:02,190 --> 00:03:04,010 Whether we're talking about functions, 68 00:03:04,010 --> 00:03:07,690 whether we're talking about just doing an average of time, 69 00:03:07,690 --> 00:03:11,270 whether we're talking about a timeline and the enablement 70 00:03:11,270 --> 00:03:15,713 of various operations, we're all talking about windows here. 71 00:03:17,000 --> 00:03:20,410 So with that, let's dive in and talk about tumbling windows. 72 00:03:20,410 --> 00:03:23,730 Tumbling windows are repeating windows 73 00:03:23,730 --> 00:03:28,210 that are non-overlapping and events cannot belong 74 00:03:28,210 --> 00:03:30,580 to more than one tumbling window. 
75 00:03:30,580 --> 00:03:32,930 Now I know in a lot of places in this course, 76 00:03:32,930 --> 00:03:34,740 I say you don't need to memorize this, 77 00:03:34,740 --> 00:03:36,560 you just need to understand it. 78 00:03:36,560 --> 00:03:38,930 I would suggest that with this section 79 00:03:38,930 --> 00:03:40,650 you actually memorize that. 80 00:03:40,650 --> 00:03:42,830 You should know that a tumbling window is repeating. 81 00:03:42,830 --> 00:03:45,660 You should know it's not overlapping, and you should know 82 00:03:45,660 --> 00:03:48,920 that events don't belong to more than one tumbling window. 83 00:03:48,920 --> 00:03:50,930 That's going to be important. 84 00:03:50,930 --> 00:03:54,120 So here is an example of a tumbling window. 85 00:03:54,120 --> 00:03:56,100 So we have our green circles here 86 00:03:56,100 --> 00:03:58,240 and the green circles are just seconds. 87 00:03:58,240 --> 00:04:01,840 So basically we're looking at 1 minute of time, 88 00:04:01,840 --> 00:04:04,280 starting from the left, which is 0, all the way up 89 00:04:04,280 --> 00:04:07,270 to our 60 seconds on the right. 90 00:04:07,270 --> 00:04:11,010 Now these little orange boxes, this 3, 5, 8, 6, 2, 91 00:04:11,010 --> 00:04:12,480 these are just events. 92 00:04:12,480 --> 00:04:14,650 We could have given them a number, a name, 93 00:04:14,650 --> 00:04:15,800 it doesn't really matter. 94 00:04:15,800 --> 00:04:17,380 This is just an event that's coming 95 00:04:17,380 --> 00:04:19,223 into our tumbling window. 96 00:04:20,500 --> 00:04:22,720 And down here at the bottom you can see 97 00:04:22,720 --> 00:04:25,540 that I have dashed lines for this first tumbling window, 98 00:04:25,540 --> 00:04:28,910 which goes from 0 to 20 seconds. 99 00:04:28,910 --> 00:04:31,650 And so you can see it starts at the 0 and ends at the 20, 100 00:04:31,650 --> 00:04:34,860 so this is a tumbling window. 
101 00:04:34,860 --> 00:04:38,460 And within that window, we have the events 3, 5, 8, 6, 2 102 00:04:39,840 --> 00:04:43,780 that occur at various times within that 20 seconds. 103 00:04:43,780 --> 00:04:46,720 And so those events at the top are carried down 104 00:04:46,720 --> 00:04:50,483 to within that tumbling window or this blue background here. 105 00:04:51,400 --> 00:04:54,820 So as we finish our first tumbling window 106 00:04:54,820 --> 00:04:58,350 it's not overlapping, which means that window ends 107 00:04:58,350 --> 00:05:00,830 and it's repeating, which means it tumbles 108 00:05:00,830 --> 00:05:03,460 into our second space here, 109 00:05:03,460 --> 00:05:06,260 which runs from 20 to 40 seconds. 110 00:05:06,260 --> 00:05:09,730 And then from there, it goes from 40 to 60 seconds. 111 00:05:09,730 --> 00:05:13,900 And every event appears in one of those windows. 112 00:05:13,900 --> 00:05:17,140 So this is a tumbling window. 113 00:05:17,140 --> 00:05:19,200 And looking at the code for this, 114 00:05:19,200 --> 00:05:21,260 we're going to have our SELECT statement 115 00:05:21,260 --> 00:05:23,350 that we saw last lesson 116 00:05:23,350 --> 00:05:25,990 and then we're going to have our FROM statement. 117 00:05:25,990 --> 00:05:27,760 So where is it coming from? 118 00:05:27,760 --> 00:05:29,550 This is our input stream. 119 00:05:29,550 --> 00:05:32,590 And then we're going to have this GROUP BY statement. 120 00:05:32,590 --> 00:05:36,390 And in this case, we're going to group by our time zone 121 00:05:36,390 --> 00:05:38,350 and we're going to create a tumbling window 122 00:05:38,350 --> 00:05:41,520 with a duration of 20 seconds. 123 00:05:41,520 --> 00:05:44,520 And we're going to offset that by one millisecond 124 00:05:44,520 --> 00:05:46,300 or negative one millisecond. 
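To make the tumbling behavior concrete, here is a minimal Python sketch. It is not Stream Analytics code, just a simulation of how events fall into 20-second tumbling windows. The event times are made up for illustration (the slide only shows the values 3, 5, 8, 6, 2), the function name is hypothetical, and the boundary rule used here (start included, end excluded) is a simplifying assumption that may differ from how the real service treats events landing exactly on a window edge.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (time_in_seconds, value) events into non-overlapping windows.

    Each event lands in exactly one window, keyed by (start, end).
    Assumption for this sketch: start is inclusive, end is exclusive.
    """
    windows = defaultdict(list)
    for t, value in events:
        start = (t // size) * size  # snap the event time down to its window start
        windows[(start, start + size)].append(value)
    return dict(windows)


# Hypothetical event times; the first five values echo the slide.
events = [(2, 3), (6, 5), (9, 8), (14, 6), (18, 2), (25, 7), (47, 1)]
result = tumbling_windows(events, 20)

# One result per window, as if it were emitted when the window closes.
for (start, end), values in sorted(result.items()):
    print(f"{start}-{end}s: count={len(values)}, sum={sum(values)}")
```

Because the windows never overlap, the window counts always add up to the total number of events.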
125 00:05:46,300 --> 00:05:47,520 So you can create an offset 126 00:05:47,520 --> 00:05:50,110 and just kind of play with those windows moving forward 127 00:05:50,110 --> 00:05:52,220 or backward in time a little bit, 128 00:05:52,220 --> 00:05:54,880 depending upon what you're trying to do. 129 00:05:54,880 --> 00:05:58,733 But the key is tumbling window, duration 20 seconds. 130 00:06:00,150 --> 00:06:02,990 All right, so that is a tumbling window. 131 00:06:02,990 --> 00:06:06,150 Up next, we're going to talk about a hopping window. 132 00:06:06,150 --> 00:06:09,300 Now, a hopping window can hop forward in time 133 00:06:09,300 --> 00:06:11,230 by a fixed period. 134 00:06:11,230 --> 00:06:15,570 And hopping windows also can overlap, okay. 135 00:06:15,570 --> 00:06:18,900 So what we have here is we have an example 136 00:06:18,900 --> 00:06:23,900 of a 10-second hopping window with a 5-second hop, okay. 137 00:06:25,190 --> 00:06:29,320 So our very first window here starts at 0 seconds 138 00:06:29,320 --> 00:06:31,310 and it goes to 10 seconds. 139 00:06:31,310 --> 00:06:34,070 Again, those green circles are seconds. 140 00:06:34,070 --> 00:06:37,710 So from 0 to 10 seconds is our first window. 141 00:06:37,710 --> 00:06:39,850 So like our last window, you can see here 142 00:06:39,850 --> 00:06:43,560 that events 7, 5, 9, 4, 3 143 00:06:43,560 --> 00:06:46,100 appear in that window. 144 00:06:46,100 --> 00:06:49,460 Now, unlike tumbling windows, hopping windows can overlap. 145 00:06:49,460 --> 00:06:52,490 And so we're going to hop forward 5 seconds. 146 00:06:52,490 --> 00:06:57,490 So our next window actually begins at 5 seconds 147 00:06:57,510 --> 00:07:01,690 and it's going to go from 5 seconds to 15 seconds 148 00:07:01,690 --> 00:07:04,163 which is our 10-second hopping window. 149 00:07:05,140 --> 00:07:08,557 So window #2 here contains events 5, 9, 4, 3 150 00:07:12,405 --> 00:07:16,740 and then, at the very end, it captures event number 2 over here. 
151 00:07:16,740 --> 00:07:19,330 So we can have more than one event 152 00:07:19,330 --> 00:07:21,683 appear in more than one window. 153 00:07:22,540 --> 00:07:26,190 So that is our second hopping window. 154 00:07:26,190 --> 00:07:28,950 And then our third window kicks off 5 seconds 155 00:07:28,950 --> 00:07:30,690 from when the second one started. 156 00:07:30,690 --> 00:07:32,750 So it's going to start at 10 seconds 157 00:07:32,750 --> 00:07:35,420 and run all the way to 20 seconds. 158 00:07:35,420 --> 00:07:37,240 Now in this window the only events 159 00:07:37,240 --> 00:07:41,210 that we're going to capture are event 2 and event 6. 160 00:07:41,210 --> 00:07:44,483 So you can see here, this is a hopping window. 161 00:07:46,100 --> 00:07:47,820 And so then when we look at the code here 162 00:07:47,820 --> 00:07:51,310 you can see that we have our GROUP BY statement again, 163 00:07:51,310 --> 00:07:53,500 and I didn't include the rest of it for the SELECT 164 00:07:53,500 --> 00:07:55,770 and the FROM because we're just talking 165 00:07:55,770 --> 00:07:58,370 about the code for the window itself. 166 00:07:58,370 --> 00:08:00,460 So we have our GROUP BY statement. 167 00:08:00,460 --> 00:08:01,900 X is just topic. 168 00:08:01,900 --> 00:08:02,950 It could be whatever you want. 169 00:08:02,950 --> 00:08:06,650 It could be a timestamp, a topic, or whatever; 170 00:08:06,650 --> 00:08:08,720 it's how we're going to group our data. 171 00:08:08,720 --> 00:08:11,770 Then we're going to group it using a hopping window. 172 00:08:11,770 --> 00:08:14,670 And this has a duration of 20 minutes. 173 00:08:14,670 --> 00:08:18,200 So each window will be 20 minutes in length 174 00:08:18,200 --> 00:08:21,310 and it's going to hop 5 minutes. 175 00:08:21,310 --> 00:08:24,120 So if the first window starts at minute 0 176 00:08:24,120 --> 00:08:27,120 the second window will start at minute 5. 
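The 10-second window with a 5-second hop from the slide can be simulated the same way. This is a rough Python sketch rather than Stream Analytics code. The event times are invented so that the values land the way the slide describes, the function name and its `horizon` parameter are hypothetical, and the boundary rule (start included, end excluded) is again a simplifying assumption.

```python
def hopping_windows(events, size, hop, horizon):
    """Return {(start, end): [values]} for windows that start every `hop` seconds.

    Windows are `size` seconds long, so when hop < size they overlap and an
    event can be counted in more than one window.
    Assumption for this sketch: start is inclusive, end is exclusive.
    """
    windows = {}
    for start in range(0, horizon - size + 1, hop):
        end = start + size
        windows[(start, end)] = [v for t, v in events if start <= t < end]
    return windows


# Hypothetical event times chosen to match the values described on the slide.
events = [(1, 7), (6, 5), (7, 9), (8, 4), (9, 3), (13, 2), (17, 6)]
hopping = hopping_windows(events, size=10, hop=5, horizon=20)

for (start, end), values in sorted(hopping.items()):
    print(f"{start}-{end}s: {values}")
```

Here events 5, 9, 4, and 3 show up in both the first and second windows, and event 2 shows up in both the second and third, which is the overlap that a tumbling window never produces.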
177 00:08:27,120 --> 00:08:30,010 And then again, we can also define an offset 178 00:08:30,010 --> 00:08:31,623 if we so choose. 179 00:08:32,740 --> 00:08:35,980 So as we wrap up our discussion on hopping windows, 180 00:08:35,980 --> 00:08:37,250 I'm actually going to take us back 181 00:08:37,250 --> 00:08:38,640 to the 5 types of windows, 182 00:08:38,640 --> 00:08:42,280 and I am going to end this first session, 183 00:08:42,280 --> 00:08:44,830 this first lesson on windows. 184 00:08:44,830 --> 00:08:48,210 The next couple of windows are going to be a very different 185 00:08:48,210 --> 00:08:50,460 kind of window than the tumbling and hopping windows 186 00:08:50,460 --> 00:08:51,850 that we've just looked at. 187 00:08:51,850 --> 00:08:54,450 And so this makes kind of a natural break point. 188 00:08:54,450 --> 00:08:57,630 So take a breath, take a pause, grab a coffee 189 00:08:57,630 --> 00:09:00,700 and I'll see you in the second half, the next lesson, 190 00:09:00,700 --> 00:09:04,510 where we talk about sliding, session, and snapshot windows. 191 00:09:04,510 --> 00:09:05,343 See you then.