So now let's talk about error handling in Step Functions, which really shows the full power of Step Functions. A state machine executes many small tasks, and each of these tasks does very little work, for example calling an API. All the error handling, however, should happen outside of these tasks, in Step Functions itself. So when can errors happen? Well, if the state machine has a definition issue, for example there is no matching rule in a Choice state. Or if a task fails, for example a Lambda function throws an exception: we should not catch it in the Lambda function itself, we should let the Step Functions error-handling mechanism catch it. Or a transient failure, for example a network partition event. So we have two types of error handling in Step Functions: Retry, to retry a task, or Catch, to transition into a failure path. And you should do this in the state machine instead of in the application code.
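To make the "don't catch it in the Lambda function" advice concrete, here is a minimal sketch of a hypothetical Python Lambda handler. The handler logic, the `amount` field and the `CustomError` class are illustrative assumptions, not from the lecture. The function does one small piece of work and lets the exception propagate; in the Python runtime the error name Step Functions sees is typically the exception class name, so a Retry or Catch entry with `"ErrorEquals": ["CustomError"]` could match it.

```python
class CustomError(Exception):
    """Hypothetical business error; its class name becomes the error name."""


def lambda_handler(event, context):
    # One small unit of work: validate and echo an order amount.
    amount = event.get("amount")
    if amount is None:
        # Deliberately no try/except: the state machine's Retry/Catch decides what happens.
        raise CustomError("missing 'amount' in input")
    return {"amount": amount, "status": "processed"}


# Local usage (context is unused here, so None is fine):
print(lambda_handler({"amount": 42}, None))  # {'amount': 42, 'status': 'processed'}
```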
Handling errors in the state machine makes your application much simpler, you get all these mechanisms for free, and the retries and catches show up directly in the Step Functions execution history. There are predefined error codes: States.ALL, which matches any error name; States.Timeout, if the task ran longer than its TimeoutSeconds or no heartbeat was received from an activity; States.TaskFailed, in case of an execution failure of the task itself, for example, as I said, an exception in a Lambda function; or States.Permissions, when there are not enough permissions to execute some code. A state can also report its own custom errors, and you can catch those in Step Functions as well. So let's talk about Retry, for Task or Parallel states. Retry lets you define, based on the error, whether to retry and how many times. The retriers are evaluated from top to bottom. In this example, we have a Lambda function being run, and we have three retriers: one, two and three. ErrorEquals matches a specific kind of error.
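The predefined error codes and the top-to-bottom evaluation can be modelled with a small matcher. This is a simplified sketch, not the service itself: it follows the documented rules that States.ALL matches any error name and that States.TaskFailed, when used in a Retry or Catch, acts as a wildcard for any error except States.Timeout, and it ignores other service-specific special cases. The function names are hypothetical.

```python
# Simplified model of ErrorEquals matching (not the real service).
# Retry and Catch entries are scanned top to bottom; the first match wins.

def error_matches(error_name, error_equals):
    for pattern in error_equals:
        if pattern == "States.ALL":
            return True                  # wildcard: any error name
        if pattern == "States.TaskFailed" and error_name != "States.Timeout":
            return True                  # wildcard: any error except States.Timeout
        if pattern == error_name:
            return True                  # exact name, e.g. "CustomError"
    return False


def first_matching_rule(error_name, rules):
    """Index of the first Retry/Catch entry matching error_name, or None."""
    for index, rule in enumerate(rules):
        if error_matches(error_name, rule["ErrorEquals"]):
            return index
    return None


rules = [
    {"ErrorEquals": ["CustomError"]},
    {"ErrorEquals": ["States.TaskFailed"]},
    {"ErrorEquals": ["States.ALL"]},
]
print(first_matching_rule("CustomError", rules))     # 0
print(first_matching_rule("ValueError", rules))      # 1
print(first_matching_rule("States.Timeout", rules))  # 2
```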
So here, ErrorEquals CustomError handles the case where the Lambda function throws a custom error; ErrorEquals States.TaskFailed handles the case where the Lambda function fails without throwing that custom error; and States.ALL tries to catch every other kind of error that can happen within that Lambda function. IntervalSeconds says how long to wait before retrying. In this example, the first one is 1, so wait one second and then retry; the second is 30, so wait 30 seconds and then retry; and the third is 5 seconds. Then we have BackoffRate, the multiplier applied to the delay after each retry, which is how you implement exponential backoff. Here it's 2, 2 and 2, so they're all the same. Then we have MaxAttempts, the maximum number of retries. By default it's 3, and 0 means never retry; in this example we have MaxAttempts 2, 2 and 5. Once all the retry attempts are used up, the Catch block kicks in.
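To see how IntervalSeconds, BackoffRate and MaxAttempts combine, and how Catch takes over once retries are exhausted, here is a simplified local simulation (not the real service) using the three retriers from this example. Assumptions: errors are modelled with a hypothetical `TaskError` class, delays are recorded instead of slept, and `FallbackState` is a hypothetical Catch target. The wait before retry n of a given retrier is IntervalSeconds * BackoffRate^(n - 1), and each retrier keeps its own attempt count.

```python
# Simplified local model of Retry followed by Catch (not the real service).

class TaskError(Exception):
    """Hypothetical stand-in for a task failure with a Step Functions error name."""
    def __init__(self, name, cause=""):
        super().__init__(cause)
        self.name = name
        self.cause = cause


def matches(name, error_equals):
    if "States.ALL" in error_equals:
        return True
    if "States.TaskFailed" in error_equals and name != "States.Timeout":
        return True
    return name in error_equals


RETRY = [
    {"ErrorEquals": ["CustomError"], "IntervalSeconds": 1, "BackoffRate": 2.0, "MaxAttempts": 2},
    {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 30, "BackoffRate": 2.0, "MaxAttempts": 2},
    {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 5, "BackoffRate": 2.0, "MaxAttempts": 5},
]
CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "FallbackState"}]  # hypothetical target


def run_state(task, retriers=RETRY, catchers=CATCH):
    used = [0] * len(retriers)   # each retrier counts its own attempts
    waits = []                   # seconds we would wait before each retry
    while True:
        try:
            return {"result": task(), "waits": waits}
        except TaskError as err:
            # Only the first retrier (top to bottom) whose ErrorEquals matches is used.
            i = next((k for k, r in enumerate(retriers)
                      if matches(err.name, r["ErrorEquals"])), None)
            if i is not None and used[i] < retriers[i].get("MaxAttempts", 3):
                r = retriers[i]
                # IntervalSeconds * BackoffRate ** (retries this retrier already made)
                waits.append(r.get("IntervalSeconds", 1) * r.get("BackoffRate", 2.0) ** used[i])
                used[i] += 1
                continue
            # Retries exhausted (or no retrier matched): try the catchers.
            for c in catchers:
                if matches(err.name, c["ErrorEquals"]):
                    return {"next": c["Next"], "waits": waits,
                            "error": {"Error": err.name, "Cause": err.cause}}
            raise  # nothing caught it: the execution fails


def always_custom_error():
    raise TaskError("CustomError", "boom")

print(run_state(always_custom_error))
# {'next': 'FallbackState', 'waits': [1.0, 2.0], 'error': {'Error': 'CustomError', 'Cause': 'boom'}}
```

For a task that keeps throwing CustomError, the first retrier fires twice (waiting 1 s, then 2 s) and the third failure goes to the Catch. A task that keeps timing out skips the States.TaskFailed retrier, because that wildcard excludes States.Timeout, and is handled by States.ALL with waits of 5, 10, 20, 40 and 80 seconds.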
As you can see in this example, if we were to define all this retry logic within the Lambda function, the function would obviously run for a very, very long time and might even time out. Also, if you wanted to change your error-handling logic, you would have to redeploy the Lambda function. Here, instead, we define the retries outside the Lambda function, in Step Functions. Therefore we can change the retry and error-handling logic directly in our JSON definition, which gives us a lot more flexibility, and the Lambda function itself executes quickly, in a short amount of time.

Next, if we exhaust all the retries, we go into a Catch. Catch follows similar logic: it's evaluated from top to bottom, and each entry has ErrorEquals and Next. So let's have a look. In this example, we have this Lambda function, and we're saying: if you encounter ErrorEquals CustomError, then Next, go to the state called CustomErrorFallback, which corresponds to the block right here.
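Putting the Retry and Catch pieces together, the following sketch builds an Amazon States Language definition as a Python dict and serializes it to JSON. The state names mirror the lecture's example, but `HelloWorld` as the task name is an assumption, the Lambda ARN is a placeholder, and `CatchAllFallback` is a hypothetical extra state. The loop at the end only checks that every catcher's `Next` points at a state that exists.

```python
import json

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:HelloWorld"  # placeholder

definition = {
    "StartAt": "HelloWorld",
    "States": {
        "HelloWorld": {
            "Type": "Task",
            "Resource": LAMBDA_ARN,
            # Retry runs first; Catch only applies once retries are exhausted.
            "Retry": [
                {"ErrorEquals": ["CustomError"], "IntervalSeconds": 1,
                 "BackoffRate": 2.0, "MaxAttempts": 2},
            ],
            # Catchers are evaluated top to bottom; States.ALL must come last.
            "Catch": [
                {"ErrorEquals": ["CustomError"], "Next": "CustomErrorFallback"},
                {"ErrorEquals": ["States.ALL"], "Next": "CatchAllFallback"},
            ],
            "End": True,
        },
        "CustomErrorFallback": {
            "Type": "Pass",
            "Result": "This is a fallback from a custom Lambda function exception",
            "End": True,
        },
        "CatchAllFallback": {  # hypothetical catch-all state
            "Type": "Pass",
            "Result": "This is a fallback for any other error",
            "End": True,
        },
    },
}

# Sanity check: every catcher's Next must name an existing state.
for catcher in definition["States"]["HelloWorld"]["Catch"]:
    assert catcher["Next"] in definition["States"], catcher["Next"]

print(json.dumps(definition, indent=2))
```

The resulting JSON string is what you would pass as the `definition` argument of boto3's `create_state_machine` call (together with `name` and `roleArn`); changing the retry or fallback behaviour then means editing this document, not redeploying the Lambda function.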
So we're saying, "Hey, if you encounter a custom error, go into this state." That state is a Pass state, which here just returns a result describing the exception, and End is true. So the idea is that you can retry, retry, retry, and if there are too many retries, catch the error, go to the next state and do something about it. ResultPath is a path that determines what input is sent to the state specified in the Next field, and we'll do a deep dive on it in the very next slide. The point is that in case of too many execution failures, or some specific kinds of errors, we want to catch them and move on with our state machine. Again, if we defined this within our code, the code could become very, very complicated. But if we define it within Step Functions, we have a lot of flexibility in how we handle these errors.

So let's analyze how ResultPath works, because it's important to understand going into the exam.
Let's say we have this task with one Catch, and that Catch says: for any kind of error, go to the next task, with ResultPath set to $.error. The $.error path lets you include the error in the input. For example, if the input is {"foo": "bar"}, then thanks to the ResultPath we defined, $.error, the output contains "foo": "bar" as before, but also an error block containing the error itself (Error) and some information about it (Cause). That means that when this output is passed on to the next state, we can, for example, analyze it, send some emails and debug based on it. So ResultPath is how you add the error to the original input and pass both on to the next task. This is something the exam can ask you about. So that's it for Step Functions error handling. I hope you liked it. Let's go into a hands-on to really understand how retries and catches work in Step Functions. I will see you in the next lecture.
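The ResultPath walk-through can be reproduced locally with a small sketch. `apply_result_path` is a hypothetical helper that models only the simple cases discussed here: `"$"` replaces the input with the result, a top-level path like `"$.error"` adds the result under that key, and `None` (JSON `null`) discards the result and keeps the input. The `Error` and `Cause` fields follow the shape Step Functions uses for caught errors; the values are made up.

```python
import copy


def apply_result_path(state_input, result, result_path):
    """Simplified ResultPath: supports '$', '$.<key>' (top level only) and None."""
    if result_path is None:
        return copy.deepcopy(state_input)        # discard the result, keep the input
    if result_path == "$":
        return copy.deepcopy(result)             # result replaces the input
    key = result_path[2:] if result_path.startswith("$.") else ""
    if not key or "." in key or "[" in key:
        raise ValueError("this sketch only handles '$', '$.<key>' or None")
    output = copy.deepcopy(state_input)          # keep the original input...
    output[key] = copy.deepcopy(result)          # ...and add the result under the key
    return output


state_input = {"foo": "bar"}
caught_error = {"Error": "CustomError", "Cause": "Something went wrong"}
print(apply_result_path(state_input, caught_error, "$.error"))
# {'foo': 'bar', 'error': {'Error': 'CustomError', 'Cause': 'Something went wrong'}}
```

Because the original input survives alongside the error, the next state (for example one that sends a notification email) has both the data it was processing and the reason it failed.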