1 00:00:00,110 --> 00:00:02,620 ‫Okay, so let's use the Athena service 2 00:00:02,620 --> 00:00:06,910 ‫to query our data in Amazon S3 using the SQL query language. 3 00:00:06,910 --> 00:00:10,310 ‫So we are getting into Athena, and, as we can see, 4 00:00:10,310 --> 00:00:12,430 ‫this is the new editor. 5 00:00:12,430 --> 00:00:14,070 ‫And, so, before you run your first query, 6 00:00:14,070 --> 00:00:17,570 ‫you need to set up a query result location in Amazon S3. 7 00:00:17,570 --> 00:00:21,150 ‫So let's click on View settings and then Manage. 8 00:00:21,150 --> 00:00:25,090 ‫And then, here, when to specify a location of query results. 9 00:00:25,090 --> 00:00:27,810 ‫So what I have to do is to go into Amazon S3, 10 00:00:27,810 --> 00:00:29,540 ‫create a bucket, and I'll call this one 11 00:00:29,540 --> 00:00:34,540 ‫stephane-demo-athena-eu-central-1, 12 00:00:35,460 --> 00:00:36,760 ‫and that should be unique. 13 00:00:38,430 --> 00:00:40,130 ‫And then we'll create this bucket. 14 00:00:41,330 --> 00:00:43,230 ‫Now the bucket is created, I can click on View details. 15 00:00:43,230 --> 00:00:45,740 ‫Here is my buckets, and I can just copy 16 00:00:45,740 --> 00:00:47,090 ‫the bucket name right here. 17 00:00:48,760 --> 00:00:53,760 ‫And I'm going to just enter s3:// and then my bucket name. 18 00:00:55,180 --> 00:00:57,350 ‫Just click on Save, and now my query result location 19 00:00:57,350 --> 00:01:00,210 ‫is going to be in this S3 buckets. 20 00:01:00,210 --> 00:01:02,010 ‫So, now that we have this done, 21 00:01:02,010 --> 00:01:04,810 ‫we can refresh this page, and we can see 22 00:01:04,810 --> 00:01:07,520 ‫that there are already some Data Sources, 23 00:01:07,520 --> 00:01:11,500 ‫the AwsDataCatalog, and some Databases we can use, okay. 24 00:01:11,500 --> 00:01:13,340 ‫But what I'm going to do is that I'm going to create 25 00:01:13,340 --> 00:01:14,952 ‫my own database for it to work 26 00:01:14,952 --> 00:01:17,440 ‫because you may not necessarily have this. 27 00:01:17,440 --> 00:01:19,100 ‫So, for this, it's pretty easy. 28 00:01:19,100 --> 00:01:23,260 ‫We're going to create our S3 access logs in Amazon S3. 29 00:01:23,260 --> 00:01:24,820 ‫So I've made my bucket called 30 00:01:24,820 --> 00:01:27,610 ‫demo-s3-access-logs-stephane-2020, 31 00:01:27,610 --> 00:01:31,770 ‫which contains my access logs in this S3 logs folder. 32 00:01:31,770 --> 00:01:34,130 ‫So what I have to do now is to actually create a database, 33 00:01:34,130 --> 00:01:37,320 ‫and create this table that will present the objects in here. 34 00:01:37,320 --> 00:01:38,550 ‫So, how to do so? 35 00:01:38,550 --> 00:01:40,470 ‫Well, let's go into our code. 36 00:01:40,470 --> 00:01:42,390 ‫And the first line we have under S3 events, 37 00:01:42,390 --> 00:01:45,980 ‫athena-s3-access-logs-sql, is to create this database. 38 00:01:45,980 --> 00:01:47,720 ‫So we'll paste this in, 39 00:01:47,720 --> 00:01:50,233 ‫create database, and then click on Run. 40 00:01:51,210 --> 00:01:53,730 ‫And this creates a database successfully, 41 00:01:53,730 --> 00:01:56,180 ‫so now we have a specific database, here, 42 00:01:56,180 --> 00:01:57,393 ‫called s3_access_logs_db. 43 00:01:58,320 --> 00:02:01,040 ‫Next, we need to create an external table 44 00:02:01,040 --> 00:02:03,340 ‫that represents the bucket logs. 45 00:02:03,340 --> 00:02:06,530 ‫So let me paste, copy and paste this. 46 00:02:06,530 --> 00:02:08,730 ‫And the question is, how did I get this? 47 00:02:08,730 --> 00:02:13,730 ‫Well, if you type athena s3 access logs, on the internet, 48 00:02:14,730 --> 00:02:18,840 ‫you will have a link, and this link will actually give you 49 00:02:18,840 --> 00:02:23,840 ‫the query units you do to create these proper access logs. 50 00:02:24,010 --> 00:02:26,880 ‫So the one thing I have to change is under the location, 51 00:02:26,880 --> 00:02:30,060 ‫when to change a target bucket name and the prefix. 52 00:02:30,060 --> 00:02:33,260 ‫Well, it turns out that my properties will give me this 53 00:02:33,260 --> 00:02:34,860 ‫so I'll copy this S3 URI, 54 00:02:34,860 --> 00:02:37,500 ‫which has the bucket name and the prefix. 55 00:02:37,500 --> 00:02:41,610 ‫So I will paste this in right here, and we're good to go. 56 00:02:41,610 --> 00:02:45,570 ‫Now let's run this, and now the query is successful. 57 00:02:45,570 --> 00:02:48,105 ‫So, now, under this database, we see there's a table called 58 00:02:48,105 --> 00:02:51,290 ‫mybuckets_logs, with a bunch of columns. 59 00:02:51,290 --> 00:02:55,220 ‫So if I click on this, and click on Preview Table, 60 00:02:55,220 --> 00:02:57,160 ‫now this is going to run a select star 61 00:02:57,160 --> 00:02:59,500 ‫from my table, limit 10. 62 00:02:59,500 --> 00:03:03,750 ‫And, at the bottom, we see that it returns 10 rows, okay, 63 00:03:03,750 --> 00:03:06,700 ‫and these 10 rows represents my access logs, 64 00:03:06,700 --> 00:03:07,880 ‫so this is super handy. 65 00:03:07,880 --> 00:03:11,170 ‫Now I get all my access logs data in this table, 66 00:03:11,170 --> 00:03:12,450 ‫super, super handy. 67 00:03:12,450 --> 00:03:15,110 ‫So this is just scratching the surface because now, 68 00:03:15,110 --> 00:03:16,990 ‫using Athena, we can run some more 69 00:03:16,990 --> 00:03:20,090 ‫interesting types of queries. 70 00:03:20,090 --> 00:03:21,980 ‫So, as we can see right now, we have done nothing. 71 00:03:21,980 --> 00:03:24,170 ‫We didn't set up any database, per se, 72 00:03:24,170 --> 00:03:26,410 ‫we didn't provision database, this is all serverless okay. 73 00:03:26,410 --> 00:03:28,530 ‫All these queries are run by the Athena service 74 00:03:28,530 --> 00:03:31,630 ‫directly on the data that sits within our S3 buckets, 75 00:03:31,630 --> 00:03:33,990 ‫which is, I think, extremely powerful. 76 00:03:33,990 --> 00:03:36,390 ‫So, in this query, we're going to say 77 00:03:36,390 --> 00:03:38,450 ‫we want to get the requesturi_operation, 78 00:03:38,450 --> 00:03:41,050 ‫the httpstatus, count star, to count how many, 79 00:03:41,050 --> 00:03:43,010 ‫FROM this table, and we're going to do a GROUP BY. 80 00:03:43,010 --> 00:03:44,920 ‫So this is a more advanced SQL query, 81 00:03:44,920 --> 00:03:47,580 ‫but it's gonna give us a log of reports 82 00:03:47,580 --> 00:03:50,360 ‫of the status codes as well as how many times they occurred. 83 00:03:50,360 --> 00:03:53,510 ‫So, we can see, for example, that a GET 404 84 00:03:53,510 --> 00:03:58,510 ‫happened 49 times, and a HEAD 200 happened 81 time, 85 00:03:58,540 --> 00:04:01,290 ‫and so on, or GET 403 happened 14 times. 86 00:04:01,290 --> 00:04:03,360 ‫So, this could help us, for example, 87 00:04:03,360 --> 00:04:07,160 ‫do a report into if visioning what types of queries, 88 00:04:07,160 --> 00:04:09,540 ‫or what types of requests do not happen, 89 00:04:09,540 --> 00:04:13,310 ‫or do not succeed, on our buckets, so super handy. 90 00:04:13,310 --> 00:04:14,780 ‫And one last, for example, 91 00:04:14,780 --> 00:04:17,260 ‫we saw that there were a few, four or three errors, 92 00:04:17,260 --> 00:04:19,610 ‫so it can say, hey, this is when it's unauthorized, 93 00:04:19,610 --> 00:04:21,760 ‫this is very fishy, either someone is not authorized 94 00:04:21,760 --> 00:04:24,530 ‫to do something, and, cool, we want to know who and why. 95 00:04:24,530 --> 00:04:27,060 ‫Or if someone is trying to get unauthorized access 96 00:04:27,060 --> 00:04:29,980 ‫to our buckets and we wanted to just deep dive into it. 97 00:04:29,980 --> 00:04:33,630 ‫So, again, we run this query and then we have 44 rows, 98 00:04:33,630 --> 00:04:36,320 ‫which are not present here because the data is probably 99 00:04:36,320 --> 00:04:39,200 ‫very, very big, but we have 44 rows, okay, 100 00:04:39,200 --> 00:04:42,057 ‫and these rows will represent what type of query 101 00:04:42,057 --> 00:04:46,640 ‫generated and ended up in a 403, which is super handy. 102 00:04:46,640 --> 00:04:49,590 ‫So, again, the whole power, here, of Athena 103 00:04:49,590 --> 00:04:52,300 ‫is that we're able to literally query data on Amazon S3, 104 00:04:52,300 --> 00:04:53,530 ‫do some complex queries directly, 105 00:04:53,530 --> 00:04:55,130 ‫without setting up any servers, 106 00:04:55,130 --> 00:04:56,560 ‫without transforming our data, 107 00:04:56,560 --> 00:04:58,460 ‫just by setting up the right data formats, 108 00:04:58,460 --> 00:05:00,250 ‫and specifying what the data is, 109 00:05:00,250 --> 00:05:02,930 ‫Athena is able to help us with all these things, okay. 110 00:05:02,930 --> 00:05:05,144 ‫Finally, in Athena, you can see the recent queries 111 00:05:05,144 --> 00:05:07,690 ‫as well as saved queries, if you wanted to, 112 00:05:07,690 --> 00:05:09,530 ‫and edit the settings if you wanted to encrypt 113 00:05:09,530 --> 00:05:12,040 ‫the query results in the target buckets. 114 00:05:12,040 --> 00:05:14,120 ‫So that's it for this lecture, I hope you liked it, 115 00:05:14,120 --> 00:05:16,070 ‫and I will see you in the next lecture.