1 00:00:00,150 --> 00:00:03,180 Andrei: Let's talk about file I/O. 2 00:00:03,180 --> 00:00:06,120 What does I/O mean? 3 00:00:06,120 --> 00:00:08,700 You'll see this a lot in programming. 4 00:00:08,700 --> 00:00:11,373 I/O stands for input output. 5 00:00:12,300 --> 00:00:16,230 You see, most of the times, machines are not communicating 6 00:00:16,230 --> 00:00:18,600 in just one environment. 7 00:00:18,600 --> 00:00:22,650 For example, so far we've been writing our code in a, 8 00:00:22,650 --> 00:00:26,400 let's say, an editor like Sublime Text that I have here, 9 00:00:26,400 --> 00:00:30,210 or PyCharm or an online repo, 10 00:00:30,210 --> 00:00:32,100 but usually, you wanna interact 11 00:00:32,100 --> 00:00:35,070 with different parts of the system. 12 00:00:35,070 --> 00:00:38,070 Maybe you wanna speak to another website. 13 00:00:38,070 --> 00:00:41,040 Maybe you wanna speak to something 14 00:00:41,040 --> 00:00:43,440 that's on your desktop maybe a file. 15 00:00:43,440 --> 00:00:46,260 Maybe two different machines are communicating 16 00:00:46,260 --> 00:00:47,190 with each other. 17 00:00:47,190 --> 00:00:49,530 Maybe you're speaking to a database. 18 00:00:49,530 --> 00:00:51,660 I/O simply means, "Hey, 19 00:00:51,660 --> 00:00:55,890 I want you to input something from the outside world 20 00:00:55,890 --> 00:00:59,637 and output something into the outside world." 21 00:01:00,630 --> 00:01:03,720 And one of the most common ways that we use things 22 00:01:03,720 --> 00:01:06,513 like I/O is through reading files. 23 00:01:07,440 --> 00:01:10,350 You might think about Python and how wouldn't it be nice 24 00:01:10,350 --> 00:01:14,430 if we can perhaps write a script that compresses images. 25 00:01:14,430 --> 00:01:18,930 Well, we need I/O then, I need to input an image 26 00:01:18,930 --> 00:01:23,043 and then I need to output the compressed image. 
27 00:01:24,000 --> 00:01:26,970 Maybe I wanna work with a PDF file 28 00:01:26,970 --> 00:01:30,390 and maybe add a watermark to all my PDF pages. 29 00:01:30,390 --> 00:01:35,390 Well, then we input a PDF and output a new version of a PDF. 30 00:01:35,790 --> 00:01:39,150 So this is a very common task that we see a lot of, 31 00:01:39,150 --> 00:01:43,380 and reading and writing files is a very important tool 32 00:01:43,380 --> 00:01:44,230 in our tool belt. 33 00:01:45,090 --> 00:01:47,070 And by the way, we have a project coming up 34 00:01:47,070 --> 00:01:50,520 where we actually are going to use this with PDFs. 35 00:01:50,520 --> 00:01:51,353 But for now, 36 00:01:51,353 --> 00:01:55,023 how can we do this file input output with Python? 37 00:01:56,490 --> 00:01:59,520 Well, Python has a built-in function 38 00:01:59,520 --> 00:02:03,933 that allows us to open and write to files, 39 00:02:05,130 --> 00:02:09,270 and it's simply called open, nice and easy. 40 00:02:09,270 --> 00:02:12,870 So, using open, we can do something like this. 41 00:02:12,870 --> 00:02:17,100 I can create a sample text file in our desktop, 42 00:02:17,100 --> 00:02:19,413 so I'm going to use my terminal here. 43 00:02:20,400 --> 00:02:22,500 So if we look at present working directory, 44 00:02:22,500 --> 00:02:23,490 I'm on a Desktop. 45 00:02:23,490 --> 00:02:24,630 If you're on Windows, 46 00:02:24,630 --> 00:02:27,870 then you might have to do some different commands in here, 47 00:02:27,870 --> 00:02:30,090 but by now you should be pretty familiar 48 00:02:30,090 --> 00:02:32,820 that you can create a file if you wanted to. 49 00:02:32,820 --> 00:02:35,790 Now, I can do this manually or in my terminal, 50 00:02:35,790 --> 00:02:37,680 I can actually do, if you're on a Mac 51 00:02:37,680 --> 00:02:41,253 you can just do touch, and then I'll say test.txt. 
52 00:02:42,390 --> 00:02:44,850 So if I do ls here, you see 53 00:02:44,850 --> 00:02:49,623 that I have a test.txt file and a script.py file. 54 00:02:51,030 --> 00:02:53,780 And again, just to double check, if I go to my desktop, 55 00:02:55,050 --> 00:02:57,300 yep, I have these two right here. 56 00:02:57,300 --> 00:02:58,590 Perfect. 57 00:02:58,590 --> 00:03:02,460 Now, in here, we can simply say 58 00:03:02,460 --> 00:03:03,430 in our script file 59 00:03:05,730 --> 00:03:06,563 open, 60 00:03:07,860 --> 00:03:11,043 and the name of the file, in our case it's test, 61 00:03:12,240 --> 00:03:16,893 make sure it's a string, test.txt, just like that. 62 00:03:17,970 --> 00:03:21,273 Now, I can assign this to a variable, calling it my_file, 63 00:03:23,340 --> 00:03:26,700 and now we have, well, the file object. 64 00:03:26,700 --> 00:03:27,750 So let's check this out. 65 00:03:27,750 --> 00:03:31,200 If I do here, print my_file, 66 00:03:31,200 --> 00:03:32,460 let's see what happens. 67 00:03:32,460 --> 00:03:34,080 I'm going to run my code, 68 00:03:34,080 --> 00:03:38,583 so let's say python3, and then run script.py. 69 00:03:40,170 --> 00:03:45,000 All right, I get an object, a TextIOWrapper. 70 00:03:45,000 --> 00:03:47,010 I get the name, which is test.txt. 71 00:03:47,010 --> 00:03:50,550 I get mode, which I'm not sure what it is yet, we'll learn. 72 00:03:50,550 --> 00:03:54,000 And then encoding, which is how this file is encoded, 73 00:03:54,000 --> 00:03:55,440 which is UTF-8. 74 00:03:55,440 --> 00:03:58,353 Most of the files are usually encoded in UTF-8. 75 00:03:59,460 --> 00:04:03,270 All right, so how can I actually read this file?
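Editor's sketch of the step above: opening the file and printing the file object. The scratch folder made with tempfile is an assumption so the snippet runs anywhere; in the video the file simply sits on the Desktop next to script.py. The encoding shown in the printed object depends on your system.

```python
import os
import tempfile

# Assumed setup: an empty test.txt in a scratch folder,
# standing in for `touch test.txt` on the Desktop.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
open(path, 'w').close()

my_file = open(path)        # no mode given, so it opens for reading text
print(my_file)              # an _io.TextIOWrapper with name, mode and encoding

file_mode = my_file.mode    # 'r', the default read mode
my_file.close()
```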
76 00:04:03,270 --> 00:04:08,270 All we need to do is my file has a .read method on it 77 00:04:09,720 --> 00:04:11,283 so that if I run this now, 78 00:04:12,960 --> 00:04:16,019 well, I get a blank piece of space 79 00:04:16,019 --> 00:04:19,170 because there's nothing on this test.txt file, 80 00:04:19,170 --> 00:04:20,640 so let's write something. 81 00:04:20,640 --> 00:04:23,520 I'll open this file in Sublime Text 82 00:04:23,520 --> 00:04:28,520 and just write, hi my name is Andrei Neagoie. 83 00:04:29,670 --> 00:04:31,620 I got really, really creative with this one. 84 00:04:31,620 --> 00:04:32,640 Good job. 85 00:04:32,640 --> 00:04:33,890 All right, let's go back. 86 00:04:35,250 --> 00:04:38,820 So now if I read this file, look at that, 87 00:04:38,820 --> 00:04:41,370 I'm able to read, hi my name is Andrei, 88 00:04:41,370 --> 00:04:43,683 and now, there you go, I've read my file. 89 00:04:44,880 --> 00:04:46,140 Nice and easy. 90 00:04:46,140 --> 00:04:47,790 Now, if I run this again, 91 00:04:47,790 --> 00:04:51,063 let's just read this multiple times. 92 00:04:51,990 --> 00:04:52,923 If I click Run, 93 00:04:55,320 --> 00:04:59,160 hmm, I'm able to read the first time around, 94 00:04:59,160 --> 00:05:02,010 but these two times I'm not reading anything. 95 00:05:02,010 --> 00:05:03,213 Why is that? 
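Editor's sketch of what was just shown on screen: the first read returns the text, and reading again returns an empty string. Writing the sample sentence from code and using a tempfile scratch folder are assumptions for the sake of a runnable snippet; in the video the text is typed into test.txt by hand.

```python
import os
import tempfile

# Assumed setup: put the sample sentence into a scratch test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
setup = open(path, 'w')
setup.write('hi my name is Andrei Neagoie')
setup.close()

my_file = open(path)
first = my_file.read()    # the whole file
second = my_file.read()   # nothing left to read
third = my_file.read()    # still nothing
my_file.close()

print(first)              # hi my name is Andrei Neagoie
print(repr(second))       # ''
print(repr(third))        # ''
```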
96 00:05:04,140 --> 00:05:08,433 Well, this open function has this idea of a cursor, 97 00:05:09,930 --> 00:05:12,610 that is, you can only read the file once 98 00:05:13,500 --> 00:05:17,283 and once you open, it returns a file object, 99 00:05:18,300 --> 00:05:20,490 and the contents of the file you can read, 100 00:05:20,490 --> 00:05:23,430 and the contents of the file are read with a cursor, 101 00:05:23,430 --> 00:05:24,630 just like you see here, 102 00:05:24,630 --> 00:05:27,570 one by one and printed onto the screen, 103 00:05:27,570 --> 00:05:30,630 but by the end of this first reading, 104 00:05:30,630 --> 00:05:33,540 the cursor is going to be at the end of the file. 105 00:05:33,540 --> 00:05:36,420 So now when it tries to read, it's going to be end 106 00:05:36,420 --> 00:05:39,480 of the file and nothing will be left there. 107 00:05:39,480 --> 00:05:43,860 So, the way we get around this is to do something like this, 108 00:05:43,860 --> 00:05:48,860 we simply say, my_file.seek, 109 00:05:49,380 --> 00:05:52,350 which moves our cursor to whatever index we want, 110 00:05:52,350 --> 00:05:54,750 in our case, seek zero. 111 00:05:54,750 --> 00:05:57,600 So if I run this now, there you go. 112 00:05:57,600 --> 00:06:02,600 And if I move the cursor back and I save and run this, 113 00:06:03,780 --> 00:06:05,583 all right, that's a lot better now, 114 00:06:06,990 --> 00:06:08,520 and this is just to demonstrate 115 00:06:08,520 --> 00:06:12,663 that Python uses this idea of a cursor to read a file. 116 00:06:13,950 --> 00:06:18,603 Now, another unique thing that I can do is to do readline, 117 00:06:20,880 --> 00:06:25,320 so that if I run this, it reads the line, 118 00:06:25,320 --> 00:06:29,160 but let's say our text file has different lines, 119 00:06:29,160 --> 00:06:30,993 so, how are you? 120 00:06:32,310 --> 00:06:34,350 Let's say a smiley face here. 
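Editor's sketch of the cursor idea described above: after a full read the cursor sits at the end of the file, and seek(0) moves it back to the start so the same text can be read again. The tempfile scratch folder and the write step are assumptions to keep the snippet self-contained.

```python
import os
import tempfile

# Assumed setup: a scratch test.txt with the sample sentence.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
setup = open(path, 'w')
setup.write('hi my name is Andrei Neagoie')
setup.close()

my_file = open(path)
first = my_file.read()      # cursor is now at the end of the file
my_file.seek(0)             # move the cursor back to index 0
again = my_file.read()      # the full text comes back
my_file.seek(0)
once_more = my_file.read()
my_file.close()

print(first)
print(again)
print(once_more)
```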
121 00:06:34,350 --> 00:06:36,273 If I now read the line, 122 00:06:38,730 --> 00:06:40,740 again, hi my name is Andrei. 123 00:06:40,740 --> 00:06:43,560 If I read the line again, hi my name is Andrei, 124 00:06:43,560 --> 00:06:48,560 because I get each line, I only get the first line. 125 00:06:49,260 --> 00:06:51,573 If I print this multiple times, 126 00:06:53,460 --> 00:06:56,700 and I run this, all right, that's better. 127 00:06:56,700 --> 00:06:59,850 I get, hi my name is Andrei, smiley face, how are you? 128 00:06:59,850 --> 00:07:03,060 Again, the cursor keeps moving, right? 129 00:07:03,060 --> 00:07:05,640 So I can just keep reading the lines. 130 00:07:05,640 --> 00:07:09,120 Another thing that I can do is to just say readlines. 131 00:07:09,120 --> 00:07:14,120 If I run this, I get a list that contains the entire file, 132 00:07:15,960 --> 00:07:17,400 reads all the lines, 133 00:07:17,400 --> 00:07:21,210 and you see here that I have, hi my name is Andrei Neagoie, 134 00:07:21,210 --> 00:07:25,800 I have a new line here, a smiley face, a new line, 135 00:07:25,800 --> 00:07:29,460 remember, this is escape sequencing, and then, how are you? 136 00:07:29,460 --> 00:07:32,310 And no new line here, because there's no new line, 137 00:07:32,310 --> 00:07:33,933 it's the end of the file. 138 00:07:35,130 --> 00:07:38,010 So, these are extremely useful. 139 00:07:38,010 --> 00:07:42,000 Maybe we can use regular expressions to search 140 00:07:42,000 --> 00:07:44,340 for a piece of text in a file. 141 00:07:44,340 --> 00:07:45,690 That's pretty useful right? 
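Editor's sketch of readline and readlines on the three-line file described above. The exact smiley characters and the tempfile scratch folder are assumptions; the point is that readline returns one line per call as the cursor advances, and readlines returns every remaining line as a list, newline characters included.

```python
import os
import tempfile

# Assumed setup: three lines, with no newline after the last one.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
setup = open(path, 'w')
setup.write('hi my name is Andrei Neagoie\n:)\nhow are you?')
setup.close()

my_file = open(path)
line_one = my_file.readline()    # 'hi my name is Andrei Neagoie\n'
line_two = my_file.readline()    # ':)\n'
line_three = my_file.readline()  # 'how are you?'

my_file.seek(0)                  # back to the start before reading everything
all_lines = my_file.readlines()  # list with one string per line
my_file.close()

print(all_lines)
```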
142 00:07:47,670 --> 00:07:50,340 Now, the very last thing we need to do 143 00:07:50,340 --> 00:07:51,930 and this is a little annoying, 144 00:07:51,930 --> 00:07:54,993 so I'll show you how we don't need to do it in the future, 145 00:07:56,730 --> 00:08:00,060 but you actually have to manually close the file 146 00:08:00,060 --> 00:08:03,033 after you've opened it with open, 147 00:08:04,590 --> 00:08:07,740 so you can use it somewhere else in the program. 148 00:08:07,740 --> 00:08:09,000 It's just a good standard. 149 00:08:09,000 --> 00:08:12,750 So, what you have to do is say my_file.close 150 00:08:12,750 --> 00:08:14,050 after you're done with it. 151 00:08:15,630 --> 00:08:17,080 You tell your computer, "Hey, 152 00:08:17,970 --> 00:08:19,650 you need to stop whatever you're doing. 153 00:08:19,650 --> 00:08:21,720 I'm not interested in the file anymore. 154 00:08:21,720 --> 00:08:22,553 We're done with it. 155 00:08:22,553 --> 00:08:24,210 You can use it somewhere else." 156 00:08:24,210 --> 00:08:27,693 So, usually we do something like this and we're all done. 157 00:08:28,740 --> 00:08:31,590 Let's take a break and learn some more in the next video.
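Editor's sketch of closing the file once you are done with it. The tempfile scratch folder and the write step are assumptions for a runnable snippet. After close(), the file object reports itself as closed, and trying to read from it raises a ValueError.

```python
import os
import tempfile

# Assumed setup: a scratch test.txt with the sample sentence.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
setup = open(path, 'w')
setup.write('hi my name is Andrei Neagoie')
setup.close()

my_file = open(path)
text = my_file.read()
my_file.close()              # tell the system we are done with the file

is_closed = my_file.closed   # True once close() has been called

# Reading from a closed file is an error.
try:
    my_file.read()
    read_failed = False
except ValueError:
    read_failed = True

print(text, is_closed, read_failed)
```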