Hi everyone. In this video, we want to talk about the compilation process before we jump into the actual reverse engineering. If you remember, in our previous videos we talked about how an operating system loads an executable into the machine, how exactly it runs, how the linked objects help run that executable, how the operating system manages memory, and things like that. In this video we want to take a step back and talk about how we create the executable that you run on the operating system.

When you're building a piece of software, you choose a programming language in which to write your high-level code. That high-level code has to be converted into machine-understandable code so that different computers and different architectures can run it. The reason software has gained so much popularity is that once you create it, it can run on any device with an operating system that it supports. It's not dependent on one machine or one single device; it can run on all the operating systems that your software supports. That's the reason there was exponential growth in the software development field.
This was only possible because software can be converted into machine-understandable code for each platform it needs to run on.

If we talk about compiler-based high-level languages like C++, then when you're writing your software you create your program in that language: you define your functions, you define what operations happen when each function is executed, and things like that. Once you have finished writing the code, you run it through a compiler. The compiler goes through four stages before it creates the final executable.

The first one is called the preprocessing stage. As the name suggests, it covers the steps that happen before the actual compilation. So what exactly does it do? It removes all the comments that you might have added in your code. It expands all the macros that you might have added; if you have things like #define directives, all of those get expanded. It also expands the included files. For example, if you're including headers from different C libraries in your program, all of those get expanded in the preprocessing stage. From there, control flows to the compiling stage.
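As a rough sketch of what the preprocessing stage does, consider the small C file below. The file name `grid.c`, the macros `ELEMENT_SIZE` and `SQUARE`, and the function `grid_bytes` are all hypothetical names chosen for illustration; they do not come from the video.

```c
/* grid.c: hypothetical file used only to illustrate preprocessing. */

/* The preprocessor replaces this line with the text of the header,
 * which is where the size_t type comes from. */
#include <stddef.h>

/* Object-like macro: every later use of ELEMENT_SIZE becomes 4. */
#define ELEMENT_SIZE 4

/* Function-like macro: SQUARE(n) becomes ((n) * (n)). */
#define SQUARE(x) ((x) * (x))

/* Comments like this one are removed during preprocessing. */
size_t grid_bytes(size_t side)
{
    /* After preprocessing, the next line reads:
     *     return ((side) * (side)) * 4;
     */
    return SQUARE(side) * ELEMENT_SIZE;
}
```

Assuming GCC is the toolchain, running `gcc -E grid.c` stops after this stage and prints the preprocessed text, so you can see the expanded header and macros for yourself.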
In the compiling stage, the result of the preprocessing is used to generate assembly language. Assembly language is a low-level programming language that gives you some understanding of what the execution will look like at the processor level, along with the ability to write those instructions yourself. It's not machine code, but it's also not a high-level programming language like C or Java. It's somewhere in between, and that's why it's called a low-level programming language. It's pretty old and dates back to the early days when computer architecture was still being developed. Being able to read and interpret assembly language is going to be a key skill when we reverse-engineer malware.

From the compiling stage, processing moves to the assembly stage. The assembler converts the assembly code into pure binary code, the machine code: the zeros and ones. As you know, computers can only understand binary values, the zeros and ones. They cannot directly understand the high-level programming languages that we write in. Those languages are just an abstraction level that makes it easy for us to write our code in something understandable, like C or Python or Java.
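To make the compiling and assembly stages a bit more concrete, here is a minimal sketch built around a one-function C file. The file name `add.c` and the function `add` are hypothetical. The assembly and byte values in the comment assume GCC with `-O2` on x86-64 Linux; other compilers, optimization levels, and architectures will produce different output.

```c
/* add.c: hypothetical file used only to illustrate the compiling
 * and assembly stages. */
int add(int a, int b)
{
    return a + b;
}

/*
 * Compiling stage (assumption: GCC, -O2, x86-64 Linux):
 *   `gcc -O2 -S add.c` stops after compiling and writes add.s.
 *   The body of add typically looks something like this:
 *
 *       add:
 *           leal    (%rdi,%rsi), %eax
 *           ret
 *
 * Assembly stage (same assumptions):
 *   `gcc -O2 -c add.c` also runs the assembler and writes add.o.
 *   In that object file the two instructions above are stored as
 *   raw bytes:
 *
 *       8d 04 37    leal (%rdi,%rsi), %eax
 *       c3          ret
 */
```

When reverse engineering, you usually start from those bytes and work backwards: a disassembler turns them into assembly again, and reading that assembly is how you recover what the original function did.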
But ultimately what computers really need is the binary-level information, and that's where the assembly stage comes into the picture. It generates the machine code that the computer can understand and execute.

After assembly, we move to the final stage, which is the linking stage. The linker combines the output of the previous three stages. For example, when you write software, it's not just one single program that you write. It's basically a bunch of different pieces of code, and you are using the features of all of them. The linker combines all of those into a single module and finally generates your EXE file, the executable file.

Enough text. Let's look at a pictorial representation as well. It's pretty much what we just talked about, but in a much shorter and simpler way. Consider the example where your source code is written in C, so the source code file has an extension of .c. It goes through the preprocessor, which is the preprocessing phase: it removes all the comments, expands the headers, and so on. From there, control comes to the compiler. The compiler generates the assembly code. That's where I'm stressing a little bit, because we are going to be working with a lot of assembly code in the future.
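The linking stage described above can be sketched with two small C source files, shown back to back in one block. The file names `shapes.c` and `mathutils.c` and the functions `area` and `multiply` are hypothetical, and the commands in the comments assume GCC. The block also compiles as a single file, which makes it easy to try out.

```c
/* ---- shapes.c (hypothetical) ---- */

/* Declaration only: it tells the compiler that multiply exists
 * somewhere. The object file for shapes.c records an unresolved
 * reference to it. */
int multiply(int a, int b);

int area(int width, int height)
{
    return multiply(width, height);
}

/* ---- mathutils.c (hypothetical) ---- */

/* The definition that the linker matches to the reference above. */
int multiply(int a, int b)
{
    return a * b;
}

/*
 * Assumption: GCC as the toolchain.
 *   gcc -c shapes.c mathutils.c    produces shapes.o and mathutils.o
 *   gcc main.o shapes.o mathutils.o -o app
 *                                  links the object files into one
 *                                  executable named app
 * A real executable also needs a main function in one of the object
 * files; main.o here stands for a hypothetical file that provides it.
 */
```

If `mathutils.o` were left out of the link command, the linker would stop with an error about an undefined reference to `multiply`, because no object file would supply the definition that `shapes.o` asks for.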
Now this generated assembly code flows through the assembler, which creates the object file, the binary instructions that the computer can understand. Using all of that information, the linker finally creates your executable file, the EXE file, which you can then distribute and run on any machine with an operating system and architecture it was built for.

So that's all for this video. Thanks for watching.