1
00:00:02,250 --> 00:00:07,020
Let's check now if a popular search engine like Google will enable us to find confidential information

2
00:00:07,020 --> 00:00:07,680
on the Internet.

3
00:00:11,470 --> 00:00:15,120
I have a few predefined queries which you've already seen on a slide.

4
00:00:17,770 --> 00:00:23,010
We'll start with exploring whether there are any results for accounts.xls spreadsheets.

5
00:00:23,160 --> 00:00:27,520
In other words, Excel spreadsheets that relate to user accounts.

6
00:00:33,800 --> 00:00:37,740
As it turns out, there are over 64,000 results for the search.

7
00:00:40,380 --> 00:00:44,010
It takes a lot of time and persistence to browse through all of these spreadsheets.

8
00:00:48,400 --> 00:00:54,720
I'll try to give you a more interesting example and show another type of search.

9
00:00:54,830 --> 00:01:02,850
This time we'll search for PDF files on military websites that contain the phrase "top secret".

10
00:01:02,880 --> 00:01:06,640
It doesn't seem likely that anyone would put these documents on the web, does it?

11
00:01:09,080 --> 00:01:11,900
But here you are.

12
00:01:12,030 --> 00:01:17,790
There are over 53,000 results.

13
00:01:17,850 --> 00:01:21,780
If this piques your interest and you're not planning on visiting any of the countries that put up these

14
00:01:21,780 --> 00:01:28,020
files, go ahead and click on the links to find out what types of classified information can easily be

15
00:01:28,020 --> 00:01:29,210
found on the web.

16
00:01:32,180 --> 00:01:38,320
The crawling methods I've shown you are manual. They can be automated, though,

17
00:01:38,730 --> 00:01:41,430
for example by using a tool like Goolag Scanner.

18
00:01:44,170 --> 00:01:51,200
The application contains a simple set of Google queries that fall into several categories.

19
00:01:51,310 --> 00:01:58,470
There are, for example, files containing juicy info like passwords, logins and the like.
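A minimal sketch of how the two manual searches shown earlier might be composed. The transcript does not show the exact query strings, so the queries below are assumptions built from the standard `site:` and `filetype:` operators and a quoted phrase. The helper names `build_dork` and `search_url` are hypothetical, and the code only builds strings; it sends nothing.

```python
from urllib.parse import urlencode


def build_dork(terms="", site=None, filetype=None, phrase=None):
    """Compose a search query from common operators (hypothetical helper)."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to a domain or TLD
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to a file extension
    if phrase:
        parts.append(f'"{phrase}"')           # exact-phrase match
    if terms:
        parts.append(terms)                   # free-text keywords
    return " ".join(parts)


def search_url(query):
    """Return a search URL for the query; nothing is requested here."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Assumed equivalents of the two searches described in the video.
spreadsheets = build_dork(terms="accounts", filetype="xls")
documents = build_dork(site="mil", filetype="pdf", phrase="top secret")

print(spreadsheets)
print(documents)
print(search_url(documents))
```

Opening the printed URL in a browser corresponds to running the search by hand, as in the demonstration.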
20
00:01:59,300 --> 00:02:06,040
Or software error messages that display sensitive information on a system. Another category includes

21
00:02:06,040 --> 00:02:13,290
files containing usernames. Clicking on Open in Browser will cause a search engine with the selected

22
00:02:13,290 --> 00:02:18,560
file types to be displayed. Instead of clicking on each file type,

23
00:02:18,710 --> 00:02:24,230
you can select a whole category. Type in the page name that interests you,

24
00:02:24,230 --> 00:02:29,630
for example example.com, and click on Scan.

25
00:02:29,650 --> 00:02:33,710
Notice that you don't connect to the site directly.

26
00:02:33,860 --> 00:02:37,140
We don't search for information on the servers of a given company.

27
00:02:39,410 --> 00:02:46,400
The query is only sent to the Google search engine. We're exploring what has already been indexed.

28
00:02:48,870 --> 00:02:52,890
We'll be able to send bulk queries until search engines like Google block them.

29
00:02:57,870 --> 00:03:01,010
As you can see in the slide, 15 results have been returned.

30
00:03:04,020 --> 00:03:08,550
All queries have been processed successfully, which means there are no files that contain usernames

31
00:03:08,550 --> 00:03:09,390
in the results.

32
00:03:14,750 --> 00:03:19,760
A small application called Foca is a rather curious example of a search engine that crawls the web for

33
00:03:19,760 --> 00:03:21,800
publicly available information.

34
00:03:24,570 --> 00:03:30,500
It enables users to analyze not only selected websites but also the metadata saved in the website

35
00:03:30,570 --> 00:03:31,590
files.

36
00:03:33,840 --> 00:03:37,110
We'll create a project by entering a website name,

37
00:03:37,110 --> 00:03:38,910
for example microsoft.com.

38
00:03:46,960 --> 00:03:48,790
Using search engines like Google or Bing,

39
00:03:48,790 --> 00:03:56,210
we can try to extract information on the network infrastructure, the server versions and so on.
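The category-based scan described earlier can be sketched as follows: a set of predefined queries is grouped by category, and each one is scoped to the chosen site before it goes to the search engine. The category names and query strings in `QUERY_SETS` are illustrative assumptions, not the scanner's actual contents, and `expand_queries` is a hypothetical helper. Nothing is sent; the sketch only shows that every query targets the search engine's index rather than the site itself.

```python
# Illustrative query sets; the real tool's categories and queries are not
# given in the transcript, so these are assumptions.
QUERY_SETS = {
    "usernames": [
        "filetype:xls username",
        "filetype:log username",
    ],
    "error_messages": [
        '"Warning: mysql_connect()"',
        '"Fatal error: Call to undefined function"',
    ],
}


def expand_queries(domain, categories=None):
    """Scope every query of the selected categories to one domain.

    If no categories are given, all of them are used, which mirrors
    selecting everything instead of ticking individual entries.
    """
    selected = categories if categories is not None else list(QUERY_SETS)
    queries = []
    for category in selected:
        for query in QUERY_SETS[category]:
            # The site: operator limits results to pages already indexed
            # for this domain; the domain's servers are never contacted.
            queries.append(f"site:{domain} {query}")
    return queries


for q in expand_queries("example.com", ["usernames"]):
    print(q)
```

Because search engines throttle or block bulk queries, an actual scanner would also need to pace its requests; that part is left out here.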
40
00:03:58,150 --> 00:04:02,920
We can try to ascertain whether or not there are any vulnerabilities or susceptibilities to a given

41
00:04:02,920 --> 00:04:07,540
type of threat.

42
00:04:07,540 --> 00:04:10,710
We can also read the metadata of the various documents.

43
00:04:17,290 --> 00:04:20,500
Opening the documents will extract information on their creators.

44
00:04:31,610 --> 00:04:33,440
To download and analyze a file,

45
00:04:33,530 --> 00:04:35,780
right-click on the document and select Download.

46
00:04:42,050 --> 00:04:43,560
Once you've downloaded the files,

47
00:04:43,700 --> 00:04:45,000
they can be analyzed.

48
00:04:47,320 --> 00:04:54,070
Clicking on Extract Metadata will extract all the metadata available for the document.

49
00:04:56,950 --> 00:05:00,220
Files from both categories will be analyzed in sequence.

50
00:05:02,350 --> 00:05:10,190
We have records for 10 users.

51
00:05:10,240 --> 00:05:13,190
They're the creators of PDF documents or DOC files.

52
00:05:14,690 --> 00:05:20,210
We can view the information on when and in how many documents they used OpenOffice and Microsoft Office,

53
00:05:23,250 --> 00:05:27,740
and also see the versions of the operating systems that were used to create the documents.

54
00:05:29,870 --> 00:05:36,870
All of this metadata is contained in the files that we edit and save on a daily basis.

55
00:05:36,960 --> 00:05:42,540
When you publish your files on the Internet, you have to realize that the metadata will also be published

56
00:05:42,570 --> 00:05:43,650
and be accessible.

57
00:05:46,070 --> 00:05:46,690
Metadata

58
00:05:46,700 --> 00:05:52,630
discloses not only the information on the configuration of the system, but it also contains your personal

59
00:05:52,630 --> 00:05:53,440
information.

60
00:05:56,790 --> 00:06:00,000
Foca is a great tool for extracting information of this type.
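To illustrate the kind of metadata discussed above, here is a minimal sketch that reads the author and application fields from an Office Open XML file such as a .docx. This is not how Foca works internally, which the transcript does not describe. It assumes the newer zip-based format rather than the legacy .doc or PDF files mentioned in the video, and `read_ooxml_metadata` is a hypothetical helper that uses only the Python standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the document-property parts of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_ooxml_metadata(source):
    """Return selected metadata from a .docx/.xlsx/.pptx path or file object."""
    meta = {}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            # Who created the document and who saved it last.
            meta["creator"] = core.findtext("dc:creator", namespaces=NS)
            meta["last_modified_by"] = core.findtext("cp:lastModifiedBy", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(package.read("docProps/app.xml"))
            # Which application (and version) produced the file.
            meta["application"] = app.findtext("ep:Application", namespaces=NS)
            meta["app_version"] = app.findtext("ep:AppVersion", namespaces=NS)
    return meta
```

Calling `read_ooxml_metadata("report.docx")` on a file of your own (the file name is only an example) returns a dictionary of whatever fields are present. Running it on a document before publishing shows what a reader could learn from it.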