This is a translated page. The original can be found here: http://iwebdevel.com/2009/10/03/php-how-to-download-a-webpage-aka-web-scrapping-with-php-fsockopen-file_get_contents-curl-function-download-web-page/

PHP: How to download a webpage (aka web scraping) with PHP

Posted on 03 Oct, 2009 by Dragos in Coding, PHP

There are many ways to download web pages or other web content. Personally I like to use cURL for my web scraping needs, but sometimes I also use fsockopen and file_get_contents.

Here are 3 different functions that will allow you to download web content.

cURL:

 function getData($url) {
     // Refuse to fetch from localhost.
     if ($url != 'localhost' && $url != 'http://localhost') {
         $ch = curl_init();
         curl_setopt($ch, CURLOPT_URL, $url);
         // Return the response as a string instead of printing it.
         curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
         curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.3");
         // Follow redirects, but no more than 3 in a row.
         curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
         curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
         $result = array();
         $result['data'] = curl_exec($ch);   // false on failure
         $result['error'] = curl_error($ch); // empty string on success
         curl_close($ch);
         return $result;
     }
     // Return the same array shape as the success case.
     return array('data' => false, 'error' => 'err');
 }

fsockopen:

 function getData($url) {
     $arr = parse_url($url);
     $fp = fsockopen($arr['host'], 80, $errno, $errstr, 30);
     if (!$fp) {
         return false;
     } else {
         // build the request path; default to "/" when the URL has none
         $path = isset($arr['path']) ? $arr['path'] : '/';
         if (isset($arr['query'])) {
             $path .= '?' . $arr['query'];
         }
         // send headers
         $out = "GET " . $path . " HTTP/1.1\r\n";
         $out .= "Host: " . $arr['host'] . "\r\n";
         $out .= "User-Agent: FSOCKOPEN\r\n";
         $out .= "Connection: Close\r\n\r\n";
         fwrite($fp, $out);
         // read the raw response (status line and headers included)
         $contents = '';
         while (!feof($fp)) {
             $contents .= fgets($fp, 4096);
         }
         fclose($fp);
         return $contents;
     }
 }
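
Because the fsockopen version reads straight from the socket, what it returns is the whole raw HTTP response: the status line, the headers, and then the body. Here is a minimal sketch of how that string could be split up. The splitResponse name and the shape of the returned array are my own assumptions for illustration, and the sketch does not decode a chunked body, which an HTTP/1.1 server may send.

```php
<?php
// Hypothetical helper (not from the original post): splits a raw HTTP
// response string into its status line, headers, and body.
function splitResponse($raw) {
    // The header section ends at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    // The first line is the status line, e.g. "HTTP/1.1 200 OK".
    $status = array_shift($headerLines);
    $headers = array();
    foreach ($headerLines as $line) {
        // Split on the first colon only, so values may contain colons.
        $pair = explode(':', $line, 2);
        if (count($pair) == 2) {
            // Header names are case-insensitive, so store them lowercased.
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }
    return array(
        'status'  => $status,
        'headers' => $headers,
        'body'    => isset($parts[1]) ? $parts[1] : '',
    );
}
```

You would call it on the return value of the fsockopen function, for example splitResponse(getData($url)), after checking that getData did not return false.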

file_get_contents:

 function getData($url) {
     return file_get_contents($url);
 }
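
file_get_contents can also take a stream context, which gives you a few options such as a user agent and a timeout without switching to cURL. The following is only a sketch: the getDataWithContext name, the user agent string, and the 10 second default are assumptions of mine, and fetching URLs this way needs allow_url_fopen to be enabled in php.ini.

```php
<?php
// Hypothetical variant of getData(); the name and defaults are illustrative.
function getDataWithContext($url, $timeout = 10) {
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => 'file_get_contents example', // assumed value
            'timeout'    => $timeout,                    // read timeout in seconds
        ),
    ));
    // Second argument (use_include_path) stays false; the third is the context.
    // The @ hides the warning PHP raises on failure; we rely on the return value.
    return @file_get_contents($url, false, $context); // false on failure
}
```

The function returns false when the download fails, so compare the result with === false before using it.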

As you can see, the easiest way to download web content is the file_get_contents function, but if you need more options, especially if you are working with headers, then cURL is the best way to go.
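
To give an idea of what working with headers looks like in cURL, here is a rough sketch that sends custom request headers and collects the response headers. The getDataWithHeaders name and the returned array layout are assumptions for illustration only. CURLOPT_HTTPHEADER and CURLOPT_HEADERFUNCTION are standard cURL options, and the closure needs PHP 5.3 or newer.

```php
<?php
// Hypothetical helper; the name and return shape are illustrative.
function getDataWithHeaders($url, array $requestHeaders = array()) {
    $responseHeaders = array();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Request headers are plain "Name: value" strings.
    curl_setopt($ch, CURLOPT_HTTPHEADER, $requestHeaders);
    // cURL calls this once per response header line.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION,
        function ($ch, $line) use (&$responseHeaders) {
            $pair = explode(':', $line, 2);
            if (count($pair) == 2) {
                $responseHeaders[strtolower(trim($pair[0]))] = trim($pair[1]);
            }
            // The callback must return the number of bytes it handled.
            return strlen($line);
        }
    );
    $body = curl_exec($ch);
    $result = array(
        'data'    => $body,                                   // false on failure
        'status'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),   // 0 if no response
        'headers' => $responseHeaders,
        'error'   => curl_error($ch),
    );
    curl_close($ch);
    return $result;
}
```

A call could look like getDataWithHeaders($url, array('Accept: text/html')), after which the content type, if the server sent one, is in $result['headers']['content-type'].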
