Wednesday, May 13, 2015

Web Scraper

$request_url ='https://www.google.es/search?q=Barcelona';

// Regular expression that matches URLs (defined here but not used in the rest of the script)
 $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

 function get_domain($url)
 {
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
   return $regs['domain'];
  }
  return false;
 }

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, $request_url); // The URL to get links from

 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string instead of printing it

 $result = curl_exec($ch);
 if ($result === false) {
  die('cURL error: ' . curl_error($ch)); // Stop if the request failed
 }

 $regex='|<a.*?href="(.*?)"|';
 preg_match_all($regex,$result,$parts);
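 // Capture the first title="..." attribute value from the fetched page (not used below)
 $title = preg_match('/title="([^"]+)"/', $result, $match) ? $match[1] : '';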
 $links=$parts[1];
 asort($links);

 foreach($links as $link){
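  $pos = strpos($link, '://');
  $exclude = strpos($link, 'google');  // Used to skip Google's own links

  // strpos() returns false when nothing is found, so compare strictly
  if ($pos !== false && $pos > 0 && $exclude === false){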
    $posini = strpos($link, 'http');
    $link = substr($link, $posini);
    $safe = htmlspecialchars($link, ENT_QUOTES); // Escape the scraped URL before printing it
    echo "<a href='".$safe."'>".get_domain($link)."</a> -> ".$safe."<br>";
  }
 }

 curl_close($ch);
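
The script above pulls links out of the page with a regular expression, which only matches double-quoted href attributes. As a rough alternative sketch, assuming PHP's bundled DOM extension is available, the same step could be done with DOMDocument. The names extract_links and $sample are hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not in the original script): collect the href value
// of every <a> tag using DOMDocument instead of a regular expression.
function extract_links($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') { // Skip anchors that have no href attribute
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample markup for illustration; in the script above, $result would be passed instead.
$sample = '<html><body>'
        . '<a href="http://example.com/page">One</a>'
        . "<a class='x' href='https://example.org/'>Two</a>"
        . '<a name="top">No href</a>'
        . '</body></html>';

$links = extract_links($sample);
print_r($links);
```

The returned array could then go through the same asort() and foreach filtering as $parts[1] in the script above.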


Sample:
http://viladecansoutlet.es/scrap.php
