Parse Data Using Proxy to Avoid Blockage


Some websites have advanced security that monitors abnormal traffic and hits. Basically it tracks nonhuman behavior, such as automated clicks and crawling, on their network or website. Eventually they block the IP address or network that the suspicious hits come from.

As a professional website scraper you should be able to adapt your scraping technique and parse the data without getting your own IP address or network blocked.

To be continued sometime this week, I promise!

Okay, I am back.

I am exploring this with the PHP Simple HTML DOM Parser library.

So at the beginning include this library:

/* PHP script: scrape and parse through a proxy */

include('simple_html_dom.php');

$url = 'the url you wanted to parse';

/* Connecting via proxy */
$via_proxy = array(
    'http' => array(
        /* Replace with your proxy's address and port */
        'proxy' => 'tcp://proxy_address:proxy_port',
        /* Send the full URL in the request line; some proxies require this */
        'request_fulluri' => true,
    ),
);

$via_proxy = stream_context_create($via_proxy);

$html = file_get_html($url, false, $via_proxy);
/* EO Proxy */
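If you switch proxies often, it may help to build the context options in one place. Here is a minimal sketch of that idea. The function name proxy_context_options, the timeout value and the address 203.0.113.10 (a documentation-only IP) are my own assumptions and not part of Simple HTML DOM; only stream_context_create() and the http options 'proxy', 'request_fulluri' and 'timeout' are PHP's own.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): builds the options
// array that stream_context_create() accepts for an HTTP proxy.
function proxy_context_options(string $host, int $port, float $timeout = 10.0): array
{
    return [
        'http' => [
            // The PHP manual writes the proxy as a tcp:// URI.
            'proxy'           => "tcp://{$host}:{$port}",
            // Put the full URL in the request line; some proxy servers require it.
            'request_fulluri' => true,
            // Read timeout in seconds (10 is an assumed default, tune as needed).
            'timeout'         => $timeout,
        ],
    ];
}

// 203.0.113.10 is a placeholder address reserved for documentation.
$options = proxy_context_options('203.0.113.10', 8080);
$context = stream_context_create($options); // no network traffic happens here
echo $options['http']['proxy'], "\n";
```

The resulting $context can be passed as the third argument of file_get_html(), in the same position as $via_proxy.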

Now we need to consider two vital issues in this scraping / parsing technique:

#1. Finding good, valid proxy addresses and ports.

#2. Checking that the scrape script is really working through the proxy.

There are thousands of websites that publish proxy addresses. I prefer XROXY.COM and proxies on port 80, but you may have different sources and preferences.

I took the proxy used below from http://www.xroxy.com/proxy-port-80.htm
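Free proxies die often, so one address is rarely enough. Below is a rough sketch of trying several proxies in order until one answers. Everything named here (fetch_with_fallback, $fetch_via_stream, $stub and the sample addresses, which are documentation-only IPs) is a hypothetical illustration of mine, not something from the library. The fetcher is passed in as a callable so the demo can run with a stub instead of real network access.

```php
<?php
// Hypothetical helper: tries each proxy in order and returns the first
// successful result. $fetch takes ($url, $proxy) and returns the page
// content, or false on failure.
function fetch_with_fallback(string $url, array $proxies, callable $fetch): array
{
    foreach ($proxies as $proxy) {
        $content = $fetch($url, $proxy);
        if ($content !== false && $content !== null) {
            return ['proxy' => $proxy, 'content' => $content];
        }
    }
    // No proxy in the list produced a result.
    return ['proxy' => null, 'content' => false];
}

// A real fetcher built on a stream context; expects "host:port" strings.
$fetch_via_stream = function (string $url, string $proxy) {
    $context = stream_context_create([
        'http' => [
            'proxy'           => 'tcp://' . $proxy,
            'request_fulluri' => true,
            'timeout'         => 10,
        ],
    ]);
    // @ hides the warning a dead proxy raises; false signals the failure.
    return @file_get_contents($url, false, $context);
};

// Demo with a stub fetcher, so the example runs without network access.
$stub = function (string $url, string $proxy) {
    return $proxy === '198.51.100.7:80' ? '<html>ok</html>' : false;
};
$result = fetch_with_fallback('http://example.com/', ['203.0.113.10:80', '198.51.100.7:80'], $stub);
echo $result['proxy'], "\n"; // prints 198.51.100.7:80
```

For real use you would pass $fetch_via_stream instead of $stub and then hand the returned content to str_get_html() from Simple HTML DOM.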

Let’s check now whether this proxy is really working:

<?php

/* PHP script: check that scraping really goes through the proxy */

include('simple_html_dom.php');

error_reporting(0);

/* This website reports back your own IP or your gateway IP */
$url = 'http://www.find-ip-address.org/';

/* Connecting via proxy */
$via_proxy = array(
    'http' => array(
        'proxy' => 'tcp://95.65.100.24:80',
        'request_fulluri' => true,
    ),
);

$via_proxy = stream_context_create($via_proxy);

$html = file_get_html($url, false, $via_proxy);
/* EO Proxy */

/* Recent versions of the library return false when the page cannot be loaded */
if (!$html) {
    die('Could not load the page through the proxy.');
}

echo $html->outertext;

?>

On the output page you will see: My Ip Address: 95.65.100.24
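Instead of reading the page by eye, the same check can be done in code. This is only a sketch under my own assumptions: the function name page_shows_proxy_ip is hypothetical, and it simply looks for the proxy's IP in the text of whatever IP-echo page you use.

```php
<?php
// Hypothetical helper: returns true when the IP part of a proxy string
// ("host:port" or "tcp://host:port") appears in the page text.
function page_shows_proxy_ip(string $pageText, string $proxy): bool
{
    // Drop an optional scheme such as "tcp://".
    $hostPort = preg_replace('#^[a-z]+://#i', '', $proxy);
    // Keep only the host part, e.g. "95.65.100.24" from "95.65.100.24:80".
    $ip = explode(':', $hostPort)[0];
    // Require no digit on either side, so "95.65.100.2" does not match "95.65.100.24".
    $pattern = '/(?<!\d)' . preg_quote($ip, '/') . '(?!\d)/';
    return preg_match($pattern, $pageText) === 1;
}

var_dump(page_shows_proxy_ip('My Ip Address: 95.65.100.24', '95.65.100.24:80')); // bool(true)
var_dump(page_shows_proxy_ip('My Ip Address: 192.0.2.44', '95.65.100.24:80'));   // bool(false)
```

With Simple HTML DOM, the page text could come from $html->plaintext. If the function returns false, the request most likely did not go through the proxy, or the proxy forwards your real address.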

Note: this is just a very simple first exploration, as I promised. No worries guys, I will post a more complex approach when I get some free hours 😉
