Published by admin on 08 Aug 2008

Scrape site content with PHP5 DOMXPath + Firebug

This is one of those “wow” moments that make this game worth the time.

If you need to scrape information off another site, forget regular expressions and string parsing. PHP5 has wonderful DOM XPath functions that you can use to traverse the scraped page's DOM and retrieve your information. To make matters even easier for you, the aspiring, freebie-loving PHP enthusiast (that’s me!), you can get the XPath easily via the Firebug extension in Firefox.

Firebug Xpath information

Now that we have our XPath, we are ready for some PHP magic. But before going forward, NOTE: Firefox automatically fixes invalid HTML. For example, it adds tbody to every table that does not have one, so the path Firebug reports can contain tbody steps that are not in the page source. Examine the page source and take this extra markup out of the XPath.
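
Taking those tbody steps out by hand gets tedious with long paths. Here is a minimal sketch of doing it in code, assuming a hypothetical helper named stripTbody (it is not part of PHP or Firebug) and a made-up example path. Use it only when the page source really has no tbody; if the source does contain one, the step belongs in the path.

```php
<?php
// Hypothetical helper: drops the tbody steps from an XPath copied out of Firebug.
function stripTbody(string $xpath): string
{
    // Matches "/tbody" and indexed forms such as "/tbody[1]",
    // but only when they form a complete path step.
    return preg_replace('#/tbody(\[\d+\])?(?=/|$)#', '', $xpath);
}

// Made-up path, shaped like the one used in this post.
$fromFirebug = '/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td';
$cleaned = stripTbody($fromFirebug);
echo $cleaned, "\n";
```

The cleaned path is what you would hand to DOMXPath.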

And here is some sweet PHP5 goodness to make it all work. (In this example I’ll print out the hrefs and anchor text for all links in a certain table row.)


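<?php
// Load the page to scrape (a local file in this example).
$html = file_get_contents('dummy.html');

$dom = new DOMDocument();
// Real-world HTML is rarely valid; @ silences the parser warnings.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Every link inside the second row of the innermost table.
$hrefs = $xpath->evaluate("/html/body/table/tr[2]/td[2]/table/tr/td/table/tr[2]//a");

foreach ($hrefs as $href) {
    $url = $href->getAttribute('href');
    $value = $href->nodeValue;
    // Escape before echoing, since both values come from the scraped page.
    echo htmlspecialchars("$url => $value") . "<br />";
}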

Easy. No regular expressions, no string parsing. Just a couple of lines of PHP5 code.
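
If you want to try this without a dummy.html at hand, here is a self-contained sketch of the same idea. The function name extractLinks and the sample markup are made up for illustration, and it swaps the @ operator for libxml_use_internal_errors(), which is just another way to keep the parser warnings quiet.

```php
<?php
// Hypothetical helper: returns [href => link text] for the <a> nodes matched by $query.
function extractLinks(string $html, string $query): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // the XPath expression itself was invalid
    }

    $links = [];
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement) {
            $links[$node->getAttribute('href')] = trim($node->nodeValue);
        }
    }
    return $links;
}

// Made-up sample page. Note that the source has no tbody.
$html = '<html><body><table>'
      . '<tr><td>Header</td></tr>'
      . '<tr><td><a href="/one">First</a> <a href="/two">Second</a></td></tr>'
      . '</table></body></html>';

$links = extractLinks($html, '/html/body/table/tr[2]//a');
print_r($links);
```

Returning an array instead of echoing makes it easier to reuse the result, for example to store it or feed it to another scraper step.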

Published by admin on 06 Aug 2008

Check your Google SERPs without proxy or VPN

To check your site’s Google positions from a different country or geographic area, people used to (ab)use proxies and VPNs. There is a little-known feature in Google AdWords that lets an advertiser test whether his/her ad is running under a certain keyword. The great thing about it is the ability to choose a precise geographic area to view results from, and it gives you natural search results as seen from that area. What the tool does is very simple: it adds &adtest=on to the Google query string to switch it to adtest mode. The gl= and gr= variables specify geographic targeting.

If you want to see what pops up for the query “blue widget” in California, you’d use:

http://www.google.com/search?hl=en&q=blue+widget&adtest=on&gl=US&gr=US-CA

Check it out.

To get possible values for the gl and gr variables, go to AdWords->Tools->Ad Diagnostic Results and examine the URL query for your target market.
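
If you check several keywords or regions, building the URL by hand gets old fast. Here is a minimal sketch that assembles it from the parameters described in this post. The function name buildAdtestUrl is made up, and the sketch only builds the string; it does not check that Google honors the parameters.

```php
<?php
// Hypothetical helper: builds the adtest search URL from the parameters in this post.
function buildAdtestUrl(string $query, string $country, string $region, string $lang = 'en'): string
{
    $params = [
        'hl'     => $lang,    // interface language
        'q'      => $query,   // search query
        'adtest' => 'on',     // switch to adtest mode
        'gl'     => $country, // country, e.g. US
        'gr'     => $region,  // region, e.g. US-CA
    ];
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'http://www.google.com/search?' . http_build_query($params, '', '&');
}

echo buildAdtestUrl('blue widget', 'US', 'US-CA'), "\n";
```

http_build_query takes care of the URL encoding, so a query with spaces or other special characters still produces a valid link.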

Nifty, huh?