Credits: http://vision-media.ca/resources/php/create-a-php-web-crawler-or-scraper-5-minutes

Using PHP, we show you how to create an easily extendable web crawler in under 5 minutes that collects images and links from a page.

The Crawler Framework

First we need to create the crawler class as follows:

<?php

class Crawler {

}

?>

Next we add methods to fetch a web page's markup and to parse it for the data we want to collect. The only public methods are getMarkup() and get(). The parsing methods are internal to the crawler, but their visibility is set to protected rather than private, since you never know who will want to extend its functionality.

<?php

class Crawler {

protected $markup = '';

public function __construct($uri) {

}

public function getMarkup($uri) {

}

public function get($type) {

}

protected function _get_images() {

}

protected function _get_links() {

}

}

?>

Fetching Site Markup

The constructor accepts a URI, so the class can be instantiated as new Crawler('http://vision-media.ca'); It then sets the $markup property using PHP's file_get_contents() function, which fetches the site's markup.

<?php

public function __construct($uri) {

$this->markup = $this->getMarkup($uri);

}

public function getMarkup($uri) {

return file_get_contents($uri);

}

?>
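file_get_contents() returns FALSE and raises a warning when the request fails, so a crawler usually wants a guarded fetch. The following is a minimal sketch, not part of the original class: fetchMarkup(), its $timeout parameter and the user agent string are hypothetical names chosen for illustration.

```php
<?php

/**
 * Fetch a page's markup, returning an empty string on failure.
 * fetchMarkup() and $timeout are hypothetical names for this sketch.
 */
function fetchMarkup($uri, $timeout = 10) {
    // These options apply to http(s) URIs; they are ignored for local files.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeout,
            'user_agent' => 'ExampleCrawler/0.1',
        ),
    ));

    // Suppress the warning raised on failure and check the return value instead.
    $markup = @file_get_contents($uri, false, $context);

    return $markup === false ? '' : $markup;
}
```

Because the crawler's parsing methods already check for empty markup, returning an empty string on failure lets them fall through without further changes.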

Crawling The Markup For Data

Our get() method accepts a $type string, which is used to invoke the method that does the actual processing. As you can see below, we build the method name as a string and check that it exists before calling it, so developers can simply invoke $crawl->get('images');

We set the visibility of _get_images() and _get_links() to protected so that developers use our public get() method rather than getting confused and trying to invoke them directly.

Each protected data collection method uses the PCRE (Perl Compatible Regular Expressions) function preg_match_all() to return every tag in the markup that matches one of our patterns, /<img([^>]+)\/>/i and /<a([^>]+)\>(.*?)\<\/a\>/i. The first capture group holds the tag's attribute string, which is what each method returns. Note that the image pattern only matches self-closing <img ... /> tags. For more information on regular expressions visit http://en.wikipedia.org/wiki/Regular_expression

<?php

public function get($type) {

$method = "_get_{$type}";

if (method_exists($this, $method)){

// call_user_method() was removed in PHP 7, so call the method directly.
return $this->$method();

}

}

protected function _get_images() {

if (!empty($this->markup)){

preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);

return !empty($images[1]) ? $images[1] : FALSE;

}

}

protected function _get_links() {

if (!empty($this->markup)){

preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);

return !empty($links[1]) ? $links[1] : FALSE;

}

}

?>
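Each entry returned by these methods is the raw attribute string of a tag, for example ` href="/about" class="nav"`, not a URL. One way to pull a single attribute out of those strings is sketched below. extractAttribute() is a hypothetical helper, not part of the original class, and it only handles quoted attribute values.

```php
<?php

/**
 * Pull one attribute's value out of each captured attribute string.
 * extractAttribute() is a hypothetical helper for this sketch.
 */
function extractAttribute(array $tags, $name) {
    $values = array();
    // Match name="value" or name='value'; unquoted values are not handled.
    $pattern = '/\b' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/i';

    foreach ($tags as $attributes) {
        if (preg_match($pattern, $attributes, $match)) {
            $values[] = $match[2];
        }
    }

    return $values;
}

// Sample attribute strings in the shape the crawler returns them.
$links = array(' href="/about" class="nav"', " title='Home' href='http://example.com/'");
print_r(extractAttribute($links, 'href'));
```

For anything beyond quick scraping, an HTML parser such as PHP's DOMDocument is likely to be more robust than regular expressions.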

Final PHP Web Crawler Code And Usage

<?php

class Crawler {

protected $markup = '';

public function __construct($uri) {

$this->markup = $this->getMarkup($uri);

}

public function getMarkup($uri) {

return file_get_contents($uri);

}

public function get($type) {

$method = "_get_{$type}";

if (method_exists($this, $method)){

// call_user_method() was removed in PHP 7, so call the method directly.
return $this->$method();

}

}

protected function _get_images() {

if (!empty($this->markup)){

preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);

return !empty($images[1]) ? $images[1] : FALSE;

}

}

protected function _get_links() {

if (!empty($this->markup)){

preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);

return !empty($links[1]) ? $links[1] : FALSE;

}

}

}

$crawl = new Crawler('http://vision-media.ca');

$images = $crawl->get('images');

$links = $crawl->get('links');

?>
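Because get() looks up a method named after the requested type, a subclass can add a new type by defining one protected method. The sketch below illustrates this under stated assumptions: the Crawler here is a compact stand-in that takes markup directly so the example runs without a network request, and HeadingCrawler and _get_headings() are hypothetical names.

```php
<?php

// Compact stand-in for the Crawler class; it takes markup directly
// instead of a URI (an assumption made for this example).
class Crawler {
    protected $markup = '';

    public function __construct($markup) {
        $this->markup = $markup;
    }

    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            return $this->$method();
        }
        return FALSE;
    }
}

// Hypothetical subclass: defining _get_headings() makes get('headings') work.
class HeadingCrawler extends Crawler {
    protected function _get_headings() {
        // Capture the contents of <h1> through <h6>, matching each closing tag to its opener.
        preg_match_all('/<h([1-6])[^>]*>(.*?)<\/h\1>/is', $this->markup, $headings);
        return !empty($headings[2]) ? $headings[2] : FALSE;
    }
}

$crawl = new HeadingCrawler('<h1>Title</h1><p>Text</p><h2 class="sub">Section</h2>');
print_r($crawl->get('headings'));
```

No change to get() is needed for the new type, which is the practical benefit of building the method name from the $type string.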

Author: Peter Cooper
Link : http://codesnippets.joyent.com/user/therad/tag/javascript#post681

function correctPNG() // correctly handle PNG transparency in Win IE 5.5 & 6.
{
var arVersion = navigator.appVersion.split("MSIE")
var version = parseFloat(arVersion[1])
if ((version >= 5.5) && (document.body.filters))
{
for(var i=0; i<document.images.length; i++)
{
var img = document.images[i]
var imgName = img.src.toUpperCase()
if (imgName.substring(imgName.length-3, imgName.length) == "PNG")
{
var imgID = (img.id) ? "id='" + img.id + "' " : ""
var imgClass = (img.className) ? "class='" + img.className + "' " : ""
var imgTitle = (img.title) ? "title='" + img.title + "' " : "title='" + img.alt + "' "
var imgStyle = "display:inline-block;" + img.style.cssText
if (img.align == "left") imgStyle = "float:left;" + imgStyle
if (img.align == "right") imgStyle = "float:right;" + imgStyle
if (img.parentElement.href) imgStyle = "cursor:hand;" + imgStyle
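// Replace the image with a span that renders the PNG through IE's AlphaImageLoader filter.
var strNewHTML = "<span " + imgID + imgClass + imgTitle +
" style=\"width:" + img.width + "px; height:" + img.height + "px;" + imgStyle + ";" +
"filter:progid:DXImageTransform.Microsoft.AlphaImageLoader" +
"(src='" + img.src + "', sizingMethod='scale');\"></span>"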
img.outerHTML = strNewHTML
i = i-1
}
}
}
}
window.attachEvent("onload", correctPNG);

echo "http://".$_SERVER['HTTP_HOST'].dirname($_SERVER['PHP_SELF']);

Test Drive Flash Player 10 Beta in Ubuntu

Posted by Tom in software

Adobe has released the Flash Player 10 Beta simultaneously for Linux, Mac, and Windows. This version includes performance improvements, new 3D transformations, Adobe Pixel Bender filters, streaming video improvements, and new text layout capabilities.

Websites very likely won’t be taking advantage of these new features until the stable release is out, so Adobe has a page of demos you can try.

Two large complaints about Flash on Linux are fullscreen video and 64-bit support. Neither has been resolved in this release. Playback of fullscreen video (which causes low framerates and high CPU usage) seems to be only slightly improved, although I have found that there is a general performance increase.

Flash Player 10 3D effect

If you want to try Flash Player 10 you can download and install it yourself, but here are some terminal commands that you can copy and paste to get going quickly:

  1. Remove your existing Flash plugin, if you have one installed. This command will remove Flash 9 if you installed it from Ubuntu’s repository:
    sudo apt-get remove flashplugin-nonfree
  2. Download and extract the Flash Player 10 Beta to your home directory:
    wget -O - http://download.macromedia.com/pub/labs/flashplayer10/flashplayer10_install_linux_051508.tar.gz | tar xz -C ~
  3. The user plugins folder may not exist yet, so create it (the -p flag suppresses the error if the directory already exists):
    mkdir -p ~/.mozilla/plugins/
  4. Copy the Flash plugin to the Firefox plugins directory to install it:
    cp ~/install_flash_player_10_linux/libflashplayer.so ~/.mozilla/plugins/libflashplayer.so
  5. Remove the directory that was downloaded (if you get a warning about deleting a write-protected file, press y and Enter to continue):
    rm -r ~/install_flash_player_10_linux

Restart Firefox to enable the new plugin.

And here’s how to uninstall it:

  1. Remove the new plugin:
    rm ~/.mozilla/plugins/libflashplayer.so
  2. Reinstall Flash 9 from the repositories (if you wish):
    sudo apt-get install flashplugin-nonfree

[update] I’ve been using the Flash 10 plugin for over a week now, and the only issue I’ve had is the occasional website that thinks my version of Flash is too old.

http://sysadminschronicles.com/articles/2008/05/06/ubuntu-8-04-rails-server-using-passenger

http://joeabiraad.com/linuxunix/installing-lamp-on-ubuntu-710-linuxapachemysqlphp/100

Lately I've been using Ubuntu 7.10 for all my projects and daily work.
As a web developer I should have LAMP on my machine, and now I will guide you through installing it on yours.

This guide is divided into 3 steps: installing and testing Apache, PHP and finally MySQL.

Let's start with Apache:
1. Open the terminal (we will be using it through most of my guide) from Applications > Accessories > Terminal
2. Install apache2 using apt-get by typing the following

sudo apt-get install apache2

Note that sudo will prompt you for your password.
Now everything should be downloaded and installed automatically.
To start/stop apache2 write:

sudo /etc/init.d/apache2 start
sudo /etc/init.d/apache2 stop

Your www folder should be in: /var/www/

If everything is OK you should see an ordinary HTML page when you open http://localhost in Firefox.

Finished with Apache? Let's conquer PHP:
1. Also in terminal write:

sudo apt-get install php5 libapache2-mod-php5

or any php version you like

2. restart apache

sudo /etc/init.d/apache2 restart

That is it for PHP.
Want to test it? Just create an ordinary PHP page in /var/www/ and run it.
Example:

sudo gedit /var/www/test.php

and write in it: <?php echo "Hello World"; ?>

Now run it by opening http://localhost/test.php in Firefox... You should see your "Hello World".
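A slightly more informative test page can report the PHP version and whether particular extensions are loaded. This is only a sketch: environmentReport() is a hypothetical helper, and the extension names passed to it are examples to adjust for your setup.

```php
<?php

/**
 * Report the PHP version and whether each named extension is loaded.
 * environmentReport() is a hypothetical helper for this sketch.
 */
function environmentReport(array $extensions) {
    $report = array('php' => phpversion());
    foreach ($extensions as $name) {
        $report[$name] = extension_loaded($name);
    }
    return $report;
}

// Print one line per entry: the version string, or loaded/missing for extensions.
foreach (environmentReport(array('mysql', 'pcre')) as $name => $value) {
    echo $name . ': ' . (is_bool($value) ? ($value ? 'loaded' : 'missing') : $value) . "<br />\n";
}
```

Saved as a file under /var/www/, it can be opened in the browser the same way as test.php.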

Two thirds done. Let's continue by installing MySQL:
1. Once again, in the terminal execute:

sudo apt-get install mysql-server

2. (optional) If you are running a server, you should probably bind MySQL to your address by editing bind-address in /etc/mysql/my.cnf and replacing its value (127.0.0.1) with your IP address
3. Set your root password (although MySQL should ask you for one during installation)

mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('xxxxxx');

4. Try running it

mysql -uroot -pxxx

where xxx is your password.
Note: You can install PHPMyAdmin for a graphical user interface of MySQL by executing

sudo apt-get install libapache2-mod-auth-mysql php5-mysql phpmyadmin

5. restart apache for the last time

sudo /etc/init.d/apache2 restart

Congratulations, your LAMP system is installed and running.
Happy Coding

UPDATE:
Due to the large number of people emailing about installing and running phpMyAdmin, here is how to do it:

sudo apt-get install phpmyadmin

The phpmyadmin configuration file will be installed in: /etc/phpmyadmin
Now you will have to edit the apache config file by typing

sudo vi /etc/apache2/apache2.conf

and include the following line:

Include /etc/phpmyadmin/apache.conf

Restart Apache

sudo /etc/init.d/apache2 restart

Another issue was getting MySQL to work with PHP 5.
First install these packages:

sudo apt-get install php5-mysql mysql-client

then edit php.ini and add this line if it isn't already there: extension=mysql.so

sudo vi /etc/php5/apache2/php.ini

Restart Apache

sudo /etc/init.d/apache2 restart

So before my monstrous OpenSolaris 2008.05 post on setting up a Ruby on Rails server, I decided to write a guide on setting up an Ubuntu 8.04 server for all you Slicehost users! I decided to write this guide because of the new optimized kernel for virtualized environments that was added to Ubuntu Server 8.04. I also wanted a complete guide that would be a solid reference, not just bits and pieces that upcoming sysadmins will get lost in when reading.

For simplicity I will start with a blank machine and build upon that. Use the comments section for specific questions or starting points. I will do my best to answer any and all questions.

Requirements

This section will go over the simple requirements of the entire setup.

Hardware

Ubuntu 8.04 Server – This could be anything below:

  • Slicehost
  • VMware
  • Bare Metal Install

Software

  • Apache 2.2.8
  • MySQL/PostgreSQL/SQLite3
  • Git
  • Ruby
  • Rubygems
    • Rails
    • Capistrano
    • RSpec
    • Ultrasphinx
    • Passenger

Installation of Software

The first thing to do before installing anything on this machine is to update the server. This is very simple with Ubuntu: it takes two commands and you are all set. You only need to reboot the machine if a new kernel was installed.

sudo apt-get update
sudo apt-get dist-upgrade

Now that the machine is updated we must install some essential tools in order to build software on this server. Once we are done with the setup it would be a good idea to remove these tools to increase security on our server.

sudo apt-get install build-essential

Now we are all set with the preparation of the server and we can start installing the software we need to get going.

Web Server

For the web server I chose Apache 2 because of the new Passenger gem (mod_rails). This gem is great because it makes deploying new applications simple.

sudo apt-get install apache2 apache2-dev

Database Server

Which database server to use is completely up to your preference. My recommendation is PostgreSQL. PostgreSQL is a very robust and fast database server that is rock solid. It does use a lot of resources, so for Slicehost it may not be the best choice. A strong contender for a slim and fast database on Slicehost is SQLite3. It is a wonderful database and should not be thrown out so quickly because of its lack of a client/server architecture.

For this tutorial I will install MySQL because of its popularity with the Rails community.

sudo apt-get install mysql-server

When prompted, enter a root password. Make it complex and write it down.

Version Control

Git is the most sexy version control system ever created. I will never look back to Subversion again. Now that Capistrano and Redmine both support Git, I have no reason to even think about those awful three letters.

Installing Git is yet another apt-get command away. Run the following command in the terminal of your new server.

sudo apt-get install git-core curl gitweb

gitweb is an optional web frontend for your repositories. I do not use it because I use GitNub, a RubyCocoa application for the Mac.

Once that finishes git is completely installed and ready to go.

Ruby

Installing Ruby on Ubuntu 8.04 is quite simple. Just another apt-get and you are all set... almost. Since the release of Ruby 1.9.0, distributions have been naming the current stable release of Ruby "ruby1.8". That being said, we will make a couple of symlinks.

Ruby 1.8.6

To install all the tools you will want on this server run the following command:

sudo apt-get install ruby1.8 ruby1.8-dev rdoc1.8 ri1.8 libopenssl-ruby1.8

Rubygems

I refuse to install Rubygems with apt-get. This is such a terrible idea in my opinion. There is no reason to install rubygems with a package manager because it can update itself. I will go over how to update rubygems later in this howto.

wget http://rubyforge.org/frs/download.php/35283/rubygems-1.1.1.tgz
tar -xzf rubygems-1.1.1.tgz
cd rubygems-1.1.1
sudo ruby1.8 setup.rb

Optional: Once the install is done, run the next three commands so that gem, ruby and irb can be invoked without the version suffix, just as before.

sudo ln -s /usr/bin/gem1.8 /usr/bin/gem
sudo ln -s /usr/bin/ruby1.8 /usr/bin/ruby
sudo ln -s /usr/bin/irb1.8 /usr/bin/irb

Recommended Gems

Here is a list of recommended gems that should be installed once rubygems is installed. At the very least you must install rails and passenger.

sudo gem install rails
sudo gem install capistrano
sudo gem install rspec
sudo gem install ultrasphinx
sudo gem install passenger

add type for html

March 18, 2008

RemoveHandler .html .htm

AddType application/x-httpd-php .html .htm

After adding these lines to your .htaccess file, .html and .htm files are processed as PHP.

GET URL variables

March 12, 2008

<?php

function setUrlVariables() {
    $string = "?";
    $vars = $_GET;
    // Arguments are alternating key, value pairs that override $_GET.
    $args = func_get_args();
    for ($i = 0; $i + 1 < count($args); $i += 2)
        $vars[$args[$i]] = $args[$i + 1];
    // Skip empty values when building the query string.
    foreach (array_keys($vars) as $key)
        if ($vars[$key] != "") $string .= $key . "=" . $vars[$key] . "&";
    // Append the session ID when sessions are active and it is not already in the URL.
    if (defined('SID') && SID != "" && empty($_GET["PHPSESSID"]))
        $string .= htmlspecialchars(SID) . "&";

    return htmlspecialchars(substr($string, 0, -1));
}

echo  setUrlVariables();

?>
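The same idea can be sketched with PHP's built-in http_build_query(), which also URL-encodes keys and values. This is an alternative under stated assumptions rather than the original author's approach: buildQuery() is a hypothetical name, and it takes the overrides as an array instead of alternating arguments.

```php
<?php

/**
 * Merge overrides into the current query parameters and build a query string.
 * buildQuery() is a hypothetical name for this sketch.
 */
function buildQuery(array $current, array $overrides) {
    // Later values win; keys are preserved.
    $vars = array_replace($current, $overrides);

    // Drop empty values so they do not appear in the URL.
    $vars = array_filter($vars, function ($value) {
        return $value !== '' && $value !== null;
    });

    // http_build_query() URL-encodes keys and values.
    return $vars ? '?' . http_build_query($vars, '', '&') : '';
}

// Escape the result for output inside an HTML attribute such as href.
echo htmlspecialchars(buildQuery($_GET, array('page' => 2, 'sort' => '')));
```

Passing an empty string as an override removes that parameter from the resulting URL.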

PHP get current page

March 10, 2008

<?php
$currentFile = $_SERVER["PHP_SELF"];
$parts = explode('/', $currentFile);
echo $parts[count($parts) - 1];
?>

This is equivalent to:

<?php
echo basename($_SERVER['PHP_SELF']);
?>
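basename() also accepts a suffix to strip, and dirname() returns the rest of the path. A short sketch, using a fixed example path in place of $_SERVER['PHP_SELF'] so it can be run from the command line:

```php
<?php

// A fixed path stands in for $_SERVER['PHP_SELF'] (an assumption of this sketch).
$currentFile = '/blog/archive/index.php';

echo basename($currentFile), "\n";          // file name with extension
echo basename($currentFile, '.php'), "\n";  // file name with the .php suffix stripped
echo dirname($currentFile), "\n";           // directory portion of the path
```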

Experimental Game

December 5, 2007

http://imonline.alpha-phi-epsilon.net/_labs/game_01.html

Hey guys... I don't know why I made this game; I made it out of boredom. The engine is not very thoroughly done, but it's worth a try.

Flash MySQL XML

November 21, 2007

//Figure 1.0 Main.fla

var theXML:XML = new XML();
theXML.ignoreWhite = true;

theXML.onLoad = function() {
var nodes = this.firstChild.childNodes;
for(i=0;i<nodes.length;i++) {
theList.addItem(nodes[i].firstChild.nodeValue,i);
}
}

theXML.load("http://www.yoursite.com/products.php");

//Figure 1.1 products.php

<?PHP

// Note: the mysql_* functions were removed in PHP 7, so this listing requires PHP 5.
$link = mysql_connect("localhost","lee","password");
mysql_select_db("brimelow_store");

$query = 'SELECT * FROM products';
$results = mysql_query($query);

echo "<?xml version=\"1.0\"?>\n";
echo "<products>\n";

while($line = mysql_fetch_assoc($results)) {
echo "<item>" . $line["product"] . "</item>\n";
}

echo "</products>\n";

mysql_close($link);

?>
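A product name containing a character such as & or < would make the XML above malformed, and the Flash XML parser would then fail to load it. A minimal sketch of escaping the values follows. productsXml() and the sample names are hypothetical, and a plain array stands in for the database results.

```php
<?php

/**
 * Build the <products> document from a list of names, escaping special characters.
 * productsXml() is a hypothetical helper; the array stands in for the query results.
 */
function productsXml(array $names) {
    $xml = "<?xml version=\"1.0\"?>\n<products>\n";
    foreach ($names as $name) {
        // Escape &, <, > and quotes so a product name cannot break the document.
        $xml .= '<item>' . htmlspecialchars($name, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</item>\n";
    }
    return $xml . "</products>\n";
}

echo productsXml(array('Widget', 'Nuts & Bolts'));
```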

Flash Preloader

November 21, 2007

loadedBytes = _root.getBytesLoaded();
totalBytes = _root.getBytesTotal();
if (loadedBytes<totalBytes) {
percentage = ((loadedBytes/totalBytes)*100);
loader_graphic._xscale = percentage;
txt = percentage;
gotoAndPlay(1);
} else {
nextScene();
}