znazz75.de IT-Blog

Automatically scrape certain content on a website

I use “symfony/panther” to scrape websites by applying logic patterns.

Install the environment

Log in to your Linux machine and install “composer”. It is used to install “symfony/panther” and the other required packages.

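Create a directory for your script in your home folder or elsewhere. Inside it, install “symfony/panther” and “symfony/css-selector” as a regular user, for example with “composer require symfony/panther symfony/css-selector”. If asked, you don’t need to install them as development dependencies.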

Use of panther with examples

Create a PHP file inside the directory you just created; I called mine scrap.php. You can use my example code below as a starting point. Execute the file on the command line with “php scrap.php”.

Example code:
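The following is only a minimal sketch of a Panther script, not the original listing. It assumes that Chrome or Chromium and a matching chromedriver are available on the machine. The URL and the “h1” selector are placeholders you need to adapt to your target page.

```php
<?php
// scrap.php: minimal Panther sketch (URL and selector are placeholders)
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Panther\Client;

// Start a headless Chrome instance controlled through chromedriver.
$client = Client::createChromeClient();

// Load the page; JavaScript on the page is executed by the real browser.
$client->request('GET', 'https://example.com/');

// Wait until the element exists in the DOM, then read its text.
$crawler = $client->waitFor('h1');
echo $crawler->filter('h1')->text(), PHP_EOL;

// Optional: keep a screenshot for debugging.
$client->takeScreenshot('screen.png');

// Shut down the browser and the driver process.
$client->quit();
```

Because Panther drives a real browser, “waitFor” is useful whenever the content you want is rendered by JavaScript after the initial page load.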

Here is an example of how to scrape the popup window opened from each entry of a table that spans several pages:
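A rough sketch of this pattern could look as follows. It assumes the popup is an in-page element rather than a separate browser window. All selectors (“a.details”, “#popup”, “.close”, “a.next”), the URL, and the output file names are hypothetical and must be replaced with the ones from your target site.

```php
<?php
// Sketch: open the popup of every table row on every page and save its HTML.
// The URL, all selectors and the file names are assumptions for illustration.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Panther\Client;

$client = Client::createChromeClient();
$client->request('GET', 'https://example.com/list');
$page = 1;

do {
    // Wait for the table rows of the current page.
    $crawler = $client->waitFor('table tbody tr');
    $rows = $crawler->filter('table tbody tr a.details')->count();

    for ($i = 0; $i < $rows; $i++) {
        // Re-read the DOM so element references are not stale.
        $crawler = $client->refreshCrawler();
        $crawler->filter('table tbody tr a.details')->eq($i)->click();

        // Wait for the popup, then store its markup in a file.
        $crawler = $client->waitForVisibility('#popup');
        file_put_contents(
            sprintf('page%d_entry%d.html', $page, $i),
            $crawler->filter('#popup')->html()
        );

        // Close the popup before moving on to the next row.
        $crawler->filter('#popup .close')->click();
        $client->waitForInvisibility('#popup');
    }

    // Go to the next page if a "next" link exists.
    $next = $client->refreshCrawler()->filter('a.next');
    $hasNext = $next->count() > 0;
    if ($hasNext) {
        $next->click();
        $page++;
        // Note: if the table is reloaded via AJAX, an additional wait
        // for the new content may be needed here.
    }
} while ($hasNext);

$client->quit();
```

Saving each popup to its own file keeps the slow browser step separate from the parsing step, so you can rerun the extraction without scraping the site again.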

Example of how to extract the desired information from the files:
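As a sketch, the extraction can be done with the “symfony/dom-crawler” component, which is installed as a dependency of “symfony/panther”; CSS selectors work because “symfony/css-selector” is installed. The file name pattern and the selectors “h2” and “.price” are assumptions made for illustration only.

```php
<?php
// Sketch: read saved HTML files and print selected fields.
// File pattern and selectors are hypothetical; adapt them to your data.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

foreach (glob(__DIR__.'/page*_entry*.html') as $file) {
    // Parse the stored markup without starting a browser.
    $crawler = new Crawler(file_get_contents($file));

    // text() returns the given default when no element matches.
    $title = $crawler->filter('h2')->text('(no title)');
    $price = $crawler->filter('.price')->text('(no price)');

    echo basename($file), ': ', $title, ' | ', $price, PHP_EOL;
}
```

Passing a default value to “text” avoids an exception when a file does not contain the expected element.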

Problems

If you get an error like this in the /var/log/apache2/error.log file:

You can’t start the script with a browser. You have to execute your script on the command line.

If you get this error:

Try executing the script as a non-root user.