How to crawl a website

Question


msaus Aug '17

Created Aug '17	Last Reply Aug '17	Replies 4	Views 461	Votes 0

msaus 2.2k

Aug '17

Hello there,

Is it possible to crawl a website using the Phalcon library?

Izo
85.5k

Accepted answer

edited Aug '17
Aug '17

You don't need an MVC framework for that. Use CasperJS / PhantomJS (old school), or https://github.com/facebook/php-webdriver + Selenium.

It takes a few different skills to set up and configure, but once you're used to it, it's quite simple.

Have fun and don't get caught :-)

msaus · Answer 1 · 2017-08-04T02:47:44-07:00

msaus
2.2k

Aug '17

I decided to use fabpot/goutte, but thanks for your comment.
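Goutte itself is a PHP library, so the following is not its API. As a rough illustration of what a plain HTTP crawler of this kind does, here is a minimal breadth-first crawler sketch using only Python's standard library. The names `crawl`, `fetch_html` and `LinkExtractor` are hypothetical, and it assumes pages are UTF-8 and that you only want to follow links on the starting host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and decode it (assumption: the site serves UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            if parsed.scheme in ("http", "https") and parsed.netloc == host \
                    and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` as a parameter makes it easy to test against an in-memory site, or to swap in a client that sends custom headers or rate-limits requests.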

Izo · Answer 2 · 2017-08-04T04:28:59-07:00

Izo
85.5k

Aug '17

Keep in mind that JavaScript doesn't run in those crawlers, so things like AJAX-loaded data won't be crawlable.
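To see why, here is a small Python sketch using only the standard library; the page markup is a made-up example. A plain HTML parser reads only the markup the server returns, so the `<div>` that the script would fill in after page load stays empty, and only the static heading is extracted.

```python
from html.parser import HTMLParser

# Hypothetical page: the product list is filled in by JavaScript after load.
PAGE = """
<html><body>
  <h1>Products</h1>
  <div id="list"></div>
  <script>
    fetch('/api/products').then(r => r.json()).then(items => {
      document.getElementById('list').textContent = items.join(', ');
    });
  </script>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


extractor = TextExtractor()
extractor.feed(PAGE)
print(extractor.chunks)  # ['Products']
```

A headless browser driven through Selenium/php-webdriver, as suggested in the accepted answer, executes the script before you read the page, which is why it can see data loaded this way.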

ChangePlaces · Answer 3 · 2017-08-04T10:30:16-07:00

ChangePlaces
10.4k

Aug '17

Don't forget wget; it has a mirror option.

Peter · Answer 4 · 2017-08-05T00:26:39-07:00

HTTrack copies a whole website to disk so you can browse it offline. That's good for taking before/after snapshots; you could then use a diff tool such as Meld to see the differences.
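Meld is a graphical tool; if you would rather script the before/after comparison, Python's standard `difflib` and `pathlib` can do a rough version of it. This is a minimal sketch, assuming the two HTTrack runs were saved to two local folders (the function name `diff_snapshots` is hypothetical). It reports added and removed files and a unified diff for each changed one.

```python
import difflib
from pathlib import Path


def diff_snapshots(before_dir, after_dir):
    """Compare two mirrored copies of a site.

    Returns (added, removed, changed): sorted lists of relative paths for
    added/removed files, and a dict mapping each modified path to a unified diff.
    Note: binary files (images etc.) are decoded lossily; filter them out in practice.
    """
    before, after = Path(before_dir), Path(after_dir)
    old = {p.relative_to(before).as_posix() for p in before.rglob("*") if p.is_file()}
    new = {p.relative_to(after).as_posix() for p in after.rglob("*") if p.is_file()}
    changed = {}
    for rel in sorted(old & new):
        a = (before / rel).read_text(encoding="utf-8", errors="replace").splitlines()
        b = (after / rel).read_text(encoding="utf-8", errors="replace").splitlines()
        if a != b:
            changed[rel] = "\n".join(
                difflib.unified_diff(a, b, f"before/{rel}", f"after/{rel}", lineterm="")
            )
    return sorted(new - old), sorted(old - new), changed
```

For example, `added, removed, changed = diff_snapshots("before", "after")` followed by printing each value in `changed` gives a text report similar to what Meld shows side by side.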

The follow-up question is: why do you want to crawl the website? The answer determines the level of detail you need.