Receiving HTML from AJAX appl
Posted: Mon Sep 27, 2010 9:40 pm
by nah
I have been running an application that fetched and parsed HTML from a web page.
The page has since been redesigned and now uses AJAX.
This means that the HTML source is no longer updated when the displayed content changes.
Does anyone here have experience with this?
Best regards
Niels
Re: Receiving HTML from AJAX appl
Posted: Tue Oct 05, 2010 10:20 am
by Morten|Dyalog
I don't think there is an easy solution to this, as the web page now expects some client code to run in the web browser. Writing your own JavaScript interpreter (or similar) is probably more work than you would like to do :-).
You might be able to get away with using a tool like "Fiddler" to spy on the HTTP communication and see what the AJAX client-side sends to the server and reverse engineer that - this MIGHT give you the information that you need, depending on what is going on. But this is probably a very long shot.
Can you let us know which page you are trying to "scrape"?
Re: Receiving HTML from AJAX appl
Posted: Tue Oct 05, 2010 8:13 pm
by nah
A nice example is "http://www.soccerway.com/national/sweden/allsvenskan/2010/regular-season/",
which shows Swedish soccer results. When I view the source of the displayed page I can extract the content,
but after pressing "Previous" I see the previous results on the screen while the source is not updated.
Re: Receiving HTML from AJAX appl
Posted: Wed Oct 06, 2010 7:40 am
by harsman
That the page is using AJAX means it is retrieving data from the server in a more data-oriented format than HTML, usually JSON or XML. This might actually make it easier to extract data compared to scraping it from HTML.
If you look at the JavaScript source or watch the network traffic (either via an external tool like Fiddler, as Morten suggested, or with a browser-integrated tool like Firebug for Firefox), you should be able to reverse engineer which HTTP requests to make to get the data.
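To illustrate what replaying such a request could look like, here is a minimal Python sketch. The endpoint URL, the parameter names and the referer are all made up for the example; the real ones have to be read off the network log. The `X-Requested-With` header is something many AJAX libraries send, but whether a given server requires it is an assumption to verify against the captured traffic.

```python
import urllib.parse
import urllib.request


def build_ajax_request(base_url, params, referer=None):
    """Rebuild a GET request resembling one seen in the browser's network log."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    headers = {
        # Commonly sent by AJAX libraries; some servers check for it.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json, text/xml;q=0.9, */*;q=0.8",
    }
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch(request, timeout=10):
    """Send the request and return (content type, raw body bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()


# Hypothetical endpoint and parameters, for illustration only.
req = build_ajax_request(
    "http://example.com/ajax/results",
    {"round": 12, "page": "previous"},
    referer="http://example.com/results/",
)
print(req.full_url)
```

Nothing is sent until `fetch(req)` is called, so the rebuilt request can be compared with the captured one first.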
Re: Receiving HTML from AJAX appl
Posted: Thu Oct 07, 2010 6:00 pm
by alexbalako
Niels,
You could try using the Internet Explorer ActiveX control, which will execute the JavaScript on the page for you.
Then pull the HTML from it.
Re: Receiving HTML from AJAX appl
Posted: Thu Nov 03, 2011 3:52 pm
by Dick Bowman
Have there been any further developments on this topic in the past year?
I find myself in a similar situation - a little application that page-scraped HTML is now broken because the site author (British Met Office) now generates the pages seen in the browser with JavaScript. Obviously (?) the data I want to bring into APL is reaching my computer, but the browser seems to hide it from me.
Any specific suggestions about tools to look at? I'm not sure whether the last post is talking about general principles or something specific.
Re: Receiving HTML from AJAX appl
Posted: Thu Nov 03, 2011 4:12 pm
by Morten|Dyalog
Dick Bowman wrote: Have there been any further developments on this topic in the past year?
Not directly, but the MiServer team has a prototype of a tool to encode and decode JSON, that will be used for AJAX-style interaction with MiServer applications.
However, unless the data supplier documents the format of the required HTTP transactions, the only "solution" to the problem of extracting data from web applications which use AJAX is to snoop on the communication between the JavaScript application running in the browser and the server. You can then use Conga to send a similar request to the server, and either ⎕XML or the JSON-decoding tools (or something else, depending on the format) to take the result apart.
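As a rough illustration of the "take the result apart" step, here is a small Python sketch (not APL, and not the MiServer tool) that picks a decoder based on the content type the server reports. The sample payloads and their field names are invented; a real response will have whatever shape the site's own JavaScript expects.

```python
import json
import xml.etree.ElementTree as ET


def decode_payload(content_type, body):
    """Turn a raw response body into Python data, based on its declared type."""
    text = body.decode("utf-8")  # assumes UTF-8; check the response headers
    if "json" in content_type:
        return json.loads(text)
    if "xml" in content_type:
        root = ET.fromstring(text)
        # One dict per child element: its attributes plus its text content.
        return [dict(child.attrib, text=(child.text or "").strip()) for child in root]
    raise ValueError(f"unhandled content type: {content_type}")


# Invented sample payloads, standing in for captured responses.
sample_json = b'{"matches": [{"home": "A", "away": "B", "score": "2-1"}]}'
sample_xml = b'<matches><match home="A" away="B">2-1</match></matches>'

print(decode_payload("application/json", sample_json))
print(decode_payload("text/xml", sample_xml))
```

The XML branch only handles a flat list of child elements; nested documents would need a recursive walk.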
Re: Receiving HTML from AJAX appl
Posted: Wed Nov 16, 2011 3:05 pm
by Dick Bowman
Quick update to confirm that this thread has shown me what I needed to do...
0⊃ Firebug revealed that the JavaScript was pulling files with the .json extension from the remote server
1⊃ Put together a quick/dirty decoder for the .json files
Which has put the broken part of the application back into action.
Thanks to all.
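For anyone following the same route, a quick-and-dirty decoder along these lines might look like the Python sketch below. It is not the decoder described above, and the record layout (`site`, `temp`, `wind`) is made up. It assumes the .json file holds a list of objects and flattens them into column names plus rows, which maps fairly naturally onto a matrix.

```python
import json


def records_to_table(records):
    """Return (column_names, rows) from a list of JSON objects."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    # Missing fields become None so every row has the same length.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


# Invented sample standing in for the contents of a downloaded .json file.
sample = '[{"site": "X", "temp": 11.5}, {"site": "Y", "temp": 9.0, "wind": 14}]'
columns, rows = records_to_table(json.loads(sample))
print(columns)
for row in rows:
    print(row)
```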