Scraping password protected sites

Using (or providing) Microsoft.NET Classes
neeraj
Posts: 82
Joined: Wed Dec 02, 2009 12:10 am
Location: Ithaca, NY, USA

Scraping password protected sites

Post by neeraj »

How would you do this in Dyalog?

Code:

__author__ = 'ngupta'
from bs4 import BeautifulSoup
import mechanize

LOGIN_URL = "https://www.schwab.com/"
LOGIN_FORM_NAME = "SignonForm"
LOGIN_USER_ID_FIELD = "SignonAccountNumber"
LOGIN_PASSWORD_FIELD = "SignonPassword"

# Create the browser; ignore robots.txt and meta-refresh redirects
mech_br = mechanize.Browser()
mech_br.set_handle_robots(False)
mech_br.set_handle_refresh(False)
mech_br.addheaders = [('User-agent', 'Firefox')]

user_id="your_id"
password="your_pwd"
mech_br.open(LOGIN_URL)
mech_br.select_form(name=LOGIN_FORM_NAME)
mech_br[LOGIN_USER_ID_FIELD] = user_id
mech_br[LOGIN_PASSWORD_FIELD] = password
login_response = mech_br.submit()

soup = BeautifulSoup(login_response.read(),"html.parser")
table = soup.find("table", {"id": "tblCharlesSchwabBank"})
balance = float(table('tr')[1]('td')[2].span.text[1:])  # 2nd row, 3rd cell
print(balance)
neeraj

Re: Scraping password protected sites

Post by neeraj »

RUNNING THE SCRIPT:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/ngupta/Dropbox/python/pycharm projects/MechanizeTest/Test4.py"
698.53

Process finished with exit code 0
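
For anyone who wants to try the table-parsing step without logging in, here is a minimal sketch that uses only the Python 3 standard library instead of BeautifulSoup. The HTML fragment, the `TableCells` class and the `parse_currency` helper are made up for illustration; only the table id comes from the script above. It also removes thousands separators before calling `float`, which matters because `float("1,698.53")` raises a ValueError.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for the real page; only the table id
# is taken from the script above.
SAMPLE_HTML = """
<table id="tblCharlesSchwabBank">
  <tr><th>Account</th><th>Type</th><th>Balance</th></tr>
  <tr><td>Checking</td><td>Bank</td><td><span>$1,698.53</span></td></tr>
</table>
"""


class TableCells(HTMLParser):
    """Collect the text of every cell, row by row, for one table id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])          # start a new row
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        # Text nested inside the cell (e.g. in a <span>) is appended too.
        if self.in_table and self.in_cell:
            self.rows[-1][-1] += data.strip()


def parse_currency(text):
    """Turn '$1,698.53' into 1698.53 by dropping '$' and thousands separators."""
    return float(text.replace("$", "").replace(",", ""))


parser = TableCells("tblCharlesSchwabBank")
parser.feed(SAMPLE_HTML)
balance = parse_currency(parser.rows[1][2])  # 2nd row, 3rd cell
print(balance)
```

This sketch does not handle nested tables; for real pages BeautifulSoup, as in the original script, is probably the more robust choice.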
Vince|Dyalog
Posts: 439
Joined: Wed Oct 01, 2008 9:39 am

Re: Scraping password protected sites

Post by Vince|Dyalog »

Hi Neeraj,

I would suggest searching the internet for "C# web scrape login" and then translating the C# examples into APL using our .NET interface.

Regards,

Vince
PGilbert
Posts: 440
Joined: Sun Dec 13, 2009 8:46 pm
Location: Montréal, Québec, Canada

Re: Scraping password protected sites

Post by PGilbert »

Based on Vince's suggestion and this web page: http://webdata-scraping.com/login-website-programmatically-using-c-web-scraping/ you can do the following with .NET:

Code:

 url←'https://www.schwab.com/'

 ⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
 ⎕USING,←⊂'System.Drawing,System.Drawing.dll'

 wb←⎕NEW WebBrowser
 wb.Dock←wb.Dock.Fill
 wb.Navigate(⊂url)
 ⎕DL 5
 htmlDoc←wb.Document
 html←⎕UCS wb.DocumentStream.ToArray

 signonAcc←htmlDoc.GetElementById(⊂'SignonAccountNumber')
⍝ signonAcc.InnerText←'user_id' ⍝ No error but property is not changed
 signonAcc.InnerHtml←'user_id'

 signonPwd←htmlDoc.GetElementById(⊂'SignonPassword')
⍝ signonPwd.InnerText←'password' ⍝ No error but property is not changed
 signonPwd.InnerHtml←'password'

 loginBtn←htmlDoc.GetElementById(⊂'&lid=Log in')
 loginBtn.InvokeMember(⊂'click')

 ⍝ Show the WebBrowser in a WindowsForm
 fm←⎕NEW Form
 fm.Size←⎕NEW Size(1100,680)
 fm.Text←'URL [ ',url,' ]'
 fm.onClosed←'_GetWebResults_onClosed'
 fm.Controls.Add wb

 fm.Show ⍬

and for the onClosed event function:

Code:

 _GetWebResults_onClosed(sender event)

 (⌷sender.Controls).Dispose


This code runs without errors, but you will have to try it with your own ID and password. 'htmlDoc' is a System.Windows.Forms.HtmlDocument that you can query easily with .GetElementById or .GetElementsByTagName. You can find those IDs and tag names by inspecting the HTML of the page manually or, if you use Safari, by right-clicking an element of the page and choosing 'Inspect Element' from the contextual menu; this shows the HTML of that element, which makes its ID easier to find. Sometimes you may need to put ⌷ or ⍬⍴⌷ in front of the result of .GetElementById or .GetElementsByTagName to get it in the proper rank.
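
If inspecting the page by hand gets tedious, the IDs can also be listed by a small script. The following is only a sketch in Python 3 (the language of the original question), using the standard-library HTML parser. The form markup and the `IdLister` class are hypothetical; the two field IDs are the ones used in this thread, and the button ID `loginButton` is invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical login form; only the two field ids appear in this thread,
# the rest of the markup is made up for illustration.
SAMPLE_HTML = """
<form name="SignonForm" action="/login" method="post">
  <input type="text" id="SignonAccountNumber" name="SignonAccountNumber">
  <input type="password" id="SignonPassword" name="SignonPassword">
  <button type="submit" id="loginButton">Log in</button>
</form>
"""


class IdLister(HTMLParser):
    """Record (tag, id) for every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id is not None:
            self.found.append((tag, element_id))


lister = IdLister()
lister.feed(SAMPLE_HTML)
for tag, element_id in lister.found:
    print(tag, element_id)
```

In practice you would feed it the saved source of the real page instead of SAMPLE_HTML, and then pass the IDs it prints to .GetElementById.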

Good luck.
neeraj

Re: Scraping password protected sites

Post by neeraj »

Thanks to both of you. I will try and see how it works out.