Scraping password protected sites

Post by **neeraj** » Fri Jul 24, 2015 6:12 am

How would you do this in Dyalog?

__author__ = 'ngupta'
from bs4 import BeautifulSoup
import mechanize

LOGIN_URL = "https://www.schwab.com/"
LOGIN_FORM_NAME = "SignonForm"
LOGIN_USER_ID_FIELD = "SignonAccountNumber"
LOGIN_PASSWORD_FIELD = "SignonPassword"
# Create browser
mech_br = mechanize.Browser()
mech_br.set_handle_robots(False)
mech_br.set_handle_refresh(False)
mech_br.addheaders = [('User-agent', 'Firefox')]

user_id="your_id"
password="your_pwd"
mech_br.open(LOGIN_URL)
mech_br.select_form(name=LOGIN_FORM_NAME)
mech_br[LOGIN_USER_ID_FIELD] = user_id
mech_br[LOGIN_PASSWORD_FIELD] = password
login_response = mech_br.submit()

soup = BeautifulSoup(login_response.read(),"html.parser")
table = soup.find("table", {"id": "tblCharlesSchwabBank"})
balance = float(table('tr')[1]('td')[2].span.text[1:])  # 2nd row, 3rd cell
print(balance)  # parenthesized form works in both Python 2 and Python 3
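
For anyone porting this to another language, it may help to see roughly what the mechanize calls amount to at the HTTP level: a POST of the two form fields, made by a client that keeps cookies between requests. The following is only a sketch under assumptions, using the Python 3 standard library. The helper names `build_login_request` and `make_opener` are made up for illustration, and posting to `LOGIN_URL` itself is an assumption: a real login form usually posts to a separate action URL and may need hidden fields as well. Nothing is actually sent here.

```python
# Sketch of a form login using only the standard library (Python 3).
# Assumption: the form posts its fields to LOGIN_URL itself; a real site
# usually posts to a separate action URL and may require hidden fields.
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

LOGIN_URL = "https://www.schwab.com/"


def build_login_request(url, user_id, password):
    """Return a POST request carrying the two login fields (hypothetical helper)."""
    fields = [
        ("SignonAccountNumber", user_id),
        ("SignonPassword", password),
    ]
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(url, data=body, headers={"User-agent": "Firefox"})


def make_opener():
    """Return an opener that keeps session cookies between requests (hypothetical helper)."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


request = build_login_request(LOGIN_URL, "your_id", "your_pwd")
print(request.get_method(), request.data.decode("ascii"))
# Sending it would be: make_opener().open(request)  (deliberately not done here)
```

The cookie-aware opener matters because the pages behind the login are only served to a client that returns the session cookie set by the login response.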

Post by **neeraj** » Fri Jul 24, 2015 6:17 am

RUNNING THE SCRIPT:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/ngupta/Dropbox/python/pycharm projects/MechanizeTest/Test4.py"
698.53

Process finished with exit code 0

Post by **Vince|Dyalog** » Tue Jul 28, 2015 11:14 am

Hi Neeraj,

I would suggest searching the internet for "c# web scrape login" and then translating the C# examples into APL using our .NET interface.

Regards,

Vince

Post by **PGilbert** » Tue Jul 28, 2015 3:23 pm

Based on Vince's suggestion and this web page: http://webdata-scraping.com/login-website-programmatically-using-c-web-scraping/ you can do the following in .NET:

 url←'https://www.schwab.com/'

 ⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
 ⎕USING,←⊂'System.Drawing,System.Drawing.dll'

 wb←⎕NEW WebBrowser
 wb.Dock←wb.Dock.Fill
 wb.Navigate(⊂url)
 ⎕DL 5
 htmlDoc←wb.Document
 html←⎕UCS wb.DocumentStream.ToArray

 signonAcc←htmlDoc.GetElementById(⊂'SignonAccountNumber')
⍝ signonAcc.InnerText←'user_id' ⍝ No error but property is not changed
 signonAcc.InnerHtml←'user_id'

 signonPwd←htmlDoc.GetElementById(⊂'SignonPassword')
⍝ signonPwd.InnerText←'password' ⍝ No error but property is not changed
 signonPwd.InnerHtml←'password'

 loginBtn←htmlDoc.GetElementById(⊂'&lid=Log in')
 loginBtn.InvokeMember(⊂'click')

 ⍝ Show the WebBrowser in a WindowsForm
 fm←⎕NEW Form
 fm.Size←⎕NEW Size(1100,680)
 fm.Text←'URL [ ',url,' ]'
 fm.onClosed←'_GetWebResults_onClosed'
 fm.Controls.Add wb

 fm.Show ⍬

and for the onClosed event function:

 _GetWebResults_onClosed(sender event)

 (⌷sender.Controls).Dispose

This code runs without errors, but you will have to try it with your own ID and password. 'htmlDoc' is a System.Windows.Forms.HtmlDocument that you can interrogate easily with .GetElementById or .GetElementsByTagName. You can find those IDs and tag names by manually inspecting the HTML of the page. If you use Safari, you can right-click an element of the page and choose 'Inspect Element' from the contextual menu; it shows the HTML of that element, which makes its ID easier to find. Sometimes you may need to put ⌷ or ⍬⍴⌷ in front of the result of .GetElementById or .GetElementsByTagName to get it in the proper rank.
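
Finding the IDs by reading the page source can also be scripted. Here is a minimal sketch, assuming Python 3 and only the standard library. The class name `FieldFinder` and the sample HTML are made up for illustration: only the form name and the two field IDs come from this thread, and the `loginButton` ID is hypothetical. It lists every form control with its `id` and `name`, which are the values you would then pass to .GetElementById.

```python
from html.parser import HTMLParser


class FieldFinder(HTMLParser):
    """Collect (tag, id, name) for every form control in a page (hypothetical helper)."""

    CONTROLS = {"input", "button", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case and attrs as (name, value) pairs.
        if tag in self.CONTROLS:
            attributes = dict(attrs)
            self.fields.append((tag, attributes.get("id"), attributes.get("name")))


# Made-up page fragment; the real login page will differ.
SAMPLE_HTML = """
<form name="SignonForm">
  <input type="text" id="SignonAccountNumber" name="SignonAccountNumber">
  <input type="password" id="SignonPassword" name="SignonPassword">
  <button id="loginButton">Log in</button>
</form>
"""

finder = FieldFinder()
finder.feed(SAMPLE_HTML)
for field in finder.fields:
    print(field)
```

In practice you would feed it the HTML you downloaded (for example the text held in `html` above) instead of the sample string.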

Good luck.

Post by **neeraj** » Thu Jul 30, 2015 4:09 am

Thanks to both of you. I will try and see how it works out.
