The system.net.webclient class can be used to deal with web pages.

To download and display pages this class has a couple of methods:
DownloadData downloads the page and returns it as an array of bytes.

# define a new object
PS> $web = New-Object system.net.webclient
# download an address
PS> $web.DownloadData("https://rakhesh.com/about")
60
33
68
...
# just for kicks: this is how you can download the page as a byte array, cast each byte to a character
# and join all these characters to get the entire page
PS> ($web.DownloadData("https://rakhesh.com/about") | %{ [char]$_ }) -join ""
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html ...

DownloadString downloads the page and returns it as one long string.

# define a new object
PS> $web = New-Object system.net.webclient
# download and return as a single string
PS> $web.DownloadString("https://rakhesh.com/about")

DownloadFile downloads the page and saves it to a file name you specify.

# define a new object
PS> $web = New-Object system.net.webclient
# download to a file
PS> $web.DownloadFile("https://rakhesh.com/about","about.html")
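As an aside, casting each byte to a [char] only works cleanly for ASCII content. For pages with non-ASCII characters a more robust sketch, assuming the page is UTF-8 encoded, is to decode the byte array via the .NET System.Text.Encoding class:

# define a new object
PS> $web = New-Object system.net.webclient
# download the raw bytes
PS> $bytes = $web.DownloadData("https://rakhesh.com/about")
# decode the byte array as UTF-8 (assumption: the page really is UTF-8 encoded)
PS> [System.Text.Encoding]::UTF8.GetString($bytes)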
The class also has properties you can set that are used while downloading a page. For instance:
QueryString to specify pairs of query parameters and their values. For example: to do a Google search for the word “rakhesh” one can fetch the page http://www.google.com/search?q=rakhesh. This q=rakhesh is a query string, with q being a parameter and rakhesh being the value of that parameter. To do the same via the system.net.webclient class one would do the following:

# define a new object
PS> $web = New-Object system.net.webclient
# check its current query string
PS> $web.QueryString.GetEnumerator()
# add a new query string
PS> $web.QueryString.Add("q","rakhesh")
# confirm addition - just to show it's possible
PS> $web.QueryString.GetEnumerator()
q
PS> $web.QueryString.Get("q")
rakhesh
# now fetch the web page to a file
PS> $web.DownloadFile("http://www.google.com/search","results.html")
# if you examine the results.html file you'll see it has done the search
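The QueryString collection can hold more than one parameter, with each Add appending another name=value pair to the request. A quick sketch; the num parameter here is just an illustrative second parameter, not something I have verified against Google:

PS> $web = New-Object system.net.webclient
# two parameters: the request becomes http://www.google.com/search?q=rakhesh&num=5
PS> $web.QueryString.Add("q","rakhesh")
PS> $web.QueryString.Add("num","5")
PS> $web.DownloadFile("http://www.google.com/search","results2.html")
# clear the collection before reusing the object for a different request
PS> $web.QueryString.Clear()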
Headers to specify pairs of headers that can be set when requesting the web page:

PS> $web = New-Object system.net.webclient
# set the user-agent string to "PowerShell Script"
PS> $web.Headers.Add("user-agent", "PowerShell Script")
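You can add more than one header this way, and read them back from the same collection. A small sketch; the accept header value is just an example:

PS> $web = New-Object system.net.webclient
PS> $web.Headers.Add("user-agent", "PowerShell Script")
PS> $web.Headers.Add("accept", "text/html")
# read a header back
PS> $web.Headers.Get("user-agent")
PowerShell Script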
Credentials to specify credentials for accessing the web page:

PS> $web = New-Object system.net.webclient
PS> $web.Credentials = Get-Credential
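Get-Credential prompts interactively. For a non-interactive script one could instead assign a System.Net.NetworkCredential object directly, or reuse the logged-on user's credentials; the username and password below are placeholders:

PS> $web = New-Object system.net.webclient
# supply credentials without a prompt (placeholder username/password)
PS> $web.Credentials = New-Object System.Net.NetworkCredential("someuser","somepass")
# or use the credentials of the currently logged-on user
PS> $web.UseDefaultCredentials = $true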
ResponseHeaders to view the headers received in response.

PS> $web = New-Object system.net.webclient
# no response headers initially
PS> $web.ResponseHeaders
# visit a page
PS> $web.DownloadFile("https://rakhesh.com","temp.html")
# now we have response headers
PS> $web.ResponseHeaders
Connection
Keep-Alive
Vary
Accept-Ranges
Content-Length
Cache-Control
Content-Type
Date
Expires
Last-Modified
Server
# view one of the headers
PS> $web.ResponseHeaders.Get("Expires")
Tue, 22 Oct 2013 12:35:13 GMT
# use a pipe to view all the headers
PS> $web.ResponseHeaders.GetEnumerator() | %{ "$_ -> $($web.ResponseHeaders.Get($_))" }
Connection -> keep-alive
Keep-Alive -> timeout=10
Vary -> Accept-Encoding,Accept-Encoding,Cookie
Accept-Ranges -> bytes
Content-Length -> 464244
Cache-Control -> max-age=3, must-revalidate, proxy-revalidate
Content-Type -> text/html; charset=UTF-8
Date -> Tue, 22 Oct 2013 12:35:10 GMT
Expires -> Tue, 22 Oct 2013 12:35:13 GMT
Last-Modified -> Mon, 21 Oct 2013 15:14:25 GMT
Server -> Apache
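To tie these together, here's a minimal sketch of a helper function; the function name Get-WebPage and its parameters are my own invention, not part of the class:

# a hypothetical wrapper around system.net.webclient
function Get-WebPage {
    param(
        [string]$Url,
        [hashtable]$Query = @{}
    )
    $web = New-Object system.net.webclient
    # identify ourselves with a user-agent header
    $web.Headers.Add("user-agent", "PowerShell Script")
    # copy each hashtable entry into the QueryString collection
    foreach ($key in $Query.Keys) {
        $web.QueryString.Add($key, $Query[$key])
    }
    # return the page as a single string
    $web.DownloadString($Url)
}

# for example, to repeat the Google search from earlier:
PS> Get-WebPage -Url "http://www.google.com/search" -Query @{ q = "rakhesh" }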
There are other properties and methods too; the above are just what I had a chance to look at today.