A continuation of my previous post on getting links.
Internet Explorer exposes a rich COM interface, and PowerShell can work with COM objects, so you can do the following:
    # create a new COM object that links to IE
    $ie = New-Object -ComObject "InternetExplorer.Application"

    # sleep for two seconds while IE launches
    Start-Sleep -Seconds 2

    # navigate to the page you want; note: the IE window is hidden so users won't see anything
    $ie.Navigate("http://URL/of/page")

    # sleep for two seconds while IE opens the page
    Start-Sleep -Seconds 2

    # since it is an Object you can use methods and properties to get a list of links
    $ie.Document.Links | select href

    # quit IE
    $ie.Application.Quit()
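The fixed two-second sleeps in the script above are a guess at how long IE needs. A possible alternative, sketched here under assumptions, is to poll the browser object's Busy and ReadyState properties until the page has finished loading (ReadyState 4 means complete). The function name Wait-BrowserReady and its parameters are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: waits until the browser object reports it is idle.
# Assumes $Browser exposes Busy and ReadyState, as the IE COM object does.
function Wait-BrowserReady {
    param(
        [Parameter(Mandatory)] $Browser,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 200
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    # ReadyState 4 is READYSTATE_COMPLETE
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) {
            return $false   # gave up waiting
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $true
}
```

With this in place, each Start-Sleep call could be swapped for something like Wait-BrowserReady -Browser $ie, checking the returned value before reading $ie.Document.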
The Document property is very powerful and lets you see a lot of details of the page. It has a subproperty, Links, that returns all the link elements in the page (it has nearly 800 properties, methods, and events!). The output is a set of objects, and since we are only interested in the actual link targets, we select just the href property.
If you are on PowerShell v3, things are even easier: the Invoke-WebRequest cmdlet is your friend.
To get an object representing the website do:
    Invoke-WebRequest "http://url/of/page"
To get all the links in that website:
    Invoke-WebRequest "http://url/of/page" | select -ExpandProperty Links
And to get just a list of the href values:

    Invoke-WebRequest "http://url/of/page" | select -ExpandProperty Links | select href
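One detail worth seeing in isolation is the difference between select href and select -ExpandProperty href. The sketch below uses made-up stand-in objects rather than a real web response, so the example.com URLs and the innerText values are assumptions for illustration only.

```powershell
# Hypothetical stand-ins for the link objects found in a response's Links property.
$links = @(
    [pscustomobject]@{ href = 'http://example.com/a'; innerText = 'A' },
    [pscustomobject]@{ href = 'http://example.com/b'; innerText = 'B' }
)

# Select-Object href keeps objects, each with a single href property.
$asObjects = $links | Select-Object href

# -ExpandProperty unwraps the property and yields the string values themselves.
$asStrings = $links | Select-Object -ExpandProperty href
```

The first form displays as a one-column table, while the second gives plain strings, which is usually more convenient when passing the URLs on to another command.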
Like the System.Net.WebClient class, Invoke-WebRequest has parameters to specify proxy, headers, encoding, etc.
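As a rough sketch of what passing such options might look like, the snippet below collects a few Invoke-WebRequest parameters (-Proxy, -Headers, -UserAgent) in a hashtable and splats them. The proxy address, header, user-agent string, and the Get-PageLink function name are all hypothetical placeholders.

```powershell
# All values below are hypothetical placeholders; substitute your own.
$requestParams = @{
    Uri       = 'http://url/of/page'
    Proxy     = 'http://proxy.example.com:8080'   # assumed proxy address
    Headers   = @{ 'Accept-Language' = 'en-US' }  # extra request headers
    UserAgent = 'LinkLister/1.0'                  # assumed user-agent string
}

# Wrapped in a function so nothing is fetched until you call it.
function Get-PageLink {
    param([Parameter(Mandatory)] [hashtable] $Params)

    # Splat the hashtable onto Invoke-WebRequest, then list the hrefs as before.
    Invoke-WebRequest @Params |
        Select-Object -ExpandProperty Links |
        Select-Object href
}

# Usage: Get-PageLink -Params $requestParams
```

Keeping the options in one hashtable makes it easy to reuse the same proxy and headers across several requests.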