Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

Get a list of links in a web page (part 2)

A continuation to my previous post on getting links.

Internet Explorer is an application with a rich COM interface. And PowerShell can work with COM objects. Thus you can do the following:

The Document property is very powerful and lets you see a lot of details of the page. It has a subproperty Link that gives all the link elements in the page (it has nearly 800 properties, methods, and events!). The output is as objects, and since we are only interested in the actual link href elements we can select that property.

If you are PowerShell v3 things are even easier. There’s a cmdlet called Invoke-WebRequest who is your friend.

To get an object representing the website do:

To get all the links in that website:

And to just get a list of the href elements:

Like the System.Net.Webclient class Invoke-WebRequest has parameters to specify proxy, headers, encoding, etc.

Get a list of links in a web page (part 1)

Using the System.Net.Webclient class and using old-fashioned regexp to cull out links: