Get a list of links in a web page (part 1)

Using the System.Net.WebClient class and an old-fashioned regexp to cull out the links:

# create a net.webclient object
$web = New-Object system.net.webclient

# download the page as a string
# split the string wherever you have <a followed by whitespace (the link tag, basically)
# the result of the split is an array; pipe this through a foreach-object block
# and match each element against the regexp that ferrets out URLs, outputting the matched bit
# only when the match succeeds (otherwise $matches still holds the previous link);
# note the regexp only catches links where href is the first attribute after <a
($web.DownloadString("http://URL/of/page") -split "<a\s+") | %{ if ($_ -match "^href=[`'`"]([^`'`">\s]*)") { $matches[1] } }
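
To try the split-and-match step without touching the network, here is a minimal sketch that runs the same regexp over a hard-coded string. The `$html` sample and the `$links` variable are made up for illustration, and the match is wrapped in an `if` so that pieces which do not start with `href=` produce no output.

```powershell
# Hypothetical sample markup standing in for a downloaded page
$html = '<p>Intro</p><a href="http://example.com/one">One</a> <a class="x" href="/skipped">Skipped</a> <a href=''/two''>Two</a>'

# Split on <a plus whitespace, then keep only the pieces that begin with href=
$links = ($html -split "<a\s+") | ForEach-Object {
    if ($_ -match "^href=[`'`"]([^`'`">\s]*)") { $matches[1] }
}

$links
# http://example.com/one
# /two
```

The second link is skipped because its `href` is not the first attribute after `<a`, which is a limitation of anchoring the regexp with `^href=`.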
