Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Get-Content and Out-File in the same pipeline

When working with Get-Content and Out-File on the same file in a pipeline, the following is worth keeping in mind:
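A minimal sketch of the behaviour, assuming a scratch file in the temp directory (the file name and its contents are made up for illustration):

```powershell
# Hypothetical scratch file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-outfile-demo.txt'

# Start with a file that has some content and display it.
Set-Content -Path $path -Value 'This is a line of text'
$before = Get-Content -Path $path
$before

# Read and write the same file in one pipeline.
Get-Content -Path $path | Out-File -FilePath $path

# Read it back: this time nothing is returned.
$after = Get-Content -Path $path
$after

# Clean up the scratch file.
Remove-Item -Path $path
```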

Notice how the file had contents initially but the action of reading the contents and writing back to the file erased everything?

My understanding (from page 187 of Bruce Payette’s “PowerShell in Action” book) is that this happens because the Out-File cmdlet runs before the Get-Content cmdlet and so the file is emptied out even before its contents can be displayed.

That didn’t make much sense to me at first, so here’s my attempt at working it out.

I think the first piece of the puzzle is that a PowerShell pipeline runs as a single process (from section 2.5 of the book). That is to say, while it seems like there would be two processes in the above pipeline, in reality there is only one. Each cmdlet in the pipeline is split into three clauses: BeginProcessing, ProcessRecord, and EndProcessing. First the BeginProcessing clause is run for every cmdlet in the pipeline. Next the ProcessRecord clause is run for the first cmdlet; if it produces an output, that output is passed to the ProcessRecord clause of the second cmdlet, whose output (if any) is passed on to the third cmdlet, and so on. This happens for each object produced by each cmdlet in the pipeline, and once all of that completes the EndProcessing clauses are run for all cmdlets.

I couldn’t find much info on what the BeginProcessing, ProcessRecord, and EndProcessing clauses are (not that I tried very hard) but I did find documentation on the Cmdlet class and learnt that it defines three input processing methods: BeginProcessing, ProcessRecord, and EndProcessing. A cmdlet overrides at least one of these methods if it is to process records.

So now I understand better what happens when a pipeline is processed. A single process is created and that process (1) invokes the BeginProcessing methods of all the cmdlets in the pipeline; (2) then invokes the ProcessRecord method of each cmdlet in the order they appear in the pipeline, passing the output from each cmdlet to the one following it; and lastly (3) invokes the EndProcessing methods of all cmdlets in the pipeline.
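The three phases can be observed with script functions, whose begin, process, and end blocks play the same role as the three methods. This is only a sketch: the function names `Invoke-First` and `Invoke-Second` and the shared `$log` list are made up for illustration.

```powershell
# Shared log so the call order can be inspected afterwards.
$script:log = [System.Collections.Generic.List[string]]::new()

function Invoke-First {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin   { $script:log.Add('First: begin') }
    # Record the call, then pass the object down the pipeline.
    process { $script:log.Add("First: process $InputObject"); $InputObject }
    end     { $script:log.Add('First: end') }
}

function Invoke-Second {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin   { $script:log.Add('Second: begin') }
    process { $script:log.Add("Second: process $InputObject") }
    end     { $script:log.Add('Second: end') }
}

1, 2 | Invoke-First | Invoke-Second
$script:log
# First: begin
# Second: begin
# First: process 1
# Second: process 1
# First: process 2
# Second: process 2
# First: end
# Second: end
```

Both begin blocks run before any object is processed, and each object travels through the whole pipeline before the next one is read.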

Next I tried to find out more about these input processing methods in the context of the Get-Content and Out-File cmdlets. And I learnt that while Out-File overrides all three methods, Get-Content only overrides two. The BeginProcessing method is not overridden by Get-Content.
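One way to look at this from a prompt is with reflection. The sketch below is my own illustration rather than the method used above: it asks each cmdlet's implementing class which type in its hierarchy supplies each method, and treats anything other than the base Cmdlet class as an override. The variable names are made up, and the result may vary between PowerShell versions.

```powershell
$methods = 'BeginProcessing', 'ProcessRecord', 'EndProcessing'
# The three methods are protected, so search non-public instance members.
$flags = [System.Reflection.BindingFlags]'Instance, NonPublic'

$report = foreach ($name in 'Get-Content', 'Out-File') {
    $type = (Get-Command -Name $name -CommandType Cmdlet).ImplementingType
    foreach ($method in $methods) {
        # GetMethod returns the most derived implementation; DeclaringType
        # says which class in the hierarchy provides it.
        $info = $type.GetMethod($method, $flags)
        [pscustomobject]@{
            Cmdlet     = $name
            Method     = $method
            DeclaredBy = $info.DeclaringType.FullName
            # True when the cmdlet class or one of its base classes replaces
            # the default from System.Management.Automation.Cmdlet.
            Overridden = $info.DeclaringType.FullName -ne 'System.Management.Automation.Cmdlet'
        }
    }
}

$report | Format-Table -AutoSize
```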

Now it makes sense why the above pipeline behaves the way it does. Since Get-Content does not define a BeginProcessing method but Out-File does, the latter is run first and truncates the contents of the file and only then does Get-Content read the (now empty) file and pass it on.

Moral of the story: always read the contents of the file into a variable first and then pipe the variable to Out-File. Or read the contents of the file but pipe output to a different file.
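Both options can be sketched as follows, again assuming a made-up scratch file. The parenthesised form is an addition of mine: it collects all of Get-Content's output before the pipeline starts, so it has the same effect as using a variable.

```powershell
# Hypothetical scratch files used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-outfile-fix.txt'
$copy = "$path.new"
Set-Content -Path $path -Value 'first line', 'second line'

# Option 1: read everything into a variable, then write it back.
$lines = Get-Content -Path $path
$lines | Out-File -FilePath $path

# Same idea without a named variable: the parentheses force the read
# to finish before Out-File starts.
(Get-Content -Path $path) | Out-File -FilePath $path

# Option 2: write to a different file.
Get-Content -Path $path | Out-File -FilePath $copy

$sameFile  = Get-Content -Path $path
$otherFile = Get-Content -Path $copy
$sameFile

Remove-Item -Path $path, $copy
```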

As a variant of the above example, check this out:

This time the pipeline goes into a loop. Why?

I would have expected that (1) Out-File opens the file as usual but doesn’t delete the contents, (2) then Get-Content opens the file and outputs the contents to Out-File, which in turn (3) appends them to the file, thus terminating the pipeline and leaving the file with the initial line repeated. But it goes into a loop instead and the initial line is repeated until I terminate the loop. I’m not sure what’s happening here.

Update: I asked this question on StackOverflow and got an answer. Steps 1-3 are as I expected, with the difference that at step 3, instead of the pipeline being terminated, (4) Get-Content views the appended line as additional data in the file and so outputs that too, leading to steps 3 & 4 repeating over and over again. The important thing to keep in mind is that the operation isn’t sequential. I was thinking Get-Content outputs the first line and closes the file, but it does not. I suppose the file would have closed if there was a delay, but what happens is that Out-File writes to the file while Get-Content is still reading it, and so Get-Content views the appended line as part of the file and outputs that too. The two cmdlets run side by side.
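The append case can be made to behave the way I first expected by finishing the read before the write starts. A minimal sketch, assuming a made-up scratch file; the commented-out line is my reconstruction of the looping form described above and is deliberately not run.

```powershell
# Hypothetical scratch file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-outfile-append.txt'
'initial line' | Out-File -FilePath $path

# Looping form (assumed; do not run unattended):
# Get-Content -Path $path | Out-File -FilePath $path -Append

# Terminating form: the parentheses make Get-Content read the whole file
# and close it before Out-File begins appending.
(Get-Content -Path $path) | Out-File -FilePath $path -Append

$result = Get-Content -Path $path
$result
# initial line
# initial line

Remove-Item -Path $path
```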

Important to keep this in mind when dealing with pipelines.