Skip to content

Fix parsing here-docs without a carriage return after the terminator #72

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

guillaumeaubert
Copy link
Contributor

Diagnostic

PPI::Token::HereDoc notes that:

If the last line matches the terminator but is missing the newline, we want to allow it anyway (like perl itself does).

In the __TOKENIZER__on_char() subroutine, the parsing of this use case is handled by this conditional block, which requires $line to be defined:

if ( defined $line and $line eq $token->{_terminator} ) {

However, because of the condition in the while loop that precedes that block, $line will always be undefined after the while loop finishes:

while ( defined($line = $t->_get_line) ) {

The current code then considers the terminator without trailing newline to be part of the content of an unterminated here-doc, which is incorrect per the original comment in the code.

Pull Request

This pull request provides the following:

  • Use $token->{_heredoc}->[-1] instead of $line to detect terminators at the end of the file. $token->{_heredoc}->[-1] will always contain the last line of the file and this is what needs to be considered for parsing of the trailing terminator without carriage return afterwards.
  • Scope $line to the while() loop only, since it is not used outside of it anymore.
  • Remove the extra space added by PPI::Tokenizer at the end of the PPI document, when the here-doc block is at the end of the file. It was already correctly removed for unterminated here-docs, but it needs to be removed as well for here-docs with a proper terminator but without a carriage return at the end. This also allows the $token->{_heredoc}->[-1] eq $token->{_terminator} comparison to work properly.
  • A complete suite of tests for the different cases (terminators with a trailing return, terminators without a trailing return, incorrectly terminated here-docs) across the various types of here-doc blocks (single quoted, double quoted, not quoted, commands, escaped).

Thank you!

@moregan
Copy link
Collaborator

moregan commented Mar 24, 2014

These tests are a nice addition to the test suite. With them in place, the 'HEREDOC' block in t/ppi_token_operator.t can be removed.

An unterminated heredoc appears to actually a perl runtime abort, not a warning, which make the comment alluded to wrong. If it were just a warning, one of these two examples would execute their 'die':

perl -WE 'open( my $fh, ">", "heredoc.pl"); print $fh "my \$x=<<ASDF && die;\n1\nASDF";' && xxd heredoc.pl && perl heredoc.pl
0000000: 6d79 2024 783d 3c3c 4153 4446 2026 2620  my $x=<<ASDF &&
0000010: 6469 653b 0a31 0a41 5344 46              die;.1.ASDF
Can't find string terminator "ASDF" anywhere before EOF at heredoc.pl line 1.

perl -WE 'open( my $fh, ">", "heredoc.pl"); print $fh "my \$x=<<ASDF || die;\n1\nASDF";' && xxd heredoc.pl && perl heredoc.pl
0000000: 6d79 2024 783d 3c3c 4153 4446 207c 7c20  my $x=<<ASDF ||
0000010: 6469 653b 0a31 0a41 5344 46              die;.1.ASDF
Can't find string terminator "ASDF" anywhere before EOF at heredoc.pl line 1.

This makes me question whether PPI should create a HereDoc if the newline is missing from the heredoc terminator.

@guillaumeaubert
Copy link
Contributor Author

These tests are a nice addition to the test suite. With them in place, the 'HEREDOC' block in t/ppi_token_operator.t can be removed.

Indeed. I just pushed a678d80 accordingly.

An unterminated heredoc appears to actually a perl runtime abort, not a warning, which make the comment alluded to wrong.

Ah yes, I checked quickly before writing the patch but I guess a trailing return must have sneaked in when I was editing the file. Your examples are definitely conclusive.

This makes me question whether PPI should create a HereDoc if the newline is missing from the heredoc terminator.

This would open a bunch of follow-up questions:

  • How would the code then be represented, if a HereDoc object is not created?
  • The current code parses unterminated here-doc blocks into HereDoc objects, would this have to change as well?
  • Isn't this the purpose of having the _damaged property? To allow parsing damaged here-doc blocks (missing terminator or terminator without newline) where the intent of the programmer is clear?
  • Should the _damaged property be removed if those two cases aren't parsed as HereDoc objects anymore?

I'm happy to amend the pull request, just let me know.

@adamkennedy
Copy link
Collaborator

Any incomplete thing of any kind at the end of a file should, where
possible, be parsed to whatever is probably going to be competed later.

Assume you just started typing into an empty document. What will the final
token probably be in the future.

That's what you parse to.

Adam
On Mar 23, 2014 10:57 PM, "Guillaume Aubert" [email protected]
wrote:

These tests are a nice addition to the test suite. With them in place, the
'HEREDOC' block in t/ppi_token_operator.t can be removed.

Indeed. I just pushed a678d80a678d80accordingly.

An unterminated heredoc appears to actually a perl runtime abort, not a
warning, which make the comment alluded to wrong.

Ah yes, I checked quickly before writing the patch but I guess a trailing
return must have sneaked in when I was editing the file. Your examples are
definitely conclusive.

This makes me question whether PPI should create a HereDoc if the newline
is missing from the heredoc terminator.

This would open a bunch of follow-up questions:

  • How would the code then be represented, if a HereDoc object is not
    created?
  • The current code parses unterminated here-doc blocks into HereDoc
    objects, would this have to change as well?
  • Isn't this the purpose of having the _damaged property? To allow
    parsing damaged here-doc blocks (missing terminator or terminator without
    newline) where the intent of the programmer is clear?
  • Should the _damaged property be removed if those two cases aren't
    parsed as HereDoc objects anymore?

I'm happy to amend the pull request, just let me know.

Reply to this email directly or view it on GitHubhttps://github.com//pull/72#issuecomment-38414231
.

@guillaumeaubert
Copy link
Contributor Author

Thank you, @adamkennedy. I've just updated (in cc7ba81) the original comment to indicate that Perl will fail compilation (per @moregan's note) but left the pull request in its current form to achieve what you described. Let me know if you would like me to make any further modifications.

@guillaumeaubert
Copy link
Contributor Author

I'm happy to make any further modifications, just let me know.
Otherwise, merging this pull request would allow me to resolve guillaumeaubert/Perl-Critic-Policy-ValuesAndExpressions-PreventSQLInjection#17. Thank you!

@wchristian
Copy link
Member

Before i'll be able to release this, i must get out a proper release, the steps towards which i've taken in #81. Once that is done, i will address this pull request. :)

@guillaumeaubert
Copy link
Contributor Author

@wchristian - Thank you for the update, and good luck with the upcoming release! :)

@wchristian wchristian force-pushed the master branch 3 times, most recently from dcc718c to 04bd318 Compare November 1, 2014 20:08
@wchristian
Copy link
Member

Rebased on master and cleaned up. If everything goes well this is slated to be out in 1.222 in two to three weeks.

@wchristian wchristian closed this Nov 2, 2014
@wchristian wchristian modified the milestones: 1.202, 1.222 Nov 2, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants