Skip to content

Improve file-diffing performance #3705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from
Closed

Improve file-diffing performance #3705

wants to merge 16 commits into from

Conversation

staabm
Copy link
Contributor

@staabm staabm commented Apr 29, 2023

different approach on fixing rectorphp/rector#7899

grafik

after this PR we..

  • generate less diff strings -> which is faster
  • transport less data between processes in parallel-mode -> which is faster

@staabm staabm force-pushed the diff branch 2 times, most recently from ff0f5e2 to 3bd89ce Compare April 29, 2023 10:09
@staabm
Copy link
Contributor Author

staabm commented Apr 29, 2023

I don't get why only e2e tests are failling

@TomasVotruba @samsonasik any idea?

@samsonasik
Copy link
Member

it probably related with due to --dry-run and --no-ansi usage?

$e2eCommand = 'php '. $rectorBin .' process --dry-run --no-ansi -a '. $autoloadFile . ' --clear-cache';

@samsonasik
Copy link
Member

@staabm you can try e2e test locally by run:

cd e2e/applied-rule-change-docblock 
composer install
php ../e2eTestRunner.php 
                                                                                                                        

@staabm
Copy link
Contributor Author

staabm commented Apr 29, 2023

the only missing thing is the "first-line-number" in the file diffs, see e.g. https://github.com/rectorphp/rector-src/actions/runs/4838521883/jobs/8623019152?pr=3705

before this PR we were able to always tell about this line number, because we calculated each file-diff in both modes - and file numbers are only available in the "default-differ".

I guess if we want this perf improvement, we need to give up on this "first-line" number indication in the diff in parallel mode.

wdyt?

@samsonasik
Copy link
Member

The line number in diff is a feature to ease jump to nearest changed code.

That's strange, I will look if I can do something

@samsonasik
Copy link
Member

Oh, it seems you fix it?

@staabm
Copy link
Contributor Author

staabm commented Apr 29, 2023

That's strange, I will look if I can do something

I think I found the glitch. lets see

@staabm
Copy link
Contributor Author

staabm commented Apr 29, 2023

ohh lol. we now have the first line numbers, but also some more line numbers which we probably don't want? :)

@staabm staabm marked this pull request as ready for review April 29, 2023 15:21
@staabm staabm requested a review from TomasVotruba as a code owner April 29, 2023 15:21
use Rector\Core\ValueObject\ProcessResult;
use Rector\Parallel\ValueObject\Bridge;

final class WorkerOutputFormatter implements OutputFormatterInterface
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced this new formatter, so I can easily differentiate between a user requesting json format and a user requesting a different format, but rector is internally using json format while delegating to worker processes.

that was necessary for this PR to implement the pre-existing behaviour

@staabm
Copy link
Contributor Author

staabm commented Apr 29, 2023

@TomasVotruba do you think adding more detailed diff line numbers in diffs is a good thing and I should change test-expectations?

https://github.com/rectorphp/rector-src/actions/runs/4838587500/jobs/8623121975?pr=3705

@samsonasik
Copy link
Member

More detailed diff is ok for me 👍

@samsonasik
Copy link
Member

Hm..., not sure about DX perspective..., that probably cause ambiguous meaning when see the line change...

@@ -17,7 +17,7 @@ public function __construct(
) {
// @see https://github.com/sebastianbergmann/diff#strictunifieddiffoutputbuilder
// @see https://github.com/sebastianbergmann/diff/compare/4.0.4...5.0.0#diff-251edf56a6344c03fa264a4926b06c2cee43c25f66192d5f39ebee912b7442dc for upgrade
$unifiedDiffOutputBuilder = new UnifiedDiffOutputBuilder();
$unifiedDiffOutputBuilder = new UnifiedDiffOutputBuilder( "--- Original\n+++ New\n", true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staabm this may probably related of how original + new diff on Console formatter, see

if ($escapedDiffLine === '--- Original') {
unset($escapedDiffLines[$key]);
}
if ($escapedDiffLine === '+++ New') {
unset($escapedDiffLines[$key]);

Copy link
Contributor Author

@staabm staabm Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this value is the same as the one this class used by default before this PR

@staabm staabm force-pushed the diff branch 2 times, most recently from 7189e31 to 7f9b152 Compare April 30, 2023 12:26
@staabm
Copy link
Contributor Author

staabm commented Apr 30, 2023

phpstan job requires #3716 to succeed

@staabm
Copy link
Contributor Author

staabm commented Apr 30, 2023

on a different workload the PR improves considerably (in comparison to e3c1d08)

grafik


even after all those PRs (this one included), diffing still dominates my profiles 🤷‍♂️.
from the remaining 4min 24secs diffing takes 1 minute.

grafik

@samsonasik
Copy link
Member

Worker command line test seems error, probably related of changed format from json to worker https://github.com/rectorphp/rector-src/actions/runs/4844443238/jobs/8632724411#step:5:40

@staabm staabm force-pushed the diff branch 2 times, most recently from b770e87 to d9adf30 Compare May 1, 2023 06:30
@staabm
Copy link
Contributor Author

staabm commented May 1, 2023

Worker command line test seems error

fixed in e933a6d


I have also adjusted the e2e test expectations in d9adf30

@@ -4,7 +4,7 @@
1) src/short_open_tag_with_import_name.php:0

---------- begin diff ----------
@@ @@
@@ -1,10 +1,12 @@
<?php
+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somehow change the original expectation

+use PhpParser\Node\Stmt\Expression;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First stmt detection maybe changed, you may need to debug on

$currentStmt = current($file->getNewStmts());
if ($currentStmt instanceof FileWithoutNamespace && current($currentStmt->stmts) instanceof InlineHTML) {
return null;
}

@staabm
Copy link
Contributor Author

staabm commented May 1, 2023

I need to take a few more measure, but I think I was able to fix the underlying problem in sebastianbergmann/diff, which might make this PR here obsolete

@samsonasik
Copy link
Member

@staabm thank you, I think patch to sebastianbergmann/diff is the right approach 👍 , we can just need to wait for next sebastianbergmann/diff release

@staabm
Copy link
Contributor Author

staabm commented May 1, 2023

This PR is no longer necessary.

I wrote a blog post about it https://staabm.github.io/2023/05/01/diff-speeding.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants