Find code duplication, standard libraries and inlined functions #2

Open
7i opened this issue Jun 15, 2017 · 0 comments
7i commented Jun 15, 2017

To find reused code, such as inlined functions or partially duplicated functions, we could normalize basic blocks and save an anonymous representation of each one, so that we can later identify matching basic blocks in a control flow graph. This representation can be a list of instruction names, e.g. "add phi div icmp".
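
A minimal sketch of how such an anonymous representation could be stored and matched. The block names, the choice of SHA-256 and the helper names `block_signature` and `index_blocks` are assumptions made for illustration; they are not part of the project.

```python
import hashlib
from collections import defaultdict


def block_signature(opcodes):
    """Join opcode names into one string, ignoring operands and register names."""
    return " ".join(opcodes)


def index_blocks(blocks):
    """Group block names by the digest of their signature."""
    index = defaultdict(list)
    for name, opcodes in blocks.items():
        digest = hashlib.sha256(block_signature(opcodes).encode()).hexdigest()
        index[digest].append(name)
    return index


# Hypothetical basic blocks, already reduced to opcode names.
blocks = {
    "f.entry": ["add", "phi", "div", "icmp"],
    "g.loop": ["add", "phi", "div", "icmp"],
    "g.exit": ["mul", "ret"],
}

# Blocks that share a signature are candidates for duplicated code.
matches = [names for names in index_blocks(blocks).values() if len(names) > 1]
print(matches)
```

This only matches blocks whose instructions already appear in the same order, which is why the ordering has to be normalized first.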

First we could create a data-dependence graph (find all instructions that depend on other instructions) to find the possible orderings inside a basic block.
We also have to identify the possible orderings of SubBlocks (using the term "SubBlock", for lack of a better word, to refer to a group of instructions connected by a dependency chain).
Perhaps we can reuse something from existing instruction scheduling algorithms?
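
As a rough illustration, the data-dependence graph of one basic block could be built from def-use information as sketched below. The `(result, opcode, operands)` tuple layout and the function name `build_dependencies` are assumptions. The sketch tracks only register dependencies inside the block and ignores memory ordering and side effects, which a real implementation would also need to respect.

```python
def build_dependencies(instructions):
    """Return {index: sorted indices of earlier instructions it depends on}."""
    defined_at = {}  # value name -> index of the instruction that defines it
    deps = {}
    for i, (result, opcode, operands) in enumerate(instructions):
        # Operands defined outside the block (arguments, other blocks) are skipped.
        deps[i] = sorted({defined_at[op] for op in operands if op in defined_at})
        if result is not None:
            defined_at[result] = i
    return deps


# Hypothetical block: %c needs %a and %b, which are independent of each other.
block = [
    ("%a", "add", ["%x", "%y"]),
    ("%b", "mul", ["%x", "%z"]),
    ("%c", "sub", ["%a", "%b"]),
]

deps = build_dependencies(block)
print(deps)
```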

Example of SubBlocks:
basic block:

1,2,3,4,5,6,7,8,9 // 1-9 represent 9 instructions

Found dependencies:

[3 [1][2]    ]  // to run 3 we need to have executed 1 and 2 before
[5 [3][4]    ]  // to run 5 we need to have executed 3 and 4 before
[9 [6][7][8] ]  // to run 9 we need to have executed 6, 7 and 8 before

Note that 1 and 2 can be executed in any order; the same is true for:
3 and 4
6, 7 and 8
Also, the SubBlock 1 to 5 can be executed before or after the SubBlock 6 to 9, since no dependency connects them.
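
One way to find the SubBlocks in this example is to treat the dependencies as undirected edges and collect connected components. This is a sketch under that assumption; the function name `find_subblocks` is hypothetical.

```python
def find_subblocks(instruction_ids, deps):
    """Group instructions into SubBlocks: connected components of the dependency graph."""
    adjacency = {i: set() for i in instruction_ids}
    for inst, needed in deps.items():
        for n in needed:
            adjacency[inst].add(n)
            adjacency[n].add(inst)

    seen = set()
    subblocks = []
    for start in instruction_ids:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from `start`.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        subblocks.append(sorted(component))
    return subblocks


# Dependencies from the example: 3 needs 1 and 2, 5 needs 3 and 4, 9 needs 6, 7 and 8.
deps = {3: [1, 2], 5: [3, 4], 9: [6, 7, 8]}
subblocks = find_subblocks(range(1, 10), deps)
print(subblocks)
```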

To normalize this, we follow predefined rules.
Example:

  • The SubBlock with the fewest instructions goes first.
  • If two SubBlocks have the same number of instructions, compare the names of the first non-matching instruction alphabetically ('a' first, 'z' last).
  • Instructions that may be executed in any order are sorted alphabetically by name ('a' first, 'z' last).
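
The example rules could be sketched as a topological sort that always picks the alphabetically smallest ready instruction, followed by a sort of the SubBlocks by length and then by name sequence. The opcode names assigned to instructions 1 to 9 are invented for illustration, and `order_subblock` and `normalize_block` are hypothetical helper names.

```python
import heapq


def order_subblock(members, opcodes, deps):
    """Order one SubBlock: respect dependencies, prefer alphabetically smaller names."""
    remaining = {i: {d for d in deps.get(i, ()) if d in members} for i in members}
    users = {i: [] for i in members}
    for i, needed in remaining.items():
        for d in needed:
            users[d].append(i)

    # Heap of instructions whose dependencies have all been emitted.
    ready = [(opcodes[i], i) for i in members if not remaining[i]]
    heapq.heapify(ready)
    ordered = []
    while ready:
        name, i = heapq.heappop(ready)
        ordered.append(name)
        for u in users[i]:
            remaining[u].discard(i)
            if not remaining[u]:
                heapq.heappush(ready, (opcodes[u], u))
    return ordered


def normalize_block(subblocks, opcodes, deps):
    """Apply the rules: shortest SubBlock first, ties broken by name sequence."""
    ordered = [order_subblock(set(sb), opcodes, deps) for sb in subblocks]
    ordered.sort(key=lambda names: (len(names), names))
    return " ".join(name for names in ordered for name in names)


# Hypothetical opcode names for instructions 1 to 9 of the example.
opcodes = {1: "load", 2: "add", 3: "mul", 4: "load", 5: "sub",
           6: "xor", 7: "and", 8: "or", 9: "icmp"}
deps = {3: [1, 2], 5: [3, 4], 9: [6, 7, 8]}
subblocks = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]

print(normalize_block(subblocks, opcodes, deps))
```

One limitation of this sketch: when two ready instructions have the same name, the tie is broken by their original position, so the result is not fully order-independent in that case. An additional rule would be needed to cover it.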