Qonnx binary quant #1292
Status: Open. jurevreca12 wants to merge 15 commits into fastmachinelearning:main from jurevreca12:qonnx_binary_quant (base: main).

Commits (15):

d093913  Added support for BipolarQuant. It is converted to BinaryQuant in hls4ml.
4b66180  Added binarized qonnx model for testing the binary quant transformation
721d598  Pre-commit fixes
768c6a9  Merge branch 'main' into qonnx_binary_quant
6a74bfb  Removed BipolarQuantConstantParameters, because such an optimization …
10e1af0  Merge branch 'qonnx_binary_quant' of https://github.com/jurevreca12/h…
2dfdb25  Limited FuseBipolarQuantWithConstant to only support scale factors of 1
89e2136  Removed bipolar_quant_constant_parameters from list of optimizations,…
76968b5  Modified the optimizations to only consider transform when scaling fa…
8a20361  Removed left-over docs from copying.
7bd4d94  Revert "Removed BipolarQuantConstantParameters"
144d427  Revert "Removed bipolar_quant_constant_parameters from list of optimi…
8d6aae2  Removed onnx model from repo. Using example-models for that instead.
18ce38e  Added test for non-unit (po2) scaling factors
08dafdf  Pre-commit fixes.
Files changed:

One line added to the ignore list:

```
@@ -14,3 +14,4 @@ docs/autodoc/*
 hls4mlprj_*
 *~
 *.ipynb_checkpoints/
+*.bak
```
Submodule example-models updated from c6bb3c to e7a9de.
New file with the BipolarQuant optimizer passes:

```python
"""
This file includes optimizations related to BipolarQuant nodes.
"""

import numpy as np

from hls4ml.model.layers import Activation, BipolarQuant, Constant
from hls4ml.model.optimizer import OptimizerPass
from hls4ml.model.quantizers import BinaryQuantizer
from hls4ml.model.types import XnorPrecisionType


class BipolarQuantConstantParameters(OptimizerPass):
    """Remove Constant from the BipolarQuant node parameters (but not input[0])"""

    def match(self, node):
        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 2
            and (node.get_input_node(node.inputs[1]) and isinstance(node.get_input_node(node.inputs[1]), Constant))
        )

        return is_match

    def transform(self, model, node):
        """
        Remove Constant from the BipolarQuant node parameters (but not input[0])
        """
        if node.get_input_node(node.inputs[1]):
            scale_node = node.get_input_node(node.inputs[1])
            if isinstance(scale_node, Constant):
                node.set_attr('scale', scale_node.get_attr('value'))
                node.inputs[1] = ''
                model.remove_node(scale_node)

        node.inputs = [inp for inp in node.inputs if inp]
        if len(node.inputs) != 1:
            raise RuntimeError("hls4ml only supports constant scale")

        return True


class BipolarQuantToActivation(OptimizerPass):
    """
    This is for the case when the scale is a power of two (po2).
    It is a 1:1 transformation of a BipolarQuant to an Activation.
    As an optimization, this is not called when the input is constant.
    """

    def match(self, node):
        # only matches after the other inputs are already folded
        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 1
            and not isinstance(node.get_input_node(node.inputs[0]), Constant)
        )

        # Only match if the scale is unit or po2
        if is_match:  # to make sure this is a quant node with inputs
            scale = node.get_attr('scale')
            scale_unit_or_po2 = (scale == 1.0).all()
            # This optimization only works if all scales are the same
            if np.all(scale[0] == scale):
                mantissa, _ = np.frexp(scale[0])
                scale_unit_or_po2 = mantissa == 0.5
            is_match = scale_unit_or_po2

        return is_match

    def transform(self, model, node):
        """
        Change BipolarQuant node to Activation
        """
        precision = XnorPrecisionType()
        quantizer = BinaryQuantizer(bits=1)

        attributes = {'activation': 'linear', 'quantizer': quantizer}

        # update the configuration
        config = model.config.get_layer_config(node)
        prec_config = config.setdefault('Precision', {})
        prec_config['result'] = str(precision)
        new_name = f'{node.name}_act'
        model.config.set_name_config(new_name, config)
        model.config.parse_name_config(new_name, config)

        new_node = model.make_node(Activation, new_name, attributes, [node.inputs[0]], list(node.outputs))
        model.replace_node(node, new_node)
        return True


class FuseBipolarQuantWithConstant(OptimizerPass):
    """
    This is for the case when the scale is a power of two (po2).
    """

    def match(self, node):
        # only matches after the other inputs are already folded
        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 1
            and isinstance(node.get_input_node(node.inputs[0]), Constant)
        )

        # Only match if the scale is unit or po2
        if is_match:  # to make sure this is a quant node with inputs
            scale = node.get_attr('scale')
            scale_unit_or_po2 = (scale == 1.0).all()
            # This optimization only works if all scales are the same
            if np.all(scale[0] == scale):
                mantissa, _ = np.frexp(scale[0])
                scale_unit_or_po2 = mantissa == 0.5
            is_match = scale_unit_or_po2

        return is_match

    def transform(self, model, node):
        """
        Fuse BipolarQuant with Constant.
        """
        precision = XnorPrecisionType()
        quantizer = BinaryQuantizer(bits=1)

        const_node = node.get_input_node(node.inputs[0])
        const_node.set_attr('quantizer', quantizer)
        const_node.get_output_variable().type.precision = precision

        # remove the Quant node
        model.remove_node(node)
        return True
```
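Both match methods rely on the same trick to detect power-of-two scales: np.frexp decomposes a float as x = mantissa * 2**exponent with the mantissa in [0.5, 1), so the mantissa is exactly 0.5 precisely when x is a power of two (including 1.0 itself). A standalone sketch of that check (the helper name is illustrative, not part of the PR):

```python
import numpy as np

def is_unit_or_po2(scale):
    """Hypothetical helper mirroring the match logic above."""
    scale = np.atleast_1d(np.asarray(scale, dtype=np.float64))
    # The passes only handle the case where all scales are identical
    if not np.all(scale == scale.flat[0]):
        return False
    # x = mantissa * 2**exponent with mantissa in [0.5, 1);
    # mantissa == 0.5 exactly iff x is a power of two
    mantissa, _ = np.frexp(scale.flat[0])
    return bool(mantissa == 0.5)

print(is_unit_or_po2(np.array([0.25, 0.25])))  # True  (2**-2)
print(is_unit_or_po2(np.array([1.0])))         # True  (2**0)
print(is_unit_or_po2(np.array([0.3])))         # False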
Review conversation:

Reviewer: We don't seem to handle the case when scale != 1. Ideally we should be able to extract ApplyAlpha scales in such a case and propagate them up and down the graph. I think basic support can be fairly straightforwardly added, in the style of the Quant support. (If we don't support scale != 1, we should catch those cases and exit gracefully, with an error message.)

Reviewer, on FuseBipolarQuantWithConstant.match: Same here, does this really work if scale != 1? If it doesn't, the matching criteria should change.
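A minimal sketch of the "exit gracefully" fallback the reviewer suggests, assuming the pass keeps its current unit/po2-only support (the helper name and messages are illustrative, not part of the PR):

```python
import numpy as np

def check_bipolar_scale_supported(scale):
    """Hypothetical guard: raise a clear error for scales the pass cannot handle."""
    scale = np.atleast_1d(np.asarray(scale, dtype=np.float64))
    if not np.all(scale == scale.flat[0]):
        # per-channel scales would need ApplyAlpha-style propagation
        raise RuntimeError('BipolarQuant: non-uniform scales are not supported')
    mantissa, _ = np.frexp(scale.flat[0])
    if mantissa != 0.5:
        raise RuntimeError(
            f'BipolarQuant: unsupported scale {scale.flat[0]}; '
            'only unit or power-of-two scales are handled'
        )
```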
jurevreca12: I checked the BinaryQuantizer code and it does not define a scaling factor, meaning this can only work for scale factors of 1. Furthermore, this whole optimizer pass then becomes irrelevant, so I will delete it.

What does ApplyAlpha do? I am not familiar with this.
Reviewer: So a quant layer with a scale and/or zero offset really means scale/shift, then quantize, then unscale/unshift. The ApplyAlpha nodes are scale-and-shift layers in the hls4ml IR. When a quant node is applied to a weight, the initial scaling/shifting can actually be applied directly to the weights (assuming they are constant and not updatable). Otherwise, the hope is that the scaling and unscaling can be moved around the graph to where the implementation is easiest. Optimizers for that already exist.
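A small NumPy sketch of that decomposition, under my reading of BipolarQuant semantics (output = scale * sign(input / scale)); this is illustrative, not hls4ml code. For a positive scale s, sign(x / s) == sign(x), so the quantizer itself is scale-invariant and s survives only as a multiplicative factor that an ApplyAlpha-style scale node could absorb:

```python
import numpy as np

def bipolar_quant(x, scale):
    """Scale down, quantize to {-1, +1} (mapping 0 to +1), scale back up."""
    q = np.where(x / scale >= 0, 1.0, -1.0)
    return scale * q

x = np.array([-0.7, 0.2, 1.5])
s = 0.25  # positive power-of-two scale

# The scale can be peeled off: quantize first, then apply s as a separate
# multiplicative (ApplyAlpha-style) node.
assert np.allclose(bipolar_quant(x, s), s * np.where(x >= 0, 1.0, -1.0))
print(bipolar_quant(x, s))  # [-0.25  0.25  0.25]
```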