Qonnx binary quant #1292
base: main
Conversation
test/pytest/bnn_model_fc_1layer.onnx (Outdated)
I am not sure if this is a good place for an onnx model. I wonder if some of the models from the model zoo will do. If not, we should put this in https://github.com/fastmachinelearning/example-models, where we keep other inputs. (Note that that project is instantiated as a submodule.)
By the way, this is the qonnx model zoo that I am referring to: https://github.com/fastmachinelearning/qonnx_model_zoo. It does have some binary models, though I am not sure if they are suitable for the test.
I agree that it should not be in this repository. However, it is kind of an artificial model, so I am not sure it belongs in the model zoo either. I prefer using these kinds of artificial models because they can be generated quickly and they are very simple. That way, if something goes wrong, it's a very small model and it's easier to find the issue.
Would you be open to adding Brevitas to the testing dependencies? That way we could auto-generate various quantized models as in chisel4ml.
Another option would be to use one of the models from the zoo.
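For illustration, here is a rough sketch of how such a tiny test model could be generated with Brevitas and exported to QONNX. This is hedged: the layer sizes and file name are made up, and it assumes a recent Brevitas release that provides QuantIdentity, QuantLinear, the binary quantizers, and the export_qonnx helper.

import torch
from brevitas.export import export_qonnx
from brevitas.nn import QuantIdentity, QuantLinear
from brevitas.quant import SignedBinaryActPerTensorConst, SignedBinaryWeightPerTensorConst

class TinyBnn(torch.nn.Module):
    """A hypothetical one-layer BNN: binarized inputs feeding a binary-weight FC layer."""
    def __init__(self):
        super().__init__()
        self.inp_quant = QuantIdentity(act_quant=SignedBinaryActPerTensorConst)
        self.fc = QuantLinear(8, 4, bias=False, weight_quant=SignedBinaryWeightPerTensorConst)

    def forward(self, x):
        return self.fc(self.inp_quant(x))

model = TinyBnn().eval()
export_qonnx(model, torch.randn(1, 8), export_path='bnn_model_fc_1layer.onnx')

A model like this could be generated on the fly in a pytest fixture instead of being checked into the repository.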
The example-models repository is a good place for artificial models that are only used for testing.
class BipolarQuantToActivation(OptimizerPass):
    """
    This is for the case when scale is a (positive) power of 2 and zeropt is 0. It is a 1:1 transformation of
Does this really work if the scale is a power of 2? It's fine if it doesn't, but in that case we should change the matching criteria.
I am not sure. I'll add a test case to check.
It does not work for scale != 1, since BinaryQuantizer does not have a scaling factor. One option would be to lower a BipolarQuant with a scaling factor != 1 to an activation node followed by a mul node. However, I am not sure there is much sense in this, since these mul nodes are, I presume, not implementable? On the other hand, perhaps they could be absorbed into other nodes.
The multiplier nodes are the ApplyAlpha nodes I mention in another comment. ApplyAlpha is just a shift-and-scale layer, and you can use it as just a scale. (The name comes from its original application and is not very good; we have talked about changing it.) Fundamentally it's implemented the same way as a batchnorm (since we don't actually update the scaling in a batchnorm).
I added some tests with non-unit po2 scale factors, and it appears that the transformations work as-is. I tried scaling factors both below 1 and above 1, and both cases worked. Perhaps some downstream transformation is handling this case?
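One plausible explanation, offered here as a sketch of the arithmetic rather than a verified trace of the hls4ml passes: if BipolarQuant behaves roughly as y = sign(x / scale) * scale with zeropt = 0, then for any positive scale the sign decision itself is scale-invariant, so only the +-scale output magnitude has to be absorbed somewhere downstream.

import numpy as np

# Sketch: the binarization decision does not depend on a positive scale.
x = np.array([-1.5, -0.1, 0.2, 3.0])
for s in (0.25, 1.0, 4.0):  # power-of-two scales below and above 1
    assert np.array_equal(np.sign(x / s), np.sign(x))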
is_match = (
    isinstance(node, BipolarQuant)
    and len(node.inputs) == 1
    and isinstance(node.get_input_node(node.inputs[0]), Constant)
Same here: does this really work if scale != 1? If it doesn't, the matching criteria should change.
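If scale != 1 indeed cannot be handled here, one possible tightening of the match could look like the sketch below. This is hypothetical: it assumes the parser has already stored the scale as a node attribute, as in the parsing snippet that follows.

import numpy as np

# Sketch: additionally require a unit scale, so BipolarQuant nodes with
# unsupported scales are left untouched for other passes to handle.
is_match = (
    isinstance(node, BipolarQuant)
    and len(node.inputs) == 1
    and isinstance(node.get_input_node(node.inputs[0]), Constant)
    and np.all(node.get_attr('scale') == 1)
)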
if node.get_input_node(node.inputs[1]):
    scale_node = node.get_input_node(node.inputs[1])
    if isinstance(scale_node, Constant):
        node.set_attr('scale', scale_node.get_attr('value'))
We don't seem to handle the case when scale != 1. Ideally we should be able to extract ApplyAlpha scales in such a case and propagate them up and down the graph. I think basic support can be fairly straightforwardly added, in the style of the Quant support. (If we don't support scale != 1, we should catch those cases and exit gracefully, with an error message.)
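For the graceful-exit option, a minimal sketch (the error text and exact placement are illustrative, not the actual implementation):

import numpy as np

# Sketch: fail loudly on unsupported scales instead of silently mis-parsing.
scale = scale_node.get_attr('value')
if not np.all(scale == 1):
    raise RuntimeError(
        f'BipolarQuant node {node.name}: scale != 1 is not supported; '
        'this would require extracting ApplyAlpha scale/shift nodes.'
    )
node.set_attr('scale', scale)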
I checked the BinaryQuantizer code and it does not define a scaling factor, meaning this can only work for a scale factor of 1. Furthermore, this whole optimizer pass becomes irrelevant, so I will delete it.
What does ApplyAlpha do? I am not familiar with it.
So a quant layer with a scale and/or zero offset really means scale/shift, then quantize, then unscale/unshift. The ApplyAlpha layers are scale-and-shift layers in the hls4ml IR. When a quant node is applied to a weight, the initial scaling/shifting can actually be folded into the weights (assuming they are constant and not updatable). Otherwise, the hope is that the scaling and unscaling can be moved around the graph to wherever the implementation is easiest. There are optimizers that already exist for that.
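As a small numeric illustration of that decomposition (a sketch assuming the QONNX Quant convention; the clip bounds here are arbitrary):

import numpy as np

# Sketch: a Quant node = pre-scale/shift, integer quantize, post-unshift/unscale.
# The two affine stages are what ApplyAlpha layers represent in the hls4ml IR.
def quant_like(x, scale, zeropt, lo=-8, hi=7):
    q = np.clip(np.round(x / scale + zeropt), lo, hi)  # scale/shift, then quantize
    return (q - zeropt) * scale                        # then unshift/unscale

print(quant_like(np.array([0.3, -1.7, 2.5]), scale=0.5, zeropt=1.0))  # [ 0.5 -1.5  2.5]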
…is not applicable for BinaryQuantizer since it does not have a scaling factor
…ls4ml into qonnx_binary_quant
… since it was already removed.
Description
Type of change
Tests
Test Configuration:
Checklist
I have run pre-commit on the files I edited or added.