-
Notifications
You must be signed in to change notification settings - Fork 416
[TestCoverage] Failing Titanium Benchmarks for S10 Arch #3105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@vaughnbetz FYI |
@SamuelHo10 : I'm wondering if the matrixmult one could be related to the libarchfpga refactoring (from the stack trace it looks like it was in that region). Perhaps it has two clocks on a block or something else that is exercising some code that isn't exercised too often. @amin1377 @kimiatkh : wondering if the initial placement failures were always there with auto-sized devices, or if they are new. I think @amin1377 has modified the initial placement code. |
@vaughnbetz: Just a quick note: |
Got it, thanks @amin1377 |
@SamuelHo10 Sorry to steal your job, but I just happened to be looking into this issue and I think I found what was wrong with Turning on assert and debug I found the following: ![]() The assert that is being tripped is actually my assert. When I updated the Logical Models I added extra asserts to ensure that we never create / use models which do not exist. This appears to have revealed an issue with this blif file. The code that this is failing at is dead simple: ![]() It just looks up a sub-circuit by the given name and tries to get information on that model. The code here assumes that the model must have already been created; as such its an assert within the logical models class. The stack trace is telling us the name of the sub-circuit that is being looked up: From the name, I think this is a type of quad-port ram. The reason this is crashing is because the stratix-10 architecture does not have quad-port RAMs (at least according to the architecture file). The issue, is that for some reason the matrix mult blif file is trying to use a quad_port RAM. I used some Linux magic to parse all of the blif files in Titanium, and ![]() @vaughnbetz I believe that this is an issue with the benchmark itself, not VTR. Is there ever a case where the blif file uses models which are not described in the architecture? My theory about what happened here is that the old code may have just ignoring these models and was performing undefined behaviour; but honestly I have no idea why this blif file would ever work... Just to cover my bases, I did re-download the Titanium benchmarks from scratch and confirmed that matrix_mult is using these quad-port models which do not exist in the S10 architecture. @vaughnbetz What should we do about this? |
@AlexandreSinger When I was writing the Stratix 10 design, I didn't add support for the quad-port RAMs since they were barely used at the time. However, for the VTR 9 paper, I updated the architecture file to include support for quad-port RAMs as well. I'll try to create a pull request with the added support by next week. |
Nice investigation Alex and thanks Kimia. |
While working on adding the Titanium benchmarks to the CI, I found that 5 circuits are currently failing through the default VTR flow:
ChainNN_LRN_LG
ChainNN_ELT_LG
ChainNN_BSC_LG
pricing
matrix_mult
I commented these 5 circuits out of the NightlyTest7 tasks for now, but these benchmarks should be revived.
See #3104 for context.
Initial Placement Issues
The first four benchmarks are all failing in the initial placer:
The problem seems to be coming from how the initial placer handles (or does not handle) logical blocks which can be placed in multiple physical tile types:
What these 4 circuits have in common is that they all have a lot of logical blocks which can be either LABs or MLABs; however they are very constrained in either physical LABs or physical MLABs.
This can be resolved by either having the auto-device sizer recognize this and increase the size of the device or by updating the initial placer to be more aware of this pattern and try to place the logical blocks which MUST be either a LAB or MLAB before placing the logical blocks which can be either.
Circuit Loading Issues
The last circuit,
matrix_mult
is failing while loading the circuit:This benchmark gets a segmentation fault during loading.
The text was updated successfully, but these errors were encountered: