[TestCoverage] Failing Titanium Benchmarks for S10 Arch #3105

Open
AlexandreSinger opened this issue Jun 4, 2025 · 7 comments

@AlexandreSinger
Contributor

While working on adding the Titanium benchmarks to the CI, I found that 5 circuits are currently failing through the default VTR flow:

  • ChainNN_LRN_LG
  • ChainNN_ELT_LG
  • ChainNN_BSC_LG
  • pricing
  • matrix_mult

I commented these 5 circuits out of the NightlyTest7 tasks for now, but these benchmarks should be revived.

See #3104 for context.

Initial Placement Issues

The first four benchmarks are all failing in the initial placer:

[screenshots: initial placer failure output]

The problem seems to be coming from how the initial placer handles (or does not handle) logical blocks which can be placed in multiple physical tile types:

[screenshot: initial placer code]

What these 4 circuits have in common is that they all have many logical blocks which can be placed in either LABs or MLABs; however, the devices are very constrained in either physical LABs or physical MLABs.

This could be resolved either by having the auto-device sizer recognize this situation and increase the size of the device, or by updating the initial placer to be aware of this pattern and place the logical blocks which MUST be a LAB or an MLAB before placing the logical blocks which can be either.
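The second option can be sketched as a simple ordering rule: place the least flexible blocks first. This is an illustrative Python sketch, not VTR code; the block and tile names are hypothetical.

```python
# Illustrative sketch (not VTR's placer): order logical blocks so that
# those with fewer compatible physical tile types are placed first.

def placement_order(blocks, compatible_tiles):
    """Sort blocks so LAB-only / MLAB-only blocks precede blocks that
    can be placed in either tile type."""
    return sorted(blocks, key=lambda b: len(compatible_tiles[b]))

blocks = ["alm_0", "mem_0", "flex_0"]
compatible_tiles = {
    "alm_0": {"LAB"},           # must go in a LAB
    "mem_0": {"MLAB"},          # must go in an MLAB
    "flex_0": {"LAB", "MLAB"},  # can go in either
}
order = placement_order(blocks, compatible_tiles)
# constrained blocks come first; "flex_0" is placed last
```

With this ordering the flexible blocks only consume whichever tile type is left over, instead of starving the constrained blocks.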

Circuit Loading Issues

The last circuit, matrix_mult, fails while loading the circuit:

[screenshot: segmentation fault during circuit loading]

This benchmark gets a segmentation fault during loading.

@AlexandreSinger
Contributor Author

@vaughnbetz FYI

@vaughnbetz
Contributor

@SamuelHo10 : I'm wondering if the matrix_mult one could be related to the libarchfpga refactoring (from the stack trace it looks like it was in that region). Perhaps it has two clocks on a block or something else that is exercising some code that isn't exercised too often.

@amin1377 @kimiatkh : wondering if the initial placement failures were always there with auto-sized devices, or if they are new. I think @amin1377 has modified the initial placement code.
Probably we should have a strong ordering if we're nearly out of some kind of block that is competing for tiles with another block type. We could modify the initial placement gain function to prioritize that, either all the time or in the second and beyond placement attempt.
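The gain-function idea above could look something like the following sketch. This is a hypothetical illustration, not VTR's actual initial-placement gain function; all names and the bonus formula are assumptions.

```python
# Hypothetical scarcity-aware gain adjustment: blocks whose best
# remaining tile option is nearly exhausted get a large boost, so they
# are placed before blocks that still have a plentiful fallback.

def adjusted_gain(base_gain, block_tiles, free_tiles, bonus=10.0):
    """block_tiles: tile types this block can occupy;
    free_tiles: remaining free count per tile type."""
    # A block is only as constrained as its *most* available option.
    best = max(free_tiles[t] for t in block_tiles)
    return base_gain + bonus / (best + 1)

free_tiles = {"LAB": 2, "MLAB": 500}
# A LAB-only block competing for the last 2 LABs gets a big boost;
# a LAB-or-MLAB block can fall back on 500 MLABs, so it barely moves.
g_constrained = adjusted_gain(1.0, {"LAB"}, free_tiles)
g_flexible = adjusted_gain(1.0, {"LAB", "MLAB"}, free_tiles)
```

The bonus could be applied all the time, or only on the second and later placement attempts, as suggested above.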

@amin1377
Contributor

amin1377 commented Jun 4, 2025

@vaughnbetz: Just a quick note:
the change I made to initial placement hasn't been merged yet, as it degraded results on large benchmarks. We discussed this a while ago, and it’s likely due to blocks being placed too close together initially, which prevents other connected blocks from being placed nearby. I still need to try out the approach you suggested to see if it improves the QoR. The related PR is #3018.

@vaughnbetz
Contributor

Got it, thanks @amin1377

@AlexandreSinger
Contributor Author

@SamuelHo10 Sorry to steal your job, but I just happened to be looking into this issue and I think I found what was wrong with matrix_mult.

Turning on assert and debug I found the following:

[screenshot: assertion failure with asserts and debug enabled]

The assert that is being tripped is actually my assert. When I updated the Logical Models I added extra asserts to ensure that we never create / use models which do not exist. This appears to have revealed an issue with this blif file.

The code that this is failing at is dead simple:

[screenshot: the code where the assert fires]

It just looks up a sub-circuit by the given name and tries to get information on that model. The code here assumes that the model must have already been created; as such, it is an assert within the logical models class.
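The failing pattern is essentially the following. This is a minimal Python stand-in for the C++ logical-models class; the class and method names are hypothetical, chosen only to mirror the described behavior.

```python
# Minimal stand-in for the logical-models lookup: models are registered
# while parsing the architecture file, and any later lookup of an
# unregistered name (e.g. a quad-port RAM subckt that the S10 arch
# never declared) trips the assert.

class LogicalModels:
    def __init__(self):
        self._models = {}  # model name -> model data

    def create_model(self, name, data):
        self._models[name] = data

    def get_model_by_name(self, name):
        # Assumes the model was created from the architecture file.
        assert name in self._models, f"model '{name}' does not exist"
        return self._models[name]
```

So the assert is doing its job: it surfaces a blif subckt whose model was never defined by the architecture, instead of silently returning garbage.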

The stack trace is telling us the name of the sub-circuit that is being looked up:
fourteennm_ram_block.opmode{quad_port}.output_type{comb}.port_a_address_width{10}.port_a_address2_width{10}.port_b_address_width{10}.port_b_address2_width{10}

From the name, I think this is a type of quad-port ram.

The reason this is crashing is because the Stratix 10 architecture does not have quad-port RAMs (at least according to the architecture file). The issue is that, for some reason, the matrix_mult blif file is trying to use a quad_port RAM.

I used some Linux magic to parse all of the blif files in Titanium, and matrix_mult is the only one that is trying to use quad port memories:
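That scan can be reproduced with a few lines of Python (a sketch, not the exact command used; the sample blif content below is illustrative):

```python
# Scan blif text for .subckt instantiations whose model name mentions
# opmode{quad_port}. In practice this would be run over every .blif
# file in the Titanium benchmark directory.

import re

def quad_port_subckts(blif_text):
    """Return model names of all .subckt lines using quad-port modes."""
    names = re.findall(r"^\.subckt\s+(\S+)", blif_text, re.MULTILINE)
    return [n for n in names if "quad_port" in n]

sample = """\
.subckt fourteennm_ram_block.opmode{quad_port}.output_type{comb} a=x
.subckt fourteennm_ram_block.opmode{dual_port}.output_type{comb} a=y
"""
hits = quad_port_subckts(sample)
```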

[screenshot: scan results over the Titanium blif files]

@vaughnbetz I believe that this is an issue with the benchmark itself, not VTR. Is there ever a case where the blif file uses models which are not described in the architecture?

My theory about what happened here is that the old code may have just been ignoring these models, resulting in undefined behaviour; but honestly I have no idea why this blif file would ever work...

Just to cover my bases, I did re-download the Titanium benchmarks from scratch and confirmed that matrix_mult is using these quad-port models which do not exist in the S10 architecture.

@vaughnbetz What should we do about this?

@kimiatkh
Contributor

kimiatkh commented Jun 5, 2025

@AlexandreSinger When I was writing the Stratix 10 design, I didn't add support for the quad-port RAMs since they were barely used at the time. However, for the VTR 9 paper, I updated the architecture file to include support for quad-port RAMs as well. I'll try to create a pull request with the added support by next week.

@vaughnbetz
Contributor

Nice investigation Alex and thanks Kimia.
