Skip to content

Accumulator instructions #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed

Conversation

alqotel
Copy link

@alqotel alqotel commented Mar 16, 2022

No description provided.

@alqotel alqotel self-assigned this Mar 16, 2022
@@ -133,6 +133,10 @@
%xx_xt 0:1 21:5
%xx_xb 1:1 11:5
%xx_xa 2:1 16:5

&X_a ra
@X_a ...... ra:3 -- ..... ..... .......... . &X_a
Copy link

@vcoracolombo vcoracolombo Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there shouldn't be any -- here. It should be in the instructions line later in the file

@X_a            ...... ra:3 .. ..... ..... .......... . &X_a
...
XXMFACC         011111 ...-- 00000 ----- 0010110001 -   @X_a

} \
}


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could remove this extra blank line here

Comment on lines 3348 to 3349
excp_ptr = &env->fp_status;
excp_flags = get_float_exception_flags(excp_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to not use excp_ptr and just do get_float_exception_flags(&env->fp_status);

Comment on lines 3362 to 3367
if (unlikely(excp_flags & float_flag_inexact)) {
fpscr_flags |= FP_FX;
fpscr_flags |= FP_XX;
fpscr_flags |= (fpscr & FP_XE) ? FP_FEX : 0;
}
if (unlikely(excp_flags & float_flag_invalid)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mferst and I had a discussion yesterday on be2e904#r68851624 about not setting inexact when the operation (that case a conversion) is invalid. Do you think this is also the case here?

Also, I think the inexact case is not 'unlikely' as it tends to happen very frequently

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit different here. The description of these MMA instructions says that

For every multiply-add operation that is performed as part of the execution of this instruction, the operation is performed as if all exception enable bits are zero, and the trap-disabled result is returned. [...] Exception status is accumulated and the appropriate exception status bits in the FPSCR are updated at the completion of execution of the instruction.

IIUC, some multiply-add operation may set inexact and other may set invalid, and both bits would be set in FPSCR.

However, considering that softfpu accumulates float_exception_flags (see float_raise implementation in include/fpu/softfloat.h), I wonder if we need this method at all. Shouldn't it be enough to move the call to helper_reset_fpstatus outside the loop and call do_float_check_status at the end of the helper?

Copy link

@vcoracolombo vcoracolombo Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, considering that softfpu accumulates float_exception_flags (see float_raise implementation in include/fpu/softfloat.h), I wonder if we need this method at all. Shouldn't it be enough to move the call to helper_reset_fpstatus outside the loop and call do_float_check_status at the end of the helper?

I don't think calling do_float_check_status is enough because it doesn't deal with invalid exceptions now (only overflow/underflow and inexact).
But I think moving the call outside the loop can still be done creating a temporary float_status instead of a temporary fpscr. (might not even be necessary, would need to check if any of the float32_* relies on the status value being cleared from the previous iteration)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling do_float_check_status is enough because it doesn't deal with invalid exceptions now (only overflow/underflow and inexact).

Oh, right, we'd need to call something like float_invalid_op_madd too.

But I think moving the call outside the loop can still be done creating a temporary float_status instead of a temporary fpscr. (might not even be necessary, would need to check if any of the float32_* relies on the status value being cleared from the previous iteration)

AFAICT, the only place in softfpu that checks float_exception_flags is the hardfloat test can_use_fpu

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it also checked more, but having a quick look at the code it seems only float32_is_quiet_nan and float32_is_signaling_nan checks float_exception_flags, I'll move the check outside the loop

bool acc, neg_mul, neg_acc; \
xmsk = mask & 0x0F; \
ymsk = (mask >> 4) & 0x0F; \
ymax = MIN(4, 128/(sizeof(TARGET_T) * 8)); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces around / (there's a lot of checkpatch issues on this patch, might want to run it and address them)

&XX2 xt xb uim:uint8_t
@XX2 ...... ..... ... uim:2 ..... ......... .. &XX2 xt=%xx_xt xb=%xx_xb

&XX3 xt xa xb
@XX3 ...... ..... ..... ..... ........ ... &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb

%xx_at 23:3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align this line with the dots in the line below

Comment on lines 3430 to 3444
psum = float32_zero; \
va = !(pmsk & 0x2) ? float32_zero : \
GET_VSR(Vsr##OR_EL, a, \
2*i, ORIG_T, float32); \
vb = !(pmsk & 0x2) ? float32_zero : \
GET_VSR(Vsr##OR_EL, b, \
2*j, ORIG_T, float32); \
vc = !(pmsk & 0x1) ? float32_zero : \
GET_VSR(Vsr##OR_EL, a, \
2*i + 1, ORIG_T, float32); \
vd = !(pmsk & 0x1) ? float32_zero : \
GET_VSR(Vsr##OR_EL, b, \
2*j + 1, ORIG_T, float32); \
psum = float32_mul(va, vb, excp_ptr); \
psum = float32_muladd(vc, vd, psum,0, excp_ptr); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like psum = float32_zero; has no effect here

} \
} \
env->fpscr |= fpscr; \
helper_fpscr_check_status(env); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you shouldn't call helper_* from inside another helper (like here). @mferst explained it in another PR: #76 (comment)

I'm trying to tackle this situation in my exceptions rework in #76 but it's not really usable right now. As I see that patch is a mix between your compute_fp_flags() and helper_fpscr_check_status().
Some options could be wait/help to make my rework work here too;
or create a do_fpscr_check_status that receive a raddr (like do_float_check_status()), move helper_fpscr_check_status() logic to it and make helper_fpscr_check_status() call do_fpscr_check_status()

j++, ymsk_bit >>= 1) { \
if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) { \
helper_reset_fpstatus(env); \
psum = TYPE##_zero; \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line could be removed too

Comment on lines 3509 to 3516
if (acc) { \
psum = TYPE##_muladd(va, vb, aux_acc, \
op_flags, excp_ptr); \
} else { \
psum = TYPE##_mul(va, vb, excp_ptr); \
} \
\
at->Vsr##EL(j) = psum; \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a ternary operator you could even remove psum from this function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the objective is removing the psum you don't even need a ternary operator, psum is there mostly to imitate the ISA's pseudocode, but I'll change to a ternary

@alqotel alqotel force-pushed the alqotel_acc_final branch 3 times, most recently from 9eb65b2 to b7c0382 Compare March 25, 2022 12:24
Comment on lines 2222 to 2223
set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), false);
set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better if you save tcg_constant_i64(0) in a variable.

return true;
}

op_flag = op;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use op directly?

Comment on lines 2267 to 2891
TRANS(XVI4GER8 , do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS(XVI4GER8PP , do_ger_XX3, GER_PP , gen_helper_XVI4GER8)
TRANS(XVI8GER4 , do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS(XVI8GER4PP , do_ger_XX3, GER_PP , gen_helper_XVI8GER4)
TRANS(XVI8GER4SPP , do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS(XVI16GER2 , do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS(XVI16GER2PP , do_ger_XX3, GER_PP , gen_helper_XVI16GER2)
TRANS(XVI16GER2S , do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TRANS(XVI4GER8 , do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS(XVI4GER8PP , do_ger_XX3, GER_PP , gen_helper_XVI4GER8)
TRANS(XVI8GER4 , do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS(XVI8GER4PP , do_ger_XX3, GER_PP , gen_helper_XVI8GER4)
TRANS(XVI8GER4SPP , do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS(XVI16GER2 , do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS(XVI16GER2PP , do_ger_XX3, GER_PP , gen_helper_XVI16GER2)
TRANS(XVI16GER2S , do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)
TRANS(XVI4GER8, do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS(XVI4GER8PP, do_ger_XX3, GER_PP, gen_helper_XVI4GER8)
TRANS(XVI8GER4, do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS(XVI8GER4PP, do_ger_XX3, GER_PP, gen_helper_XVI8GER4)
TRANS(XVI8GER4SPP, do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS(XVI16GER2, do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS(XVI16GER2PP, do_ger_XX3, GER_PP, gen_helper_XVI16GER2)
TRANS(XVI16GER2S, do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)

Comment on lines 2291 to 2901
TRANS64(PMXVI4GER8 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS64(PMXVI4GER8PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI4GER8)
TRANS64(PMXVI8GER4 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4SPP , do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS64(PMXVI16GER2 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2S , do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TRANS64(PMXVI4GER8 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS64(PMXVI4GER8PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI4GER8)
TRANS64(PMXVI8GER4 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4SPP , do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS64(PMXVI16GER2 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2S , do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)
TRANS64(PMXVI4GER8, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)
TRANS64(PMXVI4GER8PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI4GER8)
TRANS64(PMXVI8GER4, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI8GER4)
TRANS64(PMXVI8GER4SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)
TRANS64(PMXVI16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)
TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)

Comment on lines 3343 to 3440
/*
* This function should be changed before sending the patches,
* as it's here as a placeholder since the functions that set
* FPSCR also raise an exception.
*/
static inline void compute_fp_flags(CPUPPCState *env, uintptr_t retaddr)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment needs an update? This method is also raising the exception at the end.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe It'd be good to rewrite it to make it clearer, but the problem is that other functions would raise an exception before setting all FPSCR bits (and possibly wrongfully unsetting FR/FI), so this function seeks to avoid that

@@ -143,8 +143,9 @@
&XX3 xt xa xb
@XX3 ...... ..... ..... ..... ........ ... &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb

%xx_at 23:3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix belong to the second commit ("target/ppc: Implemented xviger instructions")

Comment on lines 2325 to 2341
TRANS(XVF16GER2, do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS(XVF16GER2PP, do_ger_XX3, GER_PP , gen_helper_XVF16GER2)
TRANS(XVF16GER2PN, do_ger_XX3, GER_PN , gen_helper_XVF16GER2)
TRANS(XVF16GER2NP, do_ger_XX3, GER_NP , gen_helper_XVF16GER2)
TRANS(XVF16GER2NN, do_ger_XX3, GER_NN , gen_helper_XVF16GER2)

TRANS(XVF32GER, do_ger_XX3, GER_NOP, gen_helper_XVF32GER)
TRANS(XVF32GERPP, do_ger_XX3, GER_PP , gen_helper_XVF32GER)
TRANS(XVF32GERPN, do_ger_XX3, GER_PN , gen_helper_XVF32GER)
TRANS(XVF32GERNP, do_ger_XX3, GER_NP , gen_helper_XVF32GER)
TRANS(XVF32GERNN, do_ger_XX3, GER_NN , gen_helper_XVF32GER)

TRANS(XVF64GER, do_ger_XX3, GER_NOP, gen_helper_XVF64GER)
TRANS(XVF64GERPP, do_ger_XX3, GER_PP , gen_helper_XVF64GER)
TRANS(XVF64GERPN, do_ger_XX3, GER_PN , gen_helper_XVF64GER)
TRANS(XVF64GERNP, do_ger_XX3, GER_NP , gen_helper_XVF64GER)
TRANS(XVF64GERNN, do_ger_XX3, GER_NN , gen_helper_XVF64GER)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TRANS(XVF16GER2, do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS(XVF16GER2PP, do_ger_XX3, GER_PP , gen_helper_XVF16GER2)
TRANS(XVF16GER2PN, do_ger_XX3, GER_PN , gen_helper_XVF16GER2)
TRANS(XVF16GER2NP, do_ger_XX3, GER_NP , gen_helper_XVF16GER2)
TRANS(XVF16GER2NN, do_ger_XX3, GER_NN , gen_helper_XVF16GER2)
TRANS(XVF32GER, do_ger_XX3, GER_NOP, gen_helper_XVF32GER)
TRANS(XVF32GERPP, do_ger_XX3, GER_PP , gen_helper_XVF32GER)
TRANS(XVF32GERPN, do_ger_XX3, GER_PN , gen_helper_XVF32GER)
TRANS(XVF32GERNP, do_ger_XX3, GER_NP , gen_helper_XVF32GER)
TRANS(XVF32GERNN, do_ger_XX3, GER_NN , gen_helper_XVF32GER)
TRANS(XVF64GER, do_ger_XX3, GER_NOP, gen_helper_XVF64GER)
TRANS(XVF64GERPP, do_ger_XX3, GER_PP , gen_helper_XVF64GER)
TRANS(XVF64GERPN, do_ger_XX3, GER_PN , gen_helper_XVF64GER)
TRANS(XVF64GERNP, do_ger_XX3, GER_NP , gen_helper_XVF64GER)
TRANS(XVF64GERNN, do_ger_XX3, GER_NN , gen_helper_XVF64GER)
TRANS(XVF16GER2, do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS(XVF16GER2PP, do_ger_XX3, GER_PP, gen_helper_XVF16GER2)
TRANS(XVF16GER2PN, do_ger_XX3, GER_PN, gen_helper_XVF16GER2)
TRANS(XVF16GER2NP, do_ger_XX3, GER_NP, gen_helper_XVF16GER2)
TRANS(XVF16GER2NN, do_ger_XX3, GER_NN, gen_helper_XVF16GER2)
TRANS(XVF32GER, do_ger_XX3, GER_NOP, gen_helper_XVF32GER)
TRANS(XVF32GERPP, do_ger_XX3, GER_PP, gen_helper_XVF32GER)
TRANS(XVF32GERPN, do_ger_XX3, GER_PN, gen_helper_XVF32GER)
TRANS(XVF32GERNP, do_ger_XX3, GER_NP, gen_helper_XVF32GER)
TRANS(XVF32GERNN, do_ger_XX3, GER_NN, gen_helper_XVF32GER)
TRANS(XVF64GER, do_ger_XX3, GER_NOP, gen_helper_XVF64GER)
TRANS(XVF64GERPP, do_ger_XX3, GER_PP, gen_helper_XVF64GER)
TRANS(XVF64GERPN, do_ger_XX3, GER_PN, gen_helper_XVF64GER)
TRANS(XVF64GERNP, do_ger_XX3, GER_NP, gen_helper_XVF64GER)
TRANS(XVF64GERNN, do_ger_XX3, GER_NN, gen_helper_XVF64GER)

Comment on lines 2359 to 2375
TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN , gen_helper_XVF16GER2)

TRANS64(PMXVF32GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)
TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF32GER)
TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF32GER)
TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF32GER)
TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF32GER)

TRANS64(PMXVF64GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)
TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF64GER)
TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF64GER)
TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF64GER)
TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF64GER)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP , gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN , gen_helper_XVF16GER2)
TRANS64(PMXVF32GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)
TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF32GER)
TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF32GER)
TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF32GER)
TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF32GER)
TRANS64(PMXVF64GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)
TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF64GER)
TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF64GER)
TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF64GER)
TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF64GER)
TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP, gen_helper_XVF16GER2)
TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN, gen_helper_XVF16GER2)
TRANS64(PMXVF32GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)
TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP, gen_helper_XVF32GER)
TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN, gen_helper_XVF32GER)
TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP, gen_helper_XVF32GER)
TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN, gen_helper_XVF32GER)
TRANS64(PMXVF64GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)
TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP, gen_helper_XVF64GER)
TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN, gen_helper_XVF64GER)
TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP, gen_helper_XVF64GER)
TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN, gen_helper_XVF64GER)

@alqotel alqotel force-pushed the alqotel_acc_final branch 4 times, most recently from 44bd674 to 0b52213 Compare April 4, 2022 15:09
@@ -155,6 +155,9 @@
&XX2 xt xb
@XX2 ...... ..... ..... ..... ......... .. &XX2 xt=%xx_xt xb=%xx_xb

&X_a ra
@X_a ...... ra:3 .. ..... ..... .......... . &X_a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra space before &X_a


## Vector GER instruction

XVI4GER8 111011 ... -- ..... ..... 00100011 ..- @XX3_at xa=%xx_xa

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I didn't know this xa=%xx_xa worked here


mask = ger_pack_masks(a->pmsk, a->ymsk, a->xmsk);
helper(cpu_env, tcg_constant_i32(a->xa), tcg_constant_i32(a->xb),
tcg_constant_i32(a->xt * 4), tcg_constant_i32(mask),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this, but would this be the case to use times_4 from translate.c? Does it even work with prefixed instructions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the new version has times_4 for comparison

@alqotel alqotel force-pushed the alqotel_acc_final branch 4 times, most recently from 56b8efe to 87df5e7 Compare May 11, 2022 17:59
@alqotel alqotel force-pushed the alqotel_acc_final branch from 87df5e7 to 2986290 Compare May 17, 2022 11:57
vcoracolombo and others added 11 commits May 17, 2022 13:46
According to Power ISA, the FI bit in FPSCR is non-sticky.
This means that if an instruction is said to modify the FI bit, then
it should be set or cleared depending on the result of the
instruction. Otherwise, it should be kept as was before.

However, the following inconsistency was found when comparing results
from the hardware (tested on both a Power 9 processor and in
Power 10 Mambo):

(FI bit is set before the execution of the instruction)
Hardware: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> SET
QEMU: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> CLEARED

As the FI bit is non-sticky, and xscmpeqdp does not list it as a field
that is changed by the instruction, it should not be changed after its
execution.
This is happening to multiple instructions in the vsx implementations.

If the ISA does not list the FI bit as altered for a particular
instruction, then it should be kept as it was before the instruction.

QEMU is not following this behavior. Affected instructions include:
- xv* (all vsx-vector instructions);
- xscmp*, xsmax*, xsmin*;
- xstdivdp and similars;
(to identify the affected instructions, just search in the ISA for
 the instructions that does not list FI in "Special Registers Altered")

Most instructions use the function do_float_check_status() to commit
changes in the inexact flag. So the fix is to add a parameter to it
that will control if the bit FI should be changed or not.
All users of do_float_check_status() are then modified to provide this
argument, controlling if that specific instruction changes bit FI or
not.
Some macro helpers are responsible for both instructions that change
and instructions that aren't suposed to change FI. This seems to always
overlap with the sfprf flag. So, reuse this flag for this purpose when
applicable.

Signed-off-by: Víctor Colombo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
This patch fixes another not-so-clear situation in Power ISA
regarding the inexact bits in FPSCR. The ISA states that:

"""
When Overflow Exception is disabled (OE=0) and an
Overflow Exception occurs, the following actions are
taken:
...
2. Inexact Exception is set
XX <- 1
...
FI is set to 1
...
"""

However, when tested on a Power 9 hardware, some instructions that
trigger an OX don't set the FI bit:

xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> CLEARED
xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> CLEARED
(just a few examples. Other instructions are also affected)

The root cause for this seems to be that only instructions that list
the bit FI in the "Special Registers Altered" should modify it.

QEMU is, today, not working like the hardware:

xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> SET
xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> SET

(all tests assume FI is cleared beforehand)

Fix this by making float_overflow_excp() return float_flag_inexact
if it should update the inexact flags.

Signed-off-by: Víctor Colombo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
The bit FI fix used the sfprf flag as a flag for the set_fi parameter
in do_float_check_status where applicable. Now, this patch rename this
flag to sfifprf to state this dual usage.

Signed-off-by: Víctor Colombo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Implement the following PowerISA v3.1 instructions:
xxmfacc: VSX Move From Accumulator
xxmtacc: VSX Move To Accumulator
xxsetaccz: VSX Set Accumulator to Zero

The PowerISA 3.1 mentions that for the current version of the
architecture, "the hardware implementation provides the effect of ACC[i]
and VSRs 4*i to 4*i + 3 logically containing the same data" and "The
Accumulators introduce no new logical state at this time" (page 501).
For now it seems unnecessary to create new structures, so this patch
just uses ACC[i] as VSRs 4*i to 4*i+3 and therefore move to and from
accumulators are no-ops.

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Implement the following PowerISA v3.1 instructions:
xvi4ger8:     VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
xvi4ger8pp:   VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
Positive multiply, Positive accumulate
xvi8ger4:     VSX Vector 4-bit Signed Integer GER (rank-8 update)
xvi8ger4pp:   VSX Vector 4-bit Signed Integer GER (rank-8 update)
Positive multiply, Positive accumulate
xvi8ger4spp:  VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
with Saturate Positive multiply, Positive accumulate
xvi16ger2:    VSX Vector 16-bit Signed Integer GER (rank-2 update)
xvi16ger2pp:  VSX Vector 16-bit Signed Integer GER (rank-2 update)
Positive multiply, Positive accumulate
xvi16ger2s:   VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation
xvi16ger2spp: VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Implement the following PowerISA v3.1 instructions:
pmxvi4ger8:     Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update)
pmxvi4ger8pp:   Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) Positive multiply, Positive accumulate
pmxvi8ger4:     Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update)
pmxvi8ger4pp:   Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update) Positive multiply, Positive accumulate
pmxvi8ger4spp:  Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) with Saturate Positive multiply, Positive accumulate
pmxvi16ger2:    Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update)
pmxvi16ger2pp:  Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvi16ger2s:   Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation
pmxvi16ger2spp: Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Implement the following PowerISA v3.1 instructions:
xvf32ger:   VSX Vector 32-bit Floating-Point GER (rank-1 update)
xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate
xvf64ger:   VSX Vector 64-bit Floating-Point GER (rank-1 update)
xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Implement the following PowerISA v3.1 instructions:
xvf16ger2:   VSX Vector 16-bit Floating-Point GER (rank-2 update)
xvf16ger2nn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Negative accumulate
xvf16ger2np: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Positive accumulate
xvf16ger2pn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Negative accumulate
xvf16ger2pp: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Implement the following PowerISA v3.1 instructions:
pmxvf16ger2:   Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update)
pmxvf16ger2nn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Negative accumulate
pmxvf16ger2np: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Positive accumulate
pmxvf16ger2pn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Negative accumulate
pmxvf16ger2pp: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvf32ger:    Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update)
pmxvf32gernn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf32gernp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf32gerpn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf32gerpp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate
pmxvf64ger:    Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update)
pmxvf64gernn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf64gernp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf64gerpn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf64gerpp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Lucas Mateus Castro (alqotel) and others added 3 commits May 18, 2022 09:09
Implement the following PowerISA v3.1 instructions:
xvbf16ger2:   VSX Vector bfloat16 GER (rank-2 update)
xvbf16ger2nn: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Negative accumulate
xvbf16ger2np: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Positive accumulate
xvbf16ger2pn: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Negative accumulate
xvbf16ger2pp: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Positive accumulate
pmxvbf16ger2:   Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
pmxvbf16ger2nn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Negative accumulate
pmxvbf16ger2np: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Positive accumulate
pmxvbf16ger2pn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Negative accumulate
pmxvbf16ger2pp: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
These are new hwcap bits added for power10.

Signed-off-by: Joel Stanley <[email protected]>
Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
@alqotel alqotel force-pushed the alqotel_vsx_next branch from 36f0e06 to cb7833f Compare May 18, 2022 18:12
@alqotel alqotel force-pushed the alqotel_acc_final branch from 2986290 to 6f23817 Compare May 18, 2022 18:13
@alqotel alqotel closed this Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants