Accumulator instructions #77

alqotel · 2022-03-16T18:44:19Z

No description provided.

vcoracolombo · 2022-03-16T19:18:00Z

target/ppc/insn32.decode

@@ -133,6 +133,10 @@
 %xx_xt          0:1 21:5
 %xx_xb          1:1 11:5
 %xx_xa          2:1 16:5
+
+&X_a            ra
+@X_a            ...... ra:3 -- ..... ..... .......... .          &X_a


I think there shouldn't be any -- here. It should be in the instructions line later in the file

@X_a ...... ra:3 .. ..... ..... .......... . &X_a ... XXMFACC 011111 ...-- 00000 ----- 0010110001 - @X_a

vcoracolombo · 2022-03-16T19:47:15Z

target/ppc/int_helper.c

+        }                                                               \
+    }
+
+


Could remove this extra blank line here

vcoracolombo · 2022-03-17T13:21:23Z

target/ppc/fpu_helper.c

+    excp_ptr = &env->fp_status;
+    excp_flags = get_float_exception_flags(excp_ptr);


I would suggest to not use excp_ptr and just do get_float_exception_flags(&env->fp_status);

vcoracolombo · 2022-03-17T13:34:31Z

target/ppc/fpu_helper.c

+    if (unlikely(excp_flags & float_flag_inexact)) {
+        fpscr_flags |= FP_FX;
+        fpscr_flags |= FP_XX;
+        fpscr_flags |= (fpscr & FP_XE) ? FP_FEX : 0;
+    }
+    if (unlikely(excp_flags & float_flag_invalid)) {


@mferst and I had a discussion yesterday on be2e904#r68851624 about not setting inexact when the operation (that case a conversion) is invalid. Do you think this is also the case here?

Also, I think the inexact case is not 'unlikely' as it tends to happen very frequently

It's a bit different here. The description of these MMA instructions says that

For every multiply-add operation that is performed as part of the execution of this instruction, the operation is performed as if all exception enable bits are zero, and the trap-disabled result is returned. [...] Exception status is accumulated and the appropriate exception status bits in the FPSCR are updated at the completion of execution of the instruction.

IIUC, some multiply-add operation may set inexact and other may set invalid, and both bits would be set in FPSCR.

However, considering that softfpu accumulates float_exception_flags (see float_raise implementation in include/fpu/softfloat.h), I wonder if we need this method at all. Shouldn't it be enough to move the call to helper_reset_fpstatus outside the loop and call do_float_check_status at the end of the helper?

However, considering that softfpu accumulates float_exception_flags (see float_raise implementation in include/fpu/softfloat.h), I wonder if we need this method at all. Shouldn't it be enough to move the call to helper_reset_fpstatus outside the loop and call do_float_check_status at the end of the helper?

I don't think calling do_float_check_status is enough because it doesn't deal with invalid exceptions now (only overflow/underflow and inexact).
But I think moving the call outside the loop can still be done creating a temporary float_status instead of a temporary fpscr. (might not even be necessary, would need to check if any of the float32_* relies on the status value being cleared from the previous iteration)

I don't think calling do_float_check_status is enough because it doesn't deal with invalid exceptions now (only overflow/underflow and inexact).

Oh, right, we'd need to call something like float_invalid_op_madd too.

But I think moving the call outside the loop can still be done creating a temporary float_status instead of a temporary fpscr. (might not even be necessary, would need to check if any of the float32_* relies on the status value being cleared from the previous iteration)

AFAICT, the only place in softfpu that checks float_exception_flags is the hardfloat test can_use_fpu

I thought it also checked more, but having a quick look at the code it seems only float32_is_quiet_nan and float32_is_signaling_nan checks float_exception_flags, I'll move the check outside the loop

vcoracolombo · 2022-03-17T13:56:46Z

target/ppc/fpu_helper.c

+        bool acc, neg_mul, neg_acc;                                     \
+        xmsk = mask & 0x0F;                                             \
+        ymsk = (mask >> 4) & 0x0F;                                      \
+        ymax = MIN(4, 128/(sizeof(TARGET_T) * 8));                      \


Spaces around / (there's a lot of checkpatch issues on this patch, might want to run it and address them)

vcoracolombo · 2022-03-17T15:28:49Z

target/ppc/insn32.decode

 &XX2            xt xb uim:uint8_t
 @XX2            ...... ..... ... uim:2 ..... ......... ..       &XX2 xt=%xx_xt xb=%xx_xb

 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb

+%xx_at         23:3


align this line with the dots in the line below

vcoracolombo · 2022-03-17T15:53:21Z

target/ppc/fpu_helper.c

+                    psum = float32_zero;                                \
+                    va = !(pmsk & 0x2) ? float32_zero :                 \
+                                         GET_VSR(Vsr##OR_EL, a,         \
+                                         2*i, ORIG_T, float32);         \
+                    vb = !(pmsk & 0x2) ? float32_zero :                 \
+                                         GET_VSR(Vsr##OR_EL, b,         \
+                                         2*j, ORIG_T, float32);         \
+                    vc = !(pmsk & 0x1) ? float32_zero :                 \
+                                         GET_VSR(Vsr##OR_EL, a,         \
+                                         2*i + 1, ORIG_T, float32);     \
+                    vd = !(pmsk & 0x1) ? float32_zero :                 \
+                                         GET_VSR(Vsr##OR_EL, b,         \
+                                         2*j + 1, ORIG_T, float32);     \
+                    psum = float32_mul(va, vb, excp_ptr);               \
+                    psum = float32_muladd(vc, vd, psum,0, excp_ptr);    \


Looks like psum = float32_zero; has no effect here

vcoracolombo · 2022-03-17T16:34:06Z

target/ppc/fpu_helper.c

+            }                                                           \
+        }                                                               \
+        env->fpscr |= fpscr;                                            \
+        helper_fpscr_check_status(env);                                 \


I think you shouldn't call helper_* from inside another helper (like here). @mferst explained it in another PR: #76 (comment)

I'm trying to tackle this situation in my exceptions rework in #76 but it's not really usable right now. As I see that patch is a mix between your compute_fp_flags() and helper_fpscr_check_status().
Some options could be wait/help to make my rework work here too;
or create a do_fpscr_check_status that receive a raddr (like do_float_check_status()), move helper_fpscr_check_status() logic to it and make helper_fpscr_check_status() call do_fpscr_check_status()

vcoracolombo · 2022-03-17T16:59:49Z

target/ppc/fpu_helper.c

+                 j++, ymsk_bit >>= 1) {                                 \
+                if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {           \
+                    helper_reset_fpstatus(env);                         \
+                    psum = TYPE##_zero;                                 \


I think this line could be removed too

vcoracolombo · 2022-03-17T17:00:54Z

target/ppc/fpu_helper.c

+                    if (acc) {                                          \
+                        psum = TYPE##_muladd(va, vb, aux_acc,           \
+                                             op_flags, excp_ptr);       \
+                    } else {                                            \
+                        psum = TYPE##_mul(va, vb, excp_ptr);            \
+                    }                                                   \
+                                                                        \
+                    at->Vsr##EL(j) = psum;                              \


If you use a ternary operator you could even remove psum from this function

If the objective is removing the psum you don't even need a ternary operator, psum is there mostly to imitate the ISA's pseudocode, but I'll change to a ternary

target/ppc/translate/vsx-impl.c.inc

target/ppc/int_helper.c

mferst · 2022-03-25T18:10:11Z

target/ppc/translate/vsx-impl.c.inc

+        set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), false);
+        set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), true);


It might be better if you save tcg_constant_i64(0) in a variable.

mferst · 2022-03-25T18:20:36Z

target/ppc/translate/vsx-impl.c.inc

+        return true;
+    }
+
+    op_flag = op;


Why not use op directly?

mferst · 2022-03-25T18:21:39Z

target/ppc/translate/vsx-impl.c.inc

+TRANS(XVI4GER8    , do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)
+TRANS(XVI4GER8PP  , do_ger_XX3, GER_PP , gen_helper_XVI4GER8)
+TRANS(XVI8GER4    , do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)
+TRANS(XVI8GER4PP  , do_ger_XX3, GER_PP , gen_helper_XVI8GER4)
+TRANS(XVI8GER4SPP , do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)
+TRANS(XVI16GER2   , do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)
+TRANS(XVI16GER2PP , do_ger_XX3, GER_PP , gen_helper_XVI16GER2)
+TRANS(XVI16GER2S  , do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)
+TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)


Suggested change

TRANS(XVI4GER8 , do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)

TRANS(XVI4GER8PP , do_ger_XX3, GER_PP , gen_helper_XVI4GER8)

TRANS(XVI8GER4 , do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)

TRANS(XVI8GER4PP , do_ger_XX3, GER_PP , gen_helper_XVI8GER4)

TRANS(XVI8GER4SPP , do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)

TRANS(XVI16GER2 , do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)

TRANS(XVI16GER2PP , do_ger_XX3, GER_PP , gen_helper_XVI16GER2)

TRANS(XVI16GER2S , do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)

TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)

TRANS(XVI4GER8, do_ger_XX3, GER_NOP, gen_helper_XVI4GER8)

TRANS(XVI4GER8PP, do_ger_XX3, GER_PP, gen_helper_XVI4GER8)

TRANS(XVI8GER4, do_ger_XX3, GER_NOP, gen_helper_XVI8GER4)

TRANS(XVI8GER4PP, do_ger_XX3, GER_PP, gen_helper_XVI8GER4)

TRANS(XVI8GER4SPP, do_ger_XX3, GER_SPP, gen_helper_XVI8GER4)

TRANS(XVI16GER2, do_ger_XX3, GER_NOP, gen_helper_XVI16GER2)

TRANS(XVI16GER2PP, do_ger_XX3, GER_PP, gen_helper_XVI16GER2)

TRANS(XVI16GER2S, do_ger_XX3, GER_SAT, gen_helper_XVI16GER2)

TRANS(XVI16GER2SPP, do_ger_XX3, GER_SPP, gen_helper_XVI16GER2)

mferst · 2022-03-25T18:23:51Z

target/ppc/translate/vsx-impl.c.inc

+TRANS64(PMXVI4GER8    , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)
+TRANS64(PMXVI4GER8PP  , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI4GER8)
+TRANS64(PMXVI8GER4    , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)
+TRANS64(PMXVI8GER4PP  , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI8GER4)
+TRANS64(PMXVI8GER4SPP , do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)
+TRANS64(PMXVI16GER2   , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)
+TRANS64(PMXVI16GER2PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI16GER2)
+TRANS64(PMXVI16GER2S  , do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)
+TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)


Suggested change

TRANS64(PMXVI4GER8 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)

TRANS64(PMXVI4GER8PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI4GER8)

TRANS64(PMXVI8GER4 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)

TRANS64(PMXVI8GER4PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI8GER4)

TRANS64(PMXVI8GER4SPP , do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)

TRANS64(PMXVI16GER2 , do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2PP , do_ger_MMIRR_XX3, GER_PP , gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2S , do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)

TRANS64(PMXVI4GER8, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI4GER8)

TRANS64(PMXVI4GER8PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI4GER8)

TRANS64(PMXVI8GER4, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI8GER4)

TRANS64(PMXVI8GER4PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI8GER4)

TRANS64(PMXVI8GER4SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI8GER4)

TRANS64(PMXVI16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, GER_SAT, gen_helper_XVI16GER2)

TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, GER_SPP, gen_helper_XVI16GER2)

mferst · 2022-03-25T18:28:14Z

target/ppc/fpu_helper.c

+/*
+ * This function should be changed before sending the patches,
+ * as it's here as a placeholder since the functions that set
+ * FPSCR also raise an exception.
+ */
+static inline void compute_fp_flags(CPUPPCState *env, uintptr_t retaddr)


Does this comment needs an update? This method is also raising the exception at the end.

Maybe It'd be good to rewrite it to make it clearer, but the problem is that other functions would raise an exception before setting all FPSCR bits (and possibly wrongfully unsetting FR/FI), so this function seeks to avoid that

mferst · 2022-03-25T18:40:49Z

target/ppc/insn32.decode

@@ -143,8 +143,9 @@
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb

-%xx_at         23:3


This fix belong to the second commit ("target/ppc: Implemented xviger instructions")

mferst · 2022-03-25T18:42:41Z

target/ppc/translate/vsx-impl.c.inc

+TRANS(XVF16GER2,   do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)
+TRANS(XVF16GER2PP, do_ger_XX3, GER_PP , gen_helper_XVF16GER2)
+TRANS(XVF16GER2PN, do_ger_XX3, GER_PN , gen_helper_XVF16GER2)
+TRANS(XVF16GER2NP, do_ger_XX3, GER_NP , gen_helper_XVF16GER2)
+TRANS(XVF16GER2NN, do_ger_XX3, GER_NN , gen_helper_XVF16GER2)
+
+TRANS(XVF32GER,    do_ger_XX3, GER_NOP, gen_helper_XVF32GER)
+TRANS(XVF32GERPP,  do_ger_XX3, GER_PP , gen_helper_XVF32GER)
+TRANS(XVF32GERPN,  do_ger_XX3, GER_PN , gen_helper_XVF32GER)
+TRANS(XVF32GERNP,  do_ger_XX3, GER_NP , gen_helper_XVF32GER)
+TRANS(XVF32GERNN,  do_ger_XX3, GER_NN , gen_helper_XVF32GER)
+
+TRANS(XVF64GER,    do_ger_XX3, GER_NOP, gen_helper_XVF64GER)
+TRANS(XVF64GERPP,  do_ger_XX3, GER_PP , gen_helper_XVF64GER)
+TRANS(XVF64GERPN,  do_ger_XX3, GER_PN , gen_helper_XVF64GER)
+TRANS(XVF64GERNP,  do_ger_XX3, GER_NP , gen_helper_XVF64GER)
+TRANS(XVF64GERNN,  do_ger_XX3, GER_NN , gen_helper_XVF64GER)


Suggested change

TRANS(XVF16GER2, do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)

TRANS(XVF16GER2PP, do_ger_XX3, GER_PP , gen_helper_XVF16GER2)

TRANS(XVF16GER2PN, do_ger_XX3, GER_PN , gen_helper_XVF16GER2)

TRANS(XVF16GER2NP, do_ger_XX3, GER_NP , gen_helper_XVF16GER2)

TRANS(XVF16GER2NN, do_ger_XX3, GER_NN , gen_helper_XVF16GER2)

TRANS(XVF32GER, do_ger_XX3, GER_NOP, gen_helper_XVF32GER)

TRANS(XVF32GERPP, do_ger_XX3, GER_PP , gen_helper_XVF32GER)

TRANS(XVF32GERPN, do_ger_XX3, GER_PN , gen_helper_XVF32GER)

TRANS(XVF32GERNP, do_ger_XX3, GER_NP , gen_helper_XVF32GER)

TRANS(XVF32GERNN, do_ger_XX3, GER_NN , gen_helper_XVF32GER)

TRANS(XVF64GER, do_ger_XX3, GER_NOP, gen_helper_XVF64GER)

TRANS(XVF64GERPP, do_ger_XX3, GER_PP , gen_helper_XVF64GER)

TRANS(XVF64GERPN, do_ger_XX3, GER_PN , gen_helper_XVF64GER)

TRANS(XVF64GERNP, do_ger_XX3, GER_NP , gen_helper_XVF64GER)

TRANS(XVF64GERNN, do_ger_XX3, GER_NN , gen_helper_XVF64GER)

TRANS(XVF16GER2, do_ger_XX3, GER_NOP, gen_helper_XVF16GER2)

TRANS(XVF16GER2PP, do_ger_XX3, GER_PP, gen_helper_XVF16GER2)

TRANS(XVF16GER2PN, do_ger_XX3, GER_PN, gen_helper_XVF16GER2)

TRANS(XVF16GER2NP, do_ger_XX3, GER_NP, gen_helper_XVF16GER2)

TRANS(XVF16GER2NN, do_ger_XX3, GER_NN, gen_helper_XVF16GER2)

TRANS(XVF32GER, do_ger_XX3, GER_NOP, gen_helper_XVF32GER)

TRANS(XVF32GERPP, do_ger_XX3, GER_PP, gen_helper_XVF32GER)

TRANS(XVF32GERPN, do_ger_XX3, GER_PN, gen_helper_XVF32GER)

TRANS(XVF32GERNP, do_ger_XX3, GER_NP, gen_helper_XVF32GER)

TRANS(XVF32GERNN, do_ger_XX3, GER_NN, gen_helper_XVF32GER)

TRANS(XVF64GER, do_ger_XX3, GER_NOP, gen_helper_XVF64GER)

TRANS(XVF64GERPP, do_ger_XX3, GER_PP, gen_helper_XVF64GER)

TRANS(XVF64GERPN, do_ger_XX3, GER_PN, gen_helper_XVF64GER)

TRANS(XVF64GERNP, do_ger_XX3, GER_NP, gen_helper_XVF64GER)

TRANS(XVF64GERNN, do_ger_XX3, GER_NN, gen_helper_XVF64GER)

mferst · 2022-03-25T18:45:30Z

target/ppc/translate/vsx-impl.c.inc

+TRANS64(PMXVF16GER2,   do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP , gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN , gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP , gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN , gen_helper_XVF16GER2)
+
+TRANS64(PMXVF32GER,    do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)
+TRANS64(PMXVF32GERPP,  do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF32GER)
+TRANS64(PMXVF32GERPN,  do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF32GER)
+TRANS64(PMXVF32GERNP,  do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF32GER)
+TRANS64(PMXVF32GERNN,  do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF32GER)
+
+TRANS64(PMXVF64GER,    do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)
+TRANS64(PMXVF64GERPP,  do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF64GER)
+TRANS64(PMXVF64GERPN,  do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF64GER)
+TRANS64(PMXVF64GERNP,  do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF64GER)
+TRANS64(PMXVF64GERNN,  do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF64GER)


Suggested change

TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP , gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN , gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP , gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN , gen_helper_XVF16GER2)

TRANS64(PMXVF32GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)

TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF32GER)

TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF32GER)

TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF32GER)

TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF32GER)

TRANS64(PMXVF64GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)

TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP , gen_helper_XVF64GER)

TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN , gen_helper_XVF64GER)

TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP , gen_helper_XVF64GER)

TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN , gen_helper_XVF64GER)

TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, GER_NOP, gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, GER_PP, gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, GER_PN, gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, GER_NP, gen_helper_XVF16GER2)

TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, GER_NN, gen_helper_XVF16GER2)

TRANS64(PMXVF32GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF32GER)

TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP, gen_helper_XVF32GER)

TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN, gen_helper_XVF32GER)

TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP, gen_helper_XVF32GER)

TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN, gen_helper_XVF32GER)

TRANS64(PMXVF64GER, do_ger_MMIRR_XX3_NO_PMSK, GER_NOP, gen_helper_XVF64GER)

TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3_NO_PMSK, GER_PP, gen_helper_XVF64GER)

TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3_NO_PMSK, GER_PN, gen_helper_XVF64GER)

TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3_NO_PMSK, GER_NP, gen_helper_XVF64GER)

TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3_NO_PMSK, GER_NN, gen_helper_XVF64GER)

vcoracolombo · 2022-04-18T18:11:23Z

target/ppc/insn32.decode

@@ -155,6 +155,9 @@
 &XX2            xt xb
 @XX2            ...... ..... ..... ..... ......... ..           &XX2 xt=%xx_xt xb=%xx_xb

+&X_a            ra
+@X_a            ...... ra:3 .. ..... ..... .......... .          &X_a


remove extra space before &X_a

vcoracolombo · 2022-04-18T18:14:23Z

target/ppc/insn32.decode

+
+## Vector GER instruction
+
+XVI4GER8        111011 ... -- ..... ..... 00100011 ..-  @XX3_at xa=%xx_xa


Interesting, I didn't know this xa=%xx_xa worked here

vcoracolombo · 2022-04-18T18:36:46Z

target/ppc/translate/vsx-impl.c.inc

+
+    mask = ger_pack_masks(a->pmsk, a->ymsk, a->xmsk);
+    helper(cpu_env, tcg_constant_i32(a->xa), tcg_constant_i32(a->xb),
+           tcg_constant_i32(a->xt * 4), tcg_constant_i32(mask),


I don't have a strong opinion on this, but would this be the case to use times_4 from translate.c? Does it even work with prefixed instructions?

Yes, the new version has times_4 for comparison

According to Power ISA, the FI bit in FPSCR is non-sticky. This means that if an instruction is said to modify the FI bit, then it should be set or cleared depending on the result of the instruction. Otherwise, it should be kept as was before. However, the following inconsistency was found when comparing results from the hardware (tested on both a Power 9 processor and in Power 10 Mambo): (FI bit is set before the execution of the instruction) Hardware: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> SET QEMU: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> CLEARED As the FI bit is non-sticky, and xscmpeqdp does not list it as a field that is changed by the instruction, it should not be changed after its execution. This is happening to multiple instructions in the vsx implementations. If the ISA does not list the FI bit as altered for a particular instruction, then it should be kept as it was before the instruction. QEMU is not following this behavior. Affected instructions include: - xv* (all vsx-vector instructions); - xscmp*, xsmax*, xsmin*; - xstdivdp and similars; (to identify the affected instructions, just search in the ISA for the instructions that does not list FI in "Special Registers Altered") Most instructions use the function do_float_check_status() to commit changes in the inexact flag. So the fix is to add a parameter to it that will control if the bit FI should be changed or not. All users of do_float_check_status() are then modified to provide this argument, controlling if that specific instruction changes bit FI or not. Some macro helpers are responsible for both instructions that change and instructions that aren't suposed to change FI. This seems to always overlap with the sfprf flag. So, reuse this flag for this purpose when applicable. Signed-off-by: Víctor Colombo <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

This patch fixes another not-so-clear situation in Power ISA regarding the inexact bits in FPSCR. The ISA states that: """ When Overflow Exception is disabled (OE=0) and an Overflow Exception occurs, the following actions are taken: ... 2. Inexact Exception is set XX <- 1 ... FI is set to 1 ... """ However, when tested on a Power 9 hardware, some instructions that trigger an OX don't set the FI bit: xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> CLEARED xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> CLEARED (just a few examples. Other instructions are also affected) The root cause for this seems to be that only instructions that list the bit FI in the "Special Registers Altered" should modify it. QEMU is, today, not working like the hardware: xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> SET xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> SET (all tests assume FI is cleared beforehand) Fix this by making float_overflow_excp() return float_flag_inexact if it should update the inexact flags. Signed-off-by: Víctor Colombo <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Rashmica Gupta <[email protected]>

The bit FI fix used the sfprf flag as a flag for the set_fi parameter in do_float_check_status where applicable. Now, this patch rename this flag to sfifprf to state this dual usage. Signed-off-by: Víctor Colombo <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Rashmica Gupta <[email protected]>

Implement the following PowerISA v3.1 instructions: xxmfacc: VSX Move From Accumulator xxmtacc: VSX Move To Accumulator xxsetaccz: VSX Set Accumulator to Zero The PowerISA 3.1 mentions that for the current version of the architecture, "the hardware implementation provides the effect of ACC[i] and VSRs 4*i to 4*i + 3 logically containing the same data" and "The Accumulators introduce no new logical state at this time" (page 501). For now it seems unnecessary to create new structures, so this patch just uses ACC[i] as VSRs 4*i to 4*i+3 and therefore move to and from accumulators are no-ops. Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

Implement the following PowerISA v3.1 instructions: xvi4ger8: VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) xvi4ger8pp: VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) Positive multiply, Positive accumulate xvi8ger4: VSX Vector 4-bit Signed Integer GER (rank-8 update) xvi8ger4pp: VSX Vector 4-bit Signed Integer GER (rank-8 update) Positive multiply, Positive accumulate xvi8ger4spp: VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) with Saturate Positive multiply, Positive accumulate xvi16ger2: VSX Vector 16-bit Signed Integer GER (rank-2 update) xvi16ger2pp: VSX Vector 16-bit Signed Integer GER (rank-2 update) Positive multiply, Positive accumulate xvi16ger2s: VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation xvi16ger2spp: VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

Implement the following PowerISA v3.1 instructions: pmxvi4ger8: Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) pmxvi4ger8pp: Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) Positive multiply, Positive accumulate pmxvi8ger4: Prefixed Masked VSX Vector 4-bit Signed Integer GER (rank-8 update) pmxvi8ger4pp: Prefixed Masked VSX Vector 4-bit Signed Integer GER (rank-8 update) Positive multiply, Positive accumulate pmxvi8ger4spp: Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) with Saturate Positive multiply, Positive accumulate pmxvi16ger2: Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) pmxvi16ger2pp: Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) Positive multiply, Positive accumulate pmxvi16ger2s: Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation pmxvi16ger2spp: Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

Implement the following PowerISA v3.1 instructions: xvf32ger: VSX Vector 32-bit Floating-Point GER (rank-1 update) xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative multiply, Negative accumulate xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative multiply, Positive accumulate xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive multiply, Negative accumulate xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive multiply, Positive accumulate xvf64ger: VSX Vector 64-bit Floating-Point GER (rank-1 update) xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative multiply, Negative accumulate xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative multiply, Positive accumulate xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive multiply, Negative accumulate xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

Implement the following PowerISA v3.1 instructions: xvf16ger2: VSX Vector 16-bit Floating-Point GER (rank-2 update) xvf16ger2nn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative multiply, Negative accumulate xvf16ger2np: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative multiply, Positive accumulate xvf16ger2pn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive multiply, Negative accumulate xvf16ger2pp: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]>

Implement the following PowerISA v3.1 instructions: pmxvf16ger2: Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) pmxvf16ger2nn: Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative multiply, Negative accumulate pmxvf16ger2np: Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative multiply, Positive accumulate pmxvf16ger2pn: Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive multiply, Negative accumulate pmxvf16ger2pp: Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive multiply, Positive accumulate pmxvf32ger: Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) pmxvf32gernn: Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative multiply, Negative accumulate pmxvf32gernp: Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative multiply, Positive accumulate pmxvf32gerpn: Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive multiply, Negative accumulate pmxvf32gerpp: Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive multiply, Positive accumulate pmxvf64ger: Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) pmxvf64gernn: Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative multiply, Negative accumulate pmxvf64gernp: Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative multiply, Positive accumulate pmxvf64gerpn: Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive multiply, Negative accumulate pmxvf64gerpp: Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

Implement the following PowerISA v3.1 instructions: xvbf16ger2: VSX Vector bfloat16 GER (rank-2 update) xvbf16ger2nn: VSX Vector bfloat16 GER (rank-2 update) Negative multiply, Negative accumulate xvbf16ger2np: VSX Vector bfloat16 GER (rank-2 update) Negative multiply, Positive accumulate xvbf16ger2pn: VSX Vector bfloat16 GER (rank-2 update) Positive multiply, Negative accumulate xvbf16ger2pp: VSX Vector bfloat16 GER (rank-2 update) Positive multiply, Positive accumulate pmxvbf16ger2: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update) pmxvbf16ger2nn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update) Negative multiply, Negative accumulate pmxvbf16ger2np: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update) Negative multiply, Positive accumulate pmxvbf16ger2pn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update) Positive multiply, Negative accumulate pmxvbf16ger2pp: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update) Positive multiply, Positive accumulate Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

These are new hwcap bits added for power10. Signed-off-by: Joel Stanley <[email protected]> Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

alqotel self-assigned this Mar 16, 2022

vcoracolombo reviewed Mar 17, 2022

View reviewed changes

alqotel force-pushed the alqotel_acc_final branch 3 times, most recently from 9eb65b2 to b7c0382 Compare March 25, 2022 12:24

alqotel requested review from lrcoutinho, mferst and vcoracolombo March 25, 2022 12:25

mferst reviewed Mar 25, 2022

View reviewed changes

alqotel force-pushed the alqotel_acc_final branch 4 times, most recently from 44bd674 to 0b52213 Compare April 4, 2022 15:09

vcoracolombo reviewed Apr 18, 2022

View reviewed changes

alqotel force-pushed the alqotel_acc_final branch 4 times, most recently from 56b8efe to 87df5e7 Compare May 11, 2022 17:59

alqotel force-pushed the alqotel_acc_final branch from 87df5e7 to 2986290 Compare May 17, 2022 11:57

vcoracolombo and others added 11 commits May 17, 2022 13:46

target/ppc: vsxger16 accumulation rth requested changes

a0e28a4

target/ppc: vsxger16: adapt rth's solution give same result as mambo

ab883f1

Lucas Mateus Castro (alqotel) and others added 3 commits May 18, 2022 09:09

linux-user: Add PowerPC ISA 3.1 and MMA to hwcap

67e90b3

These are new hwcap bits added for power10. Signed-off-by: Joel Stanley <[email protected]> Signed-off-by: Lucas Mateus Castro (alqotel) <[email protected]> Reviewed-by: Richard Henderson <[email protected]>

target/ppc: Change to work on risu

6f23817

alqotel force-pushed the alqotel_vsx_next branch from 36f0e06 to cb7833f Compare May 18, 2022 18:12

alqotel force-pushed the alqotel_acc_final branch from 2986290 to 6f23817 Compare May 18, 2022 18:13

alqotel closed this Oct 25, 2022

		excp_ptr = &env->fp_status;
		excp_flags = get_float_exception_flags(excp_ptr);

		set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), false);
		set_cpu_vsr(a->ra * 4 + i, tcg_constant_i64(0), true);


		## Vector GER instruction

		XVI4GER8 111011 ... -- ..... ..... 00100011 ..- @XX3_at xa=%xx_xa

Accumulator instructions #77

Accumulator instructions #77

Uh oh!

Conversation

alqotel commented Mar 16, 2022

Uh oh!

vcoracolombo Mar 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vcoracolombo Mar 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vcoracolombo Mar 16, 2022 •

edited

Loading

vcoracolombo Mar 17, 2022 •

edited

Loading