Skip to content

WIP: Smtlib Prelude Injection #340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

WIP: Smtlib Prelude Injection #340

wants to merge 2 commits into from

Conversation

philzook58
Copy link
Contributor

@philzook58 philzook58 commented Aug 23, 2021

Initial pass at adding a smtlib prelude. Ready for discussion at least I think.
Currently on the test example, it prepends the following string to both the precond and postcond

(define-sort pointer () (_ BitVec 64))
(define-sort memsort () (Array (_ BitVec 64) (_ BitVec 8)))
(define-sort BV8 () (_ BitVec 8))
(define-sort BV16 () (_ BitVec 16))
(define-sort BV32 () (_ BitVec 32))
(define-sort BV64 () (_ BitVec 64))
(define-sort char () (_ BitVec 8))
(define-sort byte () (_ BitVec 8))
(define-sort int16 () (_ BitVec 16))
(define-sort int32 () (_ BitVec 32))
(define-sort int64 () (_ BitVec 64))
(define-fun init-mem-equal () Bool (= init_mem_orig init_mem_mod))
(define-fun mem-equal () Bool (= mem_orig mem_mod))
(define-fun init_main_argc_orig () (_ BitVec 64) init_RDI0)
(define-fun init_main_argv_orig () (_ BitVec 64) init_RSI0)
(define-fun main_argv_orig () (_ BitVec 64) RSI0)
(define-fun main_result_orig () (_ BitVec 64) RAX0)
(define-fun init_main_argc_mod () (_ BitVec 64) init_RDI047)
(define-fun init_main_argv_mod () (_ BitVec 64) init_RSI049)
(define-fun main_argv_mod () (_ BitVec 64) RSI049)
(define-fun main_result_mod () (_ BitVec 64) RAX043)

Note that bap prepends the name of the subroutine to the variable names. Perhaps we want to trim this off. We also want to inject the prelude for single binary mode.

Features that could still be added to the prelude:

  • store and select synonyms. These are endianness dependent
(define-fun select8 ((mem memsort) (p pointer)) (select mem p))
(define-fun select16 ((mem memsort) (p pointer))  (concat 
  (select mem p)
  (select mem (bvadd p (_ bv1 64)))
  ))
(define-fun select32 ((mem memsort) (p pointer)) 
(concat 
  (select16 mem p)
  (select16 mem (bvadd p (_ bv2 64)))
  ))
(define-fun select64 ((mem memsort) (p pointer)) 
(concat 
  (select32 mem p)
  (select32 mem (bvadd p (_ bv4 64)))
  ))


(define-fun store8 ((mem memsort) (p pointer) (v (_ BitVec 8))) (store mem p v))
(define-fun store16 ((mem memsort) (p pointer) (v (_ BitVec 8)))
    (store8 (store mem p (extract 7 0 v)) (bvadd p (_ bv1 64)) (extract 15 8 v))
)
  • struct accessor or offset synonyms. This information is accessible from the Args.t also in the presence of header files.
  • pto-stack and pto-heap predicates. We should reuse the functions already in env.ml. It may end up looking something like:
(define-fun pto-stack ((addr pointer)) Bool  
      (and (bvule addr #x0000000040000000) (bvuge addr #x000000003F800000)))

(define-fun pto-heap ((addr pointer)) Bool 
    (bvule addr (bvsub #x000000003F800000 #x0000000000000100)))
  • Maybe generic arg0 arg1 arg2, etc for the typical calling convention of the abi detected.
  • Other useful pre-canned predicates
(define-fun args-equal () Bool (and (= arg0_orig arg0_mod) (= arg1_orig arg1_mod))); and so on
(define-fun same-env ()  Bool  (and mem-equal args-equal))
(define-fun retval-equal () Bool (= retval_orig retval_mod))
  • We could perhaps replace our current strategy of replacing the _init _orig _mod variables with their actual names using this system as well. I chose not to do this at the moment since it does not enhance anything, although I think it would be a bit more elegant

@philzook58 philzook58 changed the title Initial commit of smtlib prelude injection Smtlib Prelude Injection Aug 23, 2021
@philzook58 philzook58 changed the title Smtlib Prelude Injection WIP: Smtlib Prelude Injection Sep 1, 2021
Copy link

@fortunac fortunac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super neat! I have a few questions:

  • Is the prelude added by default if the user specifies a precondition or postcondition via smtlib string? For the comparative case, are we duplicating the constraints that equate registers to each other between the original and modified binaries?

  • I wonder if we should be more consistent with the uses of dashes vs underscores in the prelude names. For example, we have some functions that use dashes and some that use underscores.

(define-fun mem-equal () Bool (= mem_orig mem_mod))
(define-fun init_main_argc_orig () (_ BitVec 64) init_RDI0)
  • In the potential predicates that you mentioned, what does the pto in pto-stack and pto-heap mean?

Comment on lines +104 to +107
let define_fun name args retsort body =
let args = List.map ~f:(fun (v,sort) -> Sexp.List [v; sort]) args in
Sexp.List [Sexp.Atom "define-fun"; Sexp.Atom name; Sexp.List args; retsort; body]
let synonym name ret body = define_fun name [] ret body
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to annotate the types of the arguments to these functions?

let sexp_of_sort (s : Z3.Sort.sort) : Sexp.t = Sexp.of_string (Z3.Sort.to_string s)

(* [make_arg_synonyms] constructs smtlib formula for the arguments of a subroutine.
This make synonyms that for high level c-like names ot low level locations *)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ot supposed to be at?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I still don't really get this sentence.

Maybe an example here?

(define-sort BV64 () (_ BitVec 64))
(define-sort char () (_ BitVec 8))
(define-sort byte () (_ BitVec 8))
(define-sort int16 () (_ BitVec 16))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between a BV16 and an int16 in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing. I thought it was reasonable or at least unharmful to have C-esque looking types and something that is just a nice shorthand evocative of (_ BitVec 64). Maybe I should use int16_t instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely int16_t!

define_sort "memsort" (sexp_of_sort (Z3.Expr.get_sort z3_mem))

let compare_prelude_smtlib2 main_sub1 main_sub2 (env1 : Env.t) (env2 : Env.t) : string =
let std_prelude = "
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have some sort of file that contains the prelude that we read in over here rather than having a hard-coded string.

Copy link
Contributor Author

@philzook58 philzook58 Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the impulse, but I'd rather not. A large amount of the string is programmatically generated and going to an external file adds a lot of complexity in terms of finding the file in question from wherever the wp is invoked. I think going to an external file was a mistake we made in vibes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand by the vibes decision though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I feel like the string is short enough to allow minor sins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added an unnecessary and weird installation step that can easily go wrong or get stale just to avoid recompiling when we as developers edit the string. No user is ever going to edit this string and they can achieve the same functionality by writing define-fun in their own smtlib. There is the possibility of adding another command line feature of --wp-user-lib=mylib.smt2 to avoid duplication between the pre and post condition. I'm not entirely sure we have to have both pre and post condition flags anyway, since this can be achieved via =>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so?

Copy link
Contributor Author

@philzook58 philzook58 Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(=> (preconds referring to init_ variables ) (post-conds referring to both init and non-init)). I feel like this is how I've seen Chris write specs in vibes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I guess we have some redundancy between init_var and the pre flag.

@philzook58
Copy link
Contributor Author

philzook58 commented Sep 2, 2021

  • The prelude is added if the user uses custom pre/post conds. I am not currently replicating the assumption that registers are equal, in fact I add no assuptions. The user needs to (assert library-defined-thing) in order for anything to change at all. I kind of think the default behavior of assuming registers equal should be turned off personally and instead create a library predicate called init-registers-equal if you want this behavior.
  • Hyphens vs underscores. Lisp convention is hyphens, our convention for init_ mod_ is underscores. That is the difference. I tried to stick to hyphens for anything that wasn't init_ _mod _orig. main_argv is a bap generated name for which I kind of think we might like to strip the main_.
  • pto is short for "points to" I'm channeling a bit this: https://cvc4.github.io/separation-logic.html I thought that the original name of on-stack was a bit confusing since a local variable is "on the stack" but really the predicate is about pointers that point to the stack. I could expand the name to points-to-stack. Perhaps the clarity is worth the verbosity

(** [args_prelude_compare] builds smtlib synonyms in a comparison for the arguments of
two compared subroutines. *)
let args_prelude_compare
(main_sub1 : Sub.t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to change on rebase: there's a new type 'a code which represents a pair of an 'a term and an env.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it's in compare.mli)

@fortunac
Copy link

fortunac commented Sep 3, 2021

* The prelude is added if the user uses custom pre/post conds. I am not currently replicating the assumption that registers are equal, in fact I add no assuptions. The user needs to `(assert library-defined-thing)` in order for anything to change at all. I kind of think the default behavior of assuming registers equal should be turned off personally and instead create a library predicate called `init-registers-equal` if you want this behavior.

* Hyphens vs underscores. Lisp convention is hyphens, our convention for init_ mod_ is underscores. That is the difference. I tried to stick to hyphens for anything that wasn't init_ _mod _orig. main_argv is a bap generated name for which I kind of think we might like to strip the `main_`.

* pto is short for "points to" I'm channeling a bit this: https://cvc4.github.io/separation-logic.html  I thought that the original name of `on-stack` was a bit confusing since a local variable is "on the stack" but really the predicate is about pointers that point to the stack. I could expand the name to `points-to-stack`. Perhaps the clarity is worth the verbosity

Cool! This answers all of my questions. I think we should document when the prelude gets added and a list of the pre-defined predicates somewhere.

@codyroux
Copy link
Contributor

I'm good with this, as soon as it's rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants