Commit b005e77

Add some DP text
1 parent 8e136da commit b005e77

3 files changed: +364 -3 lines changed

Makefile

+11-2
@@ -1,6 +1,8 @@
1-
.PHONY: all venv clean
1+
.PHONY: all venv clean images
22
.SUFFIXES: .bs .html
33

4+
IMAGES := $(wildcard images/*.svg)
5+
46
all: build/index.html
57

68
clean:
@@ -21,5 +23,12 @@ $(bikeshed): $(venv-marker) Makefile
2123
build:
2224
mkdir -p $@
2325

24-
build/index.html: api.bs build $(bikeshed)
26+
build/index.html: api.bs $(IMAGES) build $(bikeshed)
2527
$(bikeshed) --die-on=warning spec $< $@
28+
29+
images:
30+
@echo "Regenerating images"
31+
for i in $(IMAGES); do \
32+
tmp="$$(mktemp)"; \
33+
npx aasvg --extract --embed <"$$i" >"$$tmp" && mv "$$tmp" "$$i"; \
34+
done

api.bs

+280-1
@@ -100,6 +100,10 @@ TODO explain why we use histograms
100100

101101
# Overview of Operation # {#overview}
102102

103+
<!-- TODO: remove -->
104+
<dfn>impression</dfn>
105+
<dfn>conversion</dfn>
106+
103107
At impression time, information about an advertisement is saved by the browser in a write-only store.
104108
This includes an identifier for the ad and some metadata about the ad,
105109
such as whether the impression was an ad view or an ad click.
@@ -142,6 +146,10 @@ The aggregation service:
142146

143147
# API Details # {#api}
144148

149+
<!-- TODO fixup -->
150+
A <dfn>conversion report</dfn> is generated when querying impressions.
151+
Conversion reports are encrypted toward the identified aggregation service.
152+
145153
TODO
146154

147155

@@ -167,20 +175,278 @@ TODO
167175

168176
TODO
169177

178+
## Anti-Replay Requirements ## {#anti-replay}
179+
180+
<!-- TODO link to definition of "conversion report" -->
181+
Conversion reports generated by browsers are bound
182+
to the amount of [=privacy budget=]
183+
that was expended by the site that requested the report.
184+
185+
TODO
186+
170187

171188
# Differential Privacy # {#dp}
172189

173-
This design uses the concept of differential privacy as the basis of its privacy design.
190+
This design uses the concept of [=differential privacy=]
191+
as the basis of its privacy design. [[PPA-DP]]
192+
193+
<dfn>Differential privacy</dfn> is a mathematical definition of privacy
194+
that can bound the amount of private information
195+
that is revealed by a system. [[DP]]
196+
Differential privacy is not the only means
197+
by which privacy is protected in this system,
198+
but it is the most rigorously defined and analyzed.
199+
As such, it provides the strongest privacy guarantees.
200+
201+
Differential privacy uses randomized noise
202+
to hide private data contributions
203+
to an aggregated dataset.
204+
The effect of noise is to hide
205+
individual contributions to the dataset,
206+
but to retain the usefulness of any aggregated analysis.
207+
208+
To apply differential privacy,
209+
it is necessary to define what information is protected.
210+
In this system, the protected information is
211+
the [=impressions=] of a single user profile,
212+
on a single user agent,
213+
over a single week,
214+
for a single website that registers [=conversions=].
215+
[[#dp-unit]] describes the implications of this design
216+
in more detail.
217+
218+
This attribution design uses a form of differential privacy
219+
called <dfn>individual differential privacy</dfn>.
220+
In this model, user agents are each separately responsible
221+
for ensuring that they limit the information
222+
that is contributed.
223+
224+
The [=individual differential privacy=] design of this API
225+
has three primary components:
226+
227+
1. User agents limit the number of times
228+
that they use [=impressions=] in [=conversion reports=].
229+
[[#dp-budget]] explores this in greater depth.
230+
231+
2. [=Aggregation services=] ensure that any given [=conversion report=] is
232+
only used in accordance with the [=privacy budget=].
233+
[[#anti-replay]] describes requirements on aggregation services
234+
in more detail.
235+
236+
3. Noise is added by [=aggregation services=].
237+
[[#dp-mechanism]] details the mechanisms that might be used.
238+
239+
Together, these measures place limits
240+
on the information that is released for each [=privacy unit=].
241+
242+
243+
## Privacy Unit ## {#dp-unit}
244+
245+
An implementation of differential privacy
246+
requires a clear definition for what is protected.
247+
This is known as the <dfn>privacy unit</dfn>,
248+
which represents the entity that receives privacy protection.
249+
250+
This system adopts a [=privacy unit=]
251+
that is the combination of three values:
252+
253+
1. A user agent profile.
254+
That is, an instance of a user agent,
255+
as used by a single person.
256+
257+
2. The [=site=] that requests information about impressions.
258+
259+
<p class=note>The sites that register impressions
260+
are not considered.
261+
Those sites do not receive information from this system directly.
262+
263+
3. The current week.
264+
265+
A change to any of these values produces a new privacy unit,
266+
which results in a separate [=privacy budget=].
267+
Each site that a person visits receives a bounded amount of information
268+
for each week.
269+
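The three-part privacy unit described above can be pictured as a lookup key. The following is a minimal sketch, not part of the specification: the names `PrivacyUnit`, `profile_id`, and `week_index` are hypothetical, and counting weeks from the Unix epoch is an assumption, since the text does not say where week boundaries fall.

```python
from typing import NamedTuple

# Assumption: weeks are counted from the Unix epoch. The specification text
# does not define where a week starts.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


class PrivacyUnit(NamedTuple):
    profile_id: str  # hypothetical identifier for a user agent profile
    site: str        # the site that requests information about impressions
    week: int        # index of the week


def week_index(timestamp: float) -> int:
    """Map a timestamp in seconds to a week index."""
    return int(timestamp // SECONDS_PER_WEEK)


def privacy_unit(profile_id: str, site: str, timestamp: float) -> PrivacyUnit:
    """Build the key under which a separate privacy budget would be kept."""
    return PrivacyUnit(profile_id, site, week_index(timestamp))
```

Because the tuple is hashable, it can key a dictionary of budgets directly. Changing any one of the three fields yields a different key and therefore a separate budget.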
270+
Ideally, the [=privacy unit=] is a single person.
271+
Though ideal, it is not possible to develop a useful system
272+
that guarantees perfect correspondence with a person,
273+
for a number of reasons:
274+
275+
* People use multiple browsers and multiple devices,
276+
often without coordination.
277+
278+
* A unit that covered all websites
279+
could be exhausted by one site,
280+
denying other sites any information.
281+
282+
* Advertising is an ongoing activity.
283+
Without renewing the [=privacy budget=] periodically,
284+
sites could exhaust their budget forever.
285+
286+
287+
### Browser Instances ### {#dp-instance}
288+
289+
Each browser instance manages a separate [=privacy budget=].
290+
291+
Coordination between browser instances might be possible,
292+
but it is not expected.
293+
That coordination might allow privacy to be improved
294+
by reducing the total amount of information that is released.
295+
It might also improve the utility of attribution
296+
by allowing impressions on one browser instance
297+
to be converted on another.
298+
299+
Coordination across different implementations
300+
is presently out of scope for this work.
301+
Implementations can perform some coordination
302+
between instances that are known to be for the same person,
303+
but this is not mandatory.
304+
305+
306+
### Per-Site Limits ### {#dp-site}
307+
308+
Information is released to websites on the basis of [=site=].
309+
This aligns with the same boundary used in other privacy-relevant functions.
310+
311+
A finer privacy unit, such as an [=origin=],
312+
would make it trivial to obtain additional information.
313+
Information about the same person could be gathered
314+
from multiple origins.
315+
That information could then be combined
316+
by exploiting the free flow of information within the site,
317+
using cookies [[COOKIES]] or similar.
318+
319+
[[#dp-safety]] discusses attacks that exploit this limit
320+
and some additional [=safety limits=] that might be implemented
321+
by user agents
322+
to protect against those attacks.
323+
324+
325+
### Refresh Interval ### {#dp-refresh}
326+
327+
The differential privacy budget available to a site
328+
is refreshed at an interval of one week.
329+
330+
This budget applies to the [=impressions=]
331+
that are registered with the user agent
332+
and later queried,
333+
not conversions.
334+
335+
From the perspective of the analysis [[PPA-DP]]
336+
each week of impressions forms a separate database.
337+
A finite number of queries can be made of each database,
338+
as determined by the [=privacy budget=]
339+
associated with that database.
340+
341+
The goal is to set a value that is as large as feasible.
342+
A longer period of time allows for a better privacy/utility balance
343+
because sites can be allocated a larger overall budget
344+
at any point in time,
345+
while keeping the overall rate of privacy loss low.
346+
However, a longer interval means that it is easier to
347+
exhaust a privacy budget completely,
348+
yielding no information until the next refresh.
349+
350+
The choice of a week is largely arbitrary.
351+
One week is expected to be enough to allow sites
352+
the ability to make decisions about how to spend [=privacy budgets=]
353+
without careful planning that needs to account for
354+
changes that might occur days or weeks in the future.
355+
356+
[[#dp-budget]] describes the process for budgeting in more detail.
174357

175358

176359
## Privacy Budgets ## {#dp-budget}
177360

178361
Browsers maintain a <dfn>privacy budget</dfn>,
179362
which is a means of limiting the amount of privacy loss.
180363

364+
This specification uses an individual form
365+
of (&epsilon;, &delta;)-differential privacy as its basis.
366+
In this model, privacy loss is measured using the value &epsilon;.
367+
The &delta; value is handled by the [=aggregation service=]
368+
when adding noise to aggregates.
369+
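For reference, the general (non-individual) definition of (&epsilon;, &delta;)-differential privacy, as given in [[DP]], can be stated as follows. The symbols M, D, D', and S are introduced here for illustration only; how the individual form used by this specification adapts the definition is left to [[PPA-DP]].

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private if,
% for all pairs of adjacent datasets D and D' (differing in one privacy unit)
% and for every set S of possible outputs:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

Smaller values of &epsilon; and &delta; mean that the output distributions for D and D' are harder to tell apart, which corresponds to less privacy loss.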
370+
Each user agent instance is responsible for
371+
managing privacy budgets.
372+
373+
Each [=conversion report=] that is requested specifies an &epsilon; value
374+
that represents the amount of privacy budget
375+
that the report consumes.
376+
377+
When searching for impressions for the conversion report,
378+
the user agent deducts the specified &epsilon; value from
379+
the budget for the week in which those impressions fall.
380+
If the privacy budget for that week is not sufficient,
381+
the impressions from that week are not used.
382+
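The deduction rule above can be sketched as follows. This is an illustration under stated assumptions, not the specified algorithm: `BudgetStore`, `try_deduct`, and `select_impressions` are hypothetical names, the per-week budget value is arbitrary, and the sketch assumes that nothing is deducted from a week whose remaining budget is insufficient, which the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetStore:
    per_week_budget: float  # hypothetical epsilon budget per site per week
    spent: dict = field(default_factory=dict)  # (site, week) -> epsilon spent

    def try_deduct(self, site: str, week: int, epsilon: float) -> bool:
        """Deduct epsilon for (site, week) if enough budget remains."""
        used = self.spent.get((site, week), 0.0)
        if used + epsilon > self.per_week_budget:
            # Assumption: an insufficient budget is left unchanged.
            return False
        self.spent[(site, week)] = used + epsilon
        return True


def select_impressions(store, site, impressions, epsilon):
    """Return the impressions usable for one conversion report.

    `impressions` is a list of (week, impression_id) pairs. Budget is
    deducted once for each week that contains at least one impression;
    weeks without impressions are not charged.
    """
    weeks = sorted({week for week, _ in impressions})
    usable = {w for w in weeks if store.try_deduct(site, w, epsilon)}
    return [imp for imp in impressions if imp[0] in usable]
```

With a budget of 1.0 and reports that each request an &epsilon; of 0.5, a given week of impressions can contribute to two reports for the same site; a third report would find that week exhausted and skip its impressions.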
383+
<div class=example id=ex-budget>
384+
In the following figure,
385+
impressions are recorded from a number of different sites,
386+
shown with circles.
387+
388+
<figure>
389+
<pre class=include-raw>
390+
path:images/budget.svg
391+
</pre>
392+
<figcaption>An example of a store of impressions over time</figcaption>
393+
</figure>
394+
395+
A [=conversion report=] might be requested at the time marked with "now".
396+
That conversion report selects impressions marked with black circles,
397+
corresponding to impressions from Site B, C, and E.
398+
399+
As a result, privacy budgets for the querying site is deducted
400+
from weeks 1, 3, 4, and 5.
401+
No impressions were recorded for week 2,
402+
so no budget is deducted from that week.
403+
</div>
404+
405+
181406
TODO
182407

183408

409+
### Safety Limits ### {#dp-safety}
410+
411+
The basic [=privacy unit=] is vulnerable to attack
412+
by an adversary that is able to correlate activity for the same person
413+
across multiple [=sites=].
414+
415+
Groups of sites can sometimes coordinate their activity,
416+
such as when they have shared ownership or strong agreements.
417+
A group of sites that can be sure that particular visitor is the same person--
418+
using any means, including something like FedCM [[FEDCM]]--
419+
can combine information gained from this API.
420+
421+
This can be used to increase the rate
422+
at which a site gains information from attribution,
423+
proportional to the number of sites
424+
across which coordination occurs.
425+
The default privacy unit places no limit on the information released
426+
in this way.
427+
428+
To counteract this effect, user agents can implement <dfn>safety limits</dfn>,
429+
which are additional privacy budgets that do not consider site.
430+
Safety limits might be significantly higher than per-site budgets,
431+
so that they are not reached for most normal browsing activity.
432+
The goal would be to ensure that they are only effective
433+
for intensive activity or when being attacked.
434+
435+
Like the per-site privacy budget,
436+
it is critical that sites be unable to determine
437+
whether their request for a [=conversion report=] has caused
438+
a safety limit to be exceeded.
439+
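One way a user agent might combine a safety limit with the per-site budget is sketched below. The class and method names are hypothetical, the numeric limits are arbitrary, and tracking the safety limit per week is an assumption; the text only says that safety limits do not consider site.

```python
class BudgetsWithSafetyLimit:
    def __init__(self, site_budget: float, safety_limit: float):
        self.site_budget = site_budget    # hypothetical per-site weekly epsilon
        self.safety_limit = safety_limit  # hypothetical cross-site weekly epsilon
        self.site_spent = {}              # (site, week) -> epsilon spent
        self.global_spent = {}            # week -> epsilon spent across all sites

    def try_deduct(self, site: str, week: int, epsilon: float) -> bool:
        """Deduct epsilon only if both the per-site budget and the
        site-independent safety limit have room for it."""
        s = self.site_spent.get((site, week), 0.0)
        g = self.global_spent.get(week, 0.0)
        if s + epsilon > self.site_budget or g + epsilon > self.safety_limit:
            return False
        self.site_spent[(site, week)] = s + epsilon
        self.global_spent[week] = g + epsilon
        return True
```

The boolean result is for internal use by the user agent. To meet the requirement above, the report returned to the site would need to look the same whether or not a limit was exceeded.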
440+
441+
442+
443+
## Differential Privacy Mechanisms ## {#dp-mechanism}
444+
445+
The specific mechanisms that are used
446+
depend on the type of [=aggregation service=].
447+
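As a purely illustrative example of what such a mechanism can look like, the sketch below adds Laplace noise to each bucket of a histogram. The text does not say that this API uses the Laplace mechanism; it is shown only because it is a well-known mechanism from [[DP]]. It provides pure &epsilon;-differential privacy (&delta; = 0) when the noise scale is the L1 sensitivity divided by &epsilon;. The function names and parameter values are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered on zero.

    The difference of two independent Exponential(1) draws follows a
    Laplace(0, 1) distribution, which is then scaled.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_histogram(histogram, epsilon, sensitivity, rng):
    """Add independent Laplace noise to every bucket.

    `sensitivity` is the most that one privacy unit can change the
    histogram, measured as an L1 distance (an assumed input here).
    """
    scale = sensitivity / epsilon
    return [count + laplace_noise(scale, rng) for count in histogram]
```

Each noisy bucket is unbiased, so sums over many reports stay useful, while a smaller &epsilon; produces a larger noise scale and hides individual contributions more strongly.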
448+
449+
184450
# Security # {#security}
185451

186452
TODO
@@ -193,8 +459,21 @@ The broad shape of this level of the API is based on an idea from Luke Winstrom.
193459
The privacy architecture is courtesy of the authors of [[PPA-DP]].
194460

195461

462+
<pre class=link-defaults>
463+
spec:html; type:dfn; text:site
464+
</pre>
196465
<pre class=biblio>
197466
{
467+
"dp": {
468+
"authors": [
469+
"Cynthia Dwork",
470+
"Aaron Roth"
471+
],
472+
"date": "2014",
473+
"href": "https://doi.org/10.1561/0400000042",
474+
"title": "The Algorithmic Foundations of Differential Privacy",
475+
"publisher": "now, Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3–4"
476+
},
198477
"ppa-dp": {
199478
"authors": [
200479
"Pierre Tholoniat",
