@@ -100,6 +100,10 @@ TODO explain why we use histograms

# Overview of Operation # {#overview}

+ <!-- TODO: remove -->
+ <dfn>impression</dfn>
+ <dfn>conversion</dfn>
+
At impression time, information about an advertisement is saved by the browser in a write-only store.
This includes an identifier for the ad and some metadata about the ad,
such as whether the impression was an ad view or an ad click.
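As a non-normative sketch, the write-only store might be modeled as follows; the class and field names here are illustrative assumptions, not part of the API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Impression:
    ad_id: str       # identifier for the ad
    kind: str        # metadata: "view" or "click"
    timestamp: int   # when the impression was saved

class WriteOnlyImpressionStore:
    """Sites can record impressions but cannot read them back;
    only the browser's attribution logic consults the store."""

    def __init__(self) -> None:
        self._impressions: list[Impression] = []

    def record(self, impression: Impression) -> None:
        self._impressions.append(impression)

store = WriteOnlyImpressionStore()
store.record(Impression(ad_id="ad-123", kind="click", timestamp=1_700_000_000))
```

The absence of any public read method is the point: the site that records an impression learns nothing back from the store.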
@@ -142,6 +146,10 @@ The aggregation service:

# API Details # {#api}

+ <!-- TODO fixup -->
+ A <dfn>conversion report</dfn> is generated when querying impressions.
+ Conversion reports are encrypted toward the identified aggregation service.
+
TODO

@@ -167,20 +175,278 @@ TODO

TODO

+ ## Anti-Replay Requirements ## {#anti-replay}
+
+ <!-- TODO link to definition of "conversion report" -->
+ Conversion reports generated by browsers are bound
+ to the amount of [=privacy budget=]
+ that was expended by the site that requested the report.
+
+ TODO
+
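This binding implies that an aggregation service must refuse to process the same report more than once. A minimal non-normative sketch, assuming reports carry a unique identifier:

```python
class ReplayGuard:
    """Accept each conversion report at most once, so that the
    privacy budget bound to a report cannot be spent repeatedly."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, report_id: str) -> bool:
        if report_id in self._seen:
            return False  # replayed report is rejected
        self._seen.add(report_id)
        return True

guard = ReplayGuard()
first = guard.accept("report-1")   # first use is accepted
second = guard.accept("report-1")  # replay is refused
```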

# Differential Privacy # {#dp}

- This design uses the concept of differential privacy as the basis of its privacy design.
+ This design uses the concept of [=differential privacy=]
+ as the basis of its privacy design. [[PPA-DP]]
+
+ <dfn>Differential privacy</dfn> is a mathematical definition of privacy
+ that can bound the amount of private information
+ that is revealed by a system. [[DP]]
+ Differential privacy is not the only means
+ by which privacy is protected in this system,
+ but it is the most rigorously defined and analyzed.
+ As such, it provides the strongest privacy guarantees.
+
+ Differential privacy uses randomized noise
+ to hide private data contributions
+ to an aggregated dataset.
+ The effect of noise is to hide
+ individual contributions to the dataset
+ while retaining the usefulness of aggregated analysis.
+
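As an illustration of this idea, a counting query can be protected with the Laplace mechanism described in [[DP]]; this sketch is non-normative, and the ε value is an arbitrary example:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) by inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    """A count has sensitivity 1, so noise with scale 1/epsilon
    satisfies epsilon-differential privacy for that query."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(1)
release = noisy_count(100, epsilon=1.0)  # near 100, but randomized
```

Each released value is perturbed, so no individual contribution can be inferred with confidence, yet the average of many such releases stays close to the true aggregate.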
+ To apply differential privacy,
+ it is necessary to define what information is protected.
+ In this system, the protected information is
+ the [=impressions=] of a single user profile,
+ on a single user agent,
+ over a single week,
+ for a single website that registers [=conversions=] .
+ [[#dp-unit]] describes the implications of this design
+ in more detail.
+
+ This attribution design uses a form of differential privacy
+ called <dfn>individual differential privacy</dfn> .
+ In this model, user agents are each separately responsible
+ for ensuring that they limit the information
+ that is contributed.
+
+ The [=individual differential privacy=] design of this API
+ has three primary components:
+
+ 1. User agents limit the number of times
+     that they use [=impressions=] in [=conversion reports=] .
+     [[#dp-budget]] explores this in greater depth.
+
+ 2. [=Aggregation services=] ensure that any given [=conversion report=] is
+     only used in accordance with the [=privacy budget=] .
+     [[#anti-replay]] describes requirements on aggregation services
+     in more detail.
+
+ 3. Noise is added by [=aggregation services=] .
+     [[#dp-mechanism]] details the mechanisms that might be used.
+
+ Together, these measures place limits
+ on the information that is released for each [=privacy unit=] .
+
+
+ ## Privacy Unit ## {#dp-unit}
+
+ An implementation of differential privacy
+ requires a clear definition of what is protected.
+ This is known as the <dfn>privacy unit</dfn> ,
+ which represents the entity that receives privacy protection.
+
+ This system adopts a [=privacy unit=]
+ that is the combination of three values:
+
+ 1. A user agent profile.
+     That is, an instance of a user agent,
+     as used by a single person.
+
+ 2. The [=site=] that requests information about impressions.
+
+     <p class=note> The sites that register impressions
+     are not considered.
+     Those sites do not receive information from this system directly.
+
+ 3. The current week.
+
+ A change to any of these values produces a new privacy unit,
+ which results in a separate [=privacy budget=] .
+ Each site that a person visits receives a bounded amount of information
+ for each week.
+
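The three-part unit can be illustrated as a lookup key for budget state; the names and the per-week ε value below are assumptions for illustration, not specified values:

```python
WEEKLY_EPSILON = 1.0  # assumed per-unit budget, not a normative value

# Budget state is tracked per (user agent profile, querying site, week).
budgets: dict[tuple[str, str, int], float] = {}

def remaining_budget(profile: str, site: str, week: int) -> float:
    return budgets.get((profile, site, week), WEEKLY_EPSILON)

# Spending against one unit leaves every other unit untouched:
budgets[("profile-a", "shop.example", 10)] = 0.25
fresh_site = remaining_budget("profile-a", "news.example", 10)
fresh_week = remaining_budget("profile-a", "shop.example", 11)
```

Changing any element of the key, a different profile, a different querying site, or a different week, yields a fresh unit with its own budget.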
+ Ideally, the [=privacy unit=] is a single person.
+ Though ideal, it is not possible to develop a useful system
+ that guarantees perfect correspondence with a person,
+ for a number of reasons:
+
+ * People use multiple browsers and multiple devices,
+     often without coordination.
+
+ * A unit that covered all websites
+     could be exhausted by one site,
+     denying other sites any information.
+
+ * Advertising is an ongoing activity.
+     Without renewing the [=privacy budget=] periodically,
+     sites could exhaust their budget forever.
+
+
+ ### Browser Instances ### {#dp-instance}
+
+ Each browser instance manages a separate [=privacy budget=] .
+
+ Coordination between browser instances might be possible,
+ but it is not expected.
+ That coordination might allow privacy to be improved
+ by reducing the total amount of information that is released.
+ It might also improve the utility of attribution
+ by allowing impressions on one browser instance
+ to be converted on another.
+
+ Coordination across different implementations
+ is presently out of scope for this work.
+ Implementations can perform some coordination
+ between instances that are known to be for the same person,
+ but this is not mandatory.
+
+
+ ### Per-Site Limits ### {#dp-site}
+
+ Information is released to websites on the basis of [=site=] .
+ This aligns with the same boundary used in other privacy-relevant functions.
+
+ A finer privacy unit, such as an [=origin=] ,
+ would make it trivial to obtain additional information.
+ Information about the same person could be gathered
+ from multiple origins.
+ That information could then be combined
+ by exploiting the free flow of information within the site,
+ using cookies [[COOKIES]] or similar.
+
+ [[#dp-safety]] discusses attacks that exploit this limit
+ and some additional [=safety limits=] that might be implemented
+ by user agents
+ to protect against those attacks.
+
+
+ ### Refresh Interval ### {#dp-refresh}
+
+ The differential privacy budget available to a site
+ is refreshed at an interval of one week.
+
+ This budget applies to the [=impressions=]
+ that are registered with the user agent
+ and later queried,
+ not conversions.
+
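For illustration, impressions can be bucketed into weekly databases by their timestamp; the epoch alignment here is an arbitrary assumption rather than a specified rule:

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

def week_of(timestamp: int) -> int:
    """Map an impression's Unix timestamp to the weekly database
    (and budget) that it belongs to."""
    return timestamp // SECONDS_PER_WEEK

# Impressions ten days apart draw on different weekly budgets.
a = week_of(3 * 24 * 60 * 60)    # day 3
b = week_of(13 * 24 * 60 * 60)   # day 13
```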
+ From the perspective of the analysis [[PPA-DP]],
+ each week of impressions forms a separate database.
+ A finite number of queries can be made of each database,
+ as determined by the [=privacy budget=]
+ associated with that database.
+
+ The goal is to set a value that is as large as feasible.
+ A longer period of time allows for a better privacy/utility balance
+ because sites can be allocated a larger overall budget
+ at any point in time,
+ while keeping the overall rate of privacy loss low.
+ However, a longer interval means that it is easier to
+ exhaust a privacy budget completely,
+ yielding no information until the next refresh.
+
+ The choice of a week is largely arbitrary.
+ One week is expected to be enough to allow sites
+ to make decisions about how to spend [=privacy budgets=]
+ without careful planning that needs to account for
+ changes that might occur days or weeks in the future.
+
+ [[#dp-budget]] describes the process for budgeting in more detail.


## Privacy Budgets ## {#dp-budget}

Browsers maintain a <dfn>privacy budget</dfn> ,
which is a means of limiting the amount of privacy loss.

+ This specification uses an individual form
+ of (ε, δ)-differential privacy as its basis.
+ In this model, privacy loss is measured using the value ε.
+ The δ value is handled by the [=aggregation service=]
+ when adding noise to aggregates.
+
+ Each user agent instance is responsible for
+ managing privacy budgets.
+
+ Each [=conversion report=] that is requested specifies an ε value
+ that represents the amount of privacy budget
+ that the report consumes.
+
+ When searching for impressions for the conversion report,
+ the user agent deducts the specified ε value from
+ the budget for the week in which those impressions fall.
+
+ If the privacy budget for that week is not sufficient,
+ the impressions from that week are not used.
+
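The deduction rule above can be sketched as follows, assuming an illustrative per-week budget and data shapes that are not part of the specification:

```python
WEEKLY_EPSILON = 1.0  # assumed per-week budget for one querying site

class BudgetStore:
    def __init__(self) -> None:
        self._remaining: dict[int, float] = {}

    def try_deduct(self, week: int, epsilon: float) -> bool:
        remaining = self._remaining.get(week, WEEKLY_EPSILON)
        if epsilon > remaining:
            return False  # insufficient budget: skip this week
        self._remaining[week] = remaining - epsilon
        return True

def select_impressions(store: BudgetStore,
                       by_week: dict[int, list[str]],
                       epsilon: float) -> list[str]:
    """Use impressions only from weeks whose budget covers epsilon."""
    selected: list[str] = []
    for week, impressions in sorted(by_week.items()):
        if store.try_deduct(week, epsilon):
            selected.extend(impressions)
    return selected

budget_store = BudgetStore()
first = select_impressions(budget_store, {1: ["ad-a"], 3: ["ad-b"]}, epsilon=0.6)
second = select_impressions(budget_store, {1: ["ad-a"]}, epsilon=0.6)
```

In the second call, week 1 has only 0.4 of its budget left, so its impressions are silently dropped rather than partially charged.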
+ <div class=example id=ex-budget>
+ In the following figure,
+ impressions are recorded from a number of different sites,
+ shown with circles.
+
+ <figure>
+ <pre class=include-raw>
+ path:images/budget.svg
+ </pre>
+ <figcaption> An example of a store of impressions over time</figcaption>
+ </figure>
+
+ A [=conversion report=] might be requested at the time marked with "now".
+ That conversion report selects the impressions marked with black circles,
+ corresponding to impressions from Sites B, C, and E.
+
+ As a result, the privacy budget for the querying site is deducted
+ from weeks 1, 3, 4, and 5.
+ No impressions were recorded for week 2,
+ so no budget is deducted from that week.
+ </div>
+
+
TODO


+ ### Safety Limits ### {#dp-safety}
+
+ The basic [=privacy unit=] is vulnerable to attack
+ by an adversary that is able to correlate activity for the same person
+ across multiple [=sites=] .
+
+ Groups of sites can sometimes coordinate their activity,
+ such as when they have shared ownership or strong agreements.
+ A group of sites that can be sure that a particular visitor is the same person--
+ using any means, including something like FedCM [[FEDCM]] --
+ can combine information gained from this API.
+
+ This can be used to increase the rate
+ at which a site gains information from attribution,
+ proportional to the number of sites
+ across which coordination occurs.
+ The default privacy unit places no limit on the information released
+ in this way.
+
+ To counteract this effect, user agents can implement <dfn>safety limits</dfn> ,
+ which are additional privacy budgets that do not consider site.
+ Safety limits might be significantly higher than per-site budgets,
+ so that they are not reached for most normal browsing activity.
+ The goal would be to ensure that they are only effective
+ for intensive activity or when being attacked.
+
+ Like the per-site privacy budget,
+ it is critical that sites be unable to determine
+ whether their request for a [=conversion report=] has caused
+ a safety limit to be exceeded.
+
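A sketch of how a safety limit might be layered over per-site budgets; every value and name here is an illustrative assumption, and a failed charge is deliberately indistinguishable from any other empty result:

```python
SITE_EPSILON = 1.0     # assumed weekly per-site budget
SAFETY_EPSILON = 5.0   # assumed larger, site-agnostic safety limit

class Budgets:
    def __init__(self) -> None:
        self._per_site: dict[str, float] = {}
        self._safety = SAFETY_EPSILON

    def charge(self, site: str, epsilon: float) -> bool:
        site_left = self._per_site.get(site, SITE_EPSILON)
        if epsilon > site_left or epsilon > self._safety:
            return False  # caller cannot tell which limit was hit
        self._per_site[site] = site_left - epsilon
        self._safety -= epsilon
        return True

charges = Budgets()
# Six coordinating sites each stay within their per-site budget,
# but the shared safety limit stops the sixth.
results = [charges.charge(f"site-{i}.example", 1.0) for i in range(6)]
```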
+
+
+
+ ## Differential Privacy Mechanisms ## {#dp-mechanism}
+
+ The specific mechanisms that are used
+ depend on the type of [=aggregation service=] .
+
+
+
# Security # {#security}

TODO
@@ -193,8 +459,21 @@ The broad shape of this level of the API is based on an idea from Luke Winstrom.
The privacy architecture is courtesy of the authors of [[PPA-DP]] .


+ <pre class=link-defaults>
+ spec:html; type:dfn; text:site
+ </pre>
<pre class=biblio>
{
+   "dp": {
+     "authors": [
+       "Cynthia Dwork",
+       "Aaron Roth"
+     ],
+     "date": "2014",
+     "href": "https://doi.org/10.1561/0400000042",
+     "title": "The Algorithmic Foundations of Differential Privacy",
+     "publisher": "now, Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3–4"
+   },
  "ppa-dp": {
    "authors": [
      "Pierre Tholoniat",