@@ -288,6 +288,14 @@ Generic callback code snippets
--refname-callback <function_body>::
Python code body for processing refnames; see <<CALLBACKS>>.

+ --file-info-callback <function_body>::
+ Python code body for processing the combination of filename, mode,
+ and associated file contents; see <<CALLBACKS>>. Note that when
+ --file-info-callback is specified, any replacements specified by
+ --replace-text will not be automatically applied; instead, you
+ have control within the --file-info-callback to choose which files
+ to apply those transformations to.
+
--blob-callback <function_body>::
Python code body for processing blob objects; see <<CALLBACKS>>.

@@ -1164,8 +1172,9 @@ that you should be aware of before using them; see the "API BACKWARD
COMPATIBILITY CAVEAT" comment near the top of git-filter-repo source
code.

- All callback functions are of the same general format. For a command line
- argument like
+ Most callback functions are of the same general format
+ (--file-info-callback is an exception which will be noted later). For
+ a command line argument like

--------------------------------------------------
--foo-callback 'BODY'
@@ -1209,6 +1218,7 @@ callbacks are:
--name-callback
--email-callback
--refname-callback
+ --file-info-callback
--------------------------------------------------

in each you are expected to simply return a new value based on the one
@@ -1272,10 +1282,106 @@ git-filter-repo --filename-callback '
'
--------------------------------------------------

- In contrast, the blob, reset, tag, and commit callbacks are not
- expected to return a value, but are instead expected to modify the
- object passed in. Major fields for these objects are (subject to API
- backward compatibility caveats mentioned previously):
+ The file-info callback is more involved. It is designed to be used in
+ cases where filtering depends on both filename and contents (and maybe
+ mode). It is called for file changes other than deletions (since
+ deletions have no file contents to operate on). The file-info
+ callback takes four parameters (filename, mode, blob_id, and value),
+ and expects three to be returned (filename, mode, blob_id). The
+ filename is handled similarly to the filename callback; it can be used
+ to rename the file (or set to None to drop the change). The mode is a
+ simple bytestring (b"100644" for regular non-executable files,
+ b"100755" for executable files/scripts, b"120000" for symlinks, and
+ b"160000" for submodules). The blob_id is most useful in conjunction
+ with the value parameter. The value parameter is an instance of a
+ class that has the following functions
+     value.get_contents_by_identifier(blob_id) -> contents (bytestring)
+     value.get_size_by_identifier(blob_id) -> size_of_blob (int)
+     value.insert_file_with_contents(contents) -> blob_id
+     value.is_binary(contents) -> bool
+     value.apply_replace_text(contents) -> new_contents (bytestring)
+ and has the following member data you can write to
+     value.data (dict)
+ These functions allow you to get the contents of the file, or its
+ size, create a new file in the stream whose blob_id you can return,
+ check whether some given contents are binary (using the heuristic from
+ the grep(1) command), and apply the replacement rules from --replace-text
+ (note that specifying --file-info-callback stops the changes from
+ --replace-text from being applied automatically). You could use this, for
+ example, to apply the changes from --replace-text only to certain file
+ types and simultaneously rename the files it applies the changes to:
+
+ --------------------------------------------------
+ git-filter-repo --file-info-callback '
+     if not filename.endswith(b".config"):
+       # Make no changes to the file; return as-is
+       return (filename, mode, blob_id)
+
+     new_filename = filename[0:-7] + b".cfg"
+
+     contents = value.get_contents_by_identifier(blob_id)
+     new_contents = value.apply_replace_text(contents)
+     new_blob_id = value.insert_file_with_contents(new_contents)
+
+     return (new_filename, mode, new_blob_id)
+   '
+ --------------------------------------------------
+
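The callback body in the example above can be tried outside a real history rewrite. The following is a minimal sketch, not part of git-filter-repo: `FakeValue` is a hypothetical in-memory stand-in for the `value` helper, its blob ids (`b"new-1"`, ...) are made up, and its `apply_replace_text` only does literal byte replacement, which is an assumption made for illustration.

```python
class FakeValue:
    """Hypothetical in-memory stand-in for the `value` helper (not the real class)."""

    def __init__(self, blobs, replacements):
        self.blobs = dict(blobs)          # blob_id -> contents (bytes)
        self.replacements = replacements  # list of (old, new) byte pairs
        self.data = {}                    # scratch dict, mirroring value.data
        self._next_id = 1

    def get_contents_by_identifier(self, blob_id):
        return self.blobs[blob_id]

    def insert_file_with_contents(self, contents):
        # Invented id scheme; real blob ids look different.
        blob_id = b"new-%d" % self._next_id
        self._next_id += 1
        self.blobs[blob_id] = contents
        return blob_id

    def apply_replace_text(self, contents):
        # Literal replacements only; an assumption for this sketch.
        for old, new in self.replacements:
            contents = contents.replace(old, new)
        return contents


def file_info_callback(filename, mode, blob_id, value):
    # Same logic as the callback body in the documentation example.
    if not filename.endswith(b".config"):
        return (filename, mode, blob_id)
    new_filename = filename[0:-7] + b".cfg"
    contents = value.get_contents_by_identifier(blob_id)
    new_contents = value.apply_replace_text(contents)
    new_blob_id = value.insert_file_with_contents(new_contents)
    return (new_filename, mode, new_blob_id)


value = FakeValue({b"b1": b"password=hunter2\n"}, [(b"hunter2", b"***REMOVED***")])
result = file_info_callback(b"app.config", b"100644", b"b1", value)
untouched = file_info_callback(b"README.md", b"100644", b"b1", value)
```

Here `result` carries the renamed path and the id of the newly inserted blob, while `untouched` is returned exactly as passed in because the filename does not end in `.config`.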
+ Note that if history has multiple revisions with the same file
+ (e.g. it was cherry-picked to multiple branches or there were a number
+ of reverts), then the --file-info-callback will be called multiple
+ times. If you want to avoid processing the same file multiple times,
+ then you can stash transformation results in the value.data dict.
+ For example, we could modify the above example to make it only apply
+ transformations on blob_ids we have not seen before:
+
+ --------------------------------------------------
+ git-filter-repo --file-info-callback '
+     if not filename.endswith(b".config"):
+       # Make no changes to the file; return as-is
+       return (filename, mode, blob_id)
+
+     new_filename = filename[0:-7] + b".cfg"
+
+     if blob_id in value.data:
+       return (new_filename, mode, value.data[blob_id])
+
+     contents = value.get_contents_by_identifier(blob_id)
+     new_contents = value.apply_replace_text(contents)
+     new_blob_id = value.insert_file_with_contents(new_contents)
+     value.data[blob_id] = new_blob_id
+
+     return (new_filename, mode, new_blob_id)
+   '
+ --------------------------------------------------
+
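To see what the `value.data` cache buys, the sketch below runs the same logic against a hypothetical stub that counts how often blob contents are fetched. `CountingValue`, its replacement rule, and its blob id scheme are all invented for illustration and are not the real helper class.

```python
class CountingValue:
    """Hypothetical stub that counts how often contents are fetched."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)  # blob_id -> contents (bytes)
        self.data = {}            # cache: original blob_id -> new blob_id
        self.fetches = 0

    def get_contents_by_identifier(self, blob_id):
        self.fetches += 1
        return self.blobs[blob_id]

    def apply_replace_text(self, contents):
        # Stand-in for the --replace-text rules; an assumption for this sketch.
        return contents.replace(b"secret", b"***")

    def insert_file_with_contents(self, contents):
        # Invented id scheme; real blob ids look different.
        blob_id = b"new-" + str(len(self.blobs)).encode()
        self.blobs[blob_id] = contents
        return blob_id


def cached_callback(filename, mode, blob_id, value):
    if not filename.endswith(b".config"):
        return (filename, mode, blob_id)
    new_filename = filename[0:-7] + b".cfg"
    if blob_id in value.data:
        # Already transformed this blob; reuse the stored result.
        return (new_filename, mode, value.data[blob_id])
    contents = value.get_contents_by_identifier(blob_id)
    new_blob_id = value.insert_file_with_contents(value.apply_replace_text(contents))
    value.data[blob_id] = new_blob_id
    return (new_filename, mode, new_blob_id)


value = CountingValue({b"b1": b"key=secret\n"})
first = cached_callback(b"a.config", b"100644", b"b1", value)
second = cached_callback(b"dir/b.config", b"100644", b"b1", value)
```

Both calls refer to the same original blob, so the second call still renames its own path but reuses the cached new blob id, and the contents are fetched only once.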
+ An alternative example for the --file-info-callback is to make all
+ .sh files executable and add an extra trailing newline to the .sh
+ files:
+
+ --------------------------------------------------
+ git-filter-repo --file-info-callback '
+     if not filename.endswith(b".sh"):
+       # Make no changes to the file; return as-is
+       return (filename, mode, blob_id)
+
+     # There are only 4 valid modes in git:
+     #   - 100644, for regular non-executable files
+     #   - 100755, for executable files/scripts
+     #   - 120000, for symlinks
+     #   - 160000, for submodules
+     new_mode = b"100755"
+
+     contents = value.get_contents_by_identifier(blob_id)
+     new_contents = contents + b"\n"
+     new_blob_id = value.insert_file_with_contents(new_contents)
+
+     return (filename, new_mode, new_blob_id)
+   '
+ --------------------------------------------------
+
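The example above changes every path ending in `.sh`, which would also cover symlinks or submodules that happen to have such a name. A more cautious variant might check the mode first and leave binary contents alone. This is only a sketch under stated assumptions: `StubValue` is a hypothetical stand-in for the `value` helper, and its `is_binary` is assumed here to be a simple NUL-byte check rather than the actual heuristic.

```python
REGULAR_FILE_MODES = (b"100644", b"100755")


class StubValue:
    """Hypothetical stand-in for the `value` helper; not the real class."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)  # blob_id -> contents (bytes)
        self.data = {}

    def get_contents_by_identifier(self, blob_id):
        return self.blobs[blob_id]

    def insert_file_with_contents(self, contents):
        # Invented id scheme; real blob ids look different.
        blob_id = b"new-" + str(len(self.blobs)).encode()
        self.blobs[blob_id] = contents
        return blob_id

    def is_binary(self, contents):
        # Assumption for this sketch: a NUL byte near the start means binary.
        return b"\0" in contents[:8192]


def make_scripts_executable(filename, mode, blob_id, value):
    # Leave non-.sh paths, symlinks, and submodules untouched.
    if not filename.endswith(b".sh") or mode not in REGULAR_FILE_MODES:
        return (filename, mode, blob_id)
    contents = value.get_contents_by_identifier(blob_id)
    if value.is_binary(contents):
        # Change only the mode; do not append to binary contents.
        return (filename, b"100755", blob_id)
    new_blob_id = value.insert_file_with_contents(contents + b"\n")
    return (filename, b"100755", new_blob_id)


value = StubValue({b"b1": b"echo hi\n", b"b2": b"run.sh"})
script = make_scripts_executable(b"run.sh", b"100644", b"b1", value)
symlink = make_scripts_executable(b"link.sh", b"120000", b"b2", value)
```

The regular file gets the executable mode and a new blob with the extra newline, while the symlink entry comes back unchanged.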
+ In contrast to the previous callback types, the blob, reset, tag, and
+ commit callbacks are not expected to return a value, but are instead
+ expected to modify the object passed in. Major fields for these
+ objects are (subject to API backward compatibility caveats mentioned
+ previously):

* Blob: `original_id` (original hash) and `data`
* Reset: `ref` (name of reference) and `from_ref` (hash or integer mark)