Skip to content

Data lineage tracking (aka CID store) #5715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 96 commits into from
Apr 23, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
96 commits
Select commit Hold shift + click to select a range
472fcc7
Addressable data store
pditommaso Jan 26, 2025
4f8c524
Merge branch 'master' into cid-store
pditommaso Jan 31, 2025
b5e8c46
Addressable data store [wip 2] [ci skip]
pditommaso Jan 31, 2025
669afd5
Minor changes [ci skip]
pditommaso Jan 31, 2025
c93a713
Addressable data store
pditommaso Jan 26, 2025
c0c660f
Addressable data store [wip 2] [ci skip]
pditommaso Jan 31, 2025
2a2d76f
Minor changes [ci skip]
pditommaso Jan 31, 2025
a2139e3
M0 implementation
jorgee Feb 12, 2025
fddc5f7
fix tests
jorgee Feb 12, 2025
fe780a8
fix tests
jorgee Feb 12, 2025
f9f7ed2
first M1 updates
jorgee Feb 14, 2025
0c2492e
fix tests
jorgee Feb 14, 2025
41ac817
update descriptions
jorgee Feb 17, 2025
cdc3116
fix test
jorgee Feb 17, 2025
642e7b1
Merge branch 'master' into cid-store-m0
pditommaso Feb 19, 2025
1400e80
Merge branch 'cid-store' of github.com:nextflow-io/nextflow into cid-…
pditommaso Feb 19, 2025
d64c71a
Merge branch 'master' into cid-store
pditommaso Feb 19, 2025
975143f
Merge branch 'cid-store' into cid-store-m0
pditommaso Feb 19, 2025
82b1ccd
First commit to M1 implementation
jorgee Feb 27, 2025
edfaf5b
fix NPE in tests
jorgee Feb 27, 2025
c207d92
Fix NPE in tests
jorgee Feb 27, 2025
f4b9031
Add CidStore factory
jorgee Feb 27, 2025
b89cdf1
fix cid paht hash validation
jorgee Feb 27, 2025
54849d3
Merge branch 'master' into cid-store
jorgee Feb 28, 2025
fdbda27
Merge branch 'cid-store' into cid-store-m0
jorgee Feb 28, 2025
84ee951
Merge branch 'master' into cid-store-m0
pditommaso Mar 1, 2025
0d1a74f
Merge pull request #5787 from nextflow-io/cid-store-m0 [ci fast]
pditommaso Mar 1, 2025
34cc0b1
Cleanup and formatting
pditommaso Mar 1, 2025
bd96bc8
Decouple cid store from session (#5833) [ci fast]
pditommaso Mar 3, 2025
4c0ef8f
Add cid command help
jorgee Mar 4, 2025
b9de3e6
Fix CID store errors when workflow outputs in s3
jorgee Mar 4, 2025
bebe947
Merge branch 'master' into cid-store
pditommaso Mar 6, 2025
36d293f
Merge branch 'master' into cid-store
pditommaso Mar 7, 2025
76f3494
Merge branch 'master' into cid-store
jorgee Mar 11, 2025
063a0ec
Decouple CID FileSystem from Local file system and other fixes (#5866)
jorgee Mar 11, 2025
acabc9c
Merge branch 'master' into cid-store
pditommaso Mar 11, 2025
3e2ca19
Just blank [ci skip]
pditommaso Mar 11, 2025
90c8e38
fix unexpected warning in cidpath hash validation
jorgee Mar 12, 2025
5276645
Refactor CID store as plugin (#5877)
pditommaso Mar 12, 2025
be2fefc
Merge with master@00a53b97 [ci fast]
pditommaso Mar 15, 2025
9de3b04
Merge branch 'master' into cid-store
pditommaso Mar 15, 2025
16f74b4
Restore unneeded changes [ci fast]
pditommaso Mar 15, 2025
eb14d4b
Add CID H2 plugin (#5889)
pditommaso Mar 20, 2025
c04a58f
Merge branch 'master' into cid-store
pditommaso Mar 21, 2025
db79c43
Add serde interfaces (#5893) [ci skip]
pditommaso Mar 24, 2025
6b3293b
PoC for CID store annotations and workflow outputs structure (#5885)
jorgee Apr 2, 2025
a5b5907
Merge branch 'master' into cid-store
jorgee Apr 2, 2025
14e2841
Cid store quick wins (#5945)
jorgee Apr 5, 2025
7cba5bd
Merge branch 'master' into cid-store
pditommaso Apr 5, 2025
4073298
Just blanks [ci skip]
pditommaso Apr 5, 2025
1b991ec
Clean up
pditommaso Apr 5, 2025
d6e77a3
Minor [ci fast]
pditommaso Apr 5, 2025
4058c8e
Add checksum factory [ci fast]
pditommaso Apr 5, 2025
ffa3b7e
Fix typo [ci skip]
pditommaso Apr 5, 2025
9b4addf
Simplify code [ci fast]
pditommaso Apr 5, 2025
5692b67
Simplify cid workflow run scripts (#5949)
pditommaso Apr 7, 2025
5ef0b91
Merge branch 'master' into cid-store
pditommaso Apr 9, 2025
9c335f2
Merge branch 'master' into cid-store
pditommaso Apr 10, 2025
c9134e4
Cid store improve coverage, fixes and others (#5956)
jorgee Apr 11, 2025
f1a8770
Remove dot from message [ci fast]
pditommaso Apr 11, 2025
10f2b87
Fix test on macOs [ci fast]
pditommaso Apr 11, 2025
d415099
Merge branch 'master' into cid-store [ci fast]
pditommaso Apr 13, 2025
14ede8e
Update copyright [ci fast]
pditommaso Apr 13, 2025
802a119
Nit formatting [ci skip]
pditommaso Apr 13, 2025
10082ca
Fix failing tests [ci fast]
pditommaso Apr 13, 2025
7bff926
Refactor RealPathAware to LogicalPath [ci fast]
pditommaso Apr 13, 2025
a8765e6
Minor change [ci skip]
pditommaso Apr 13, 2025
3c93ee9
CID consolidation (#5969) [ci fast]
jorgee Apr 15, 2025
23279ac
Merge branch 'master' into cid-store
jorgee Apr 15, 2025
f7d84ba
fix merged publishop
jorgee Apr 15, 2025
1c06e69
Fix failing test on mac [ci fast]
pditommaso Apr 15, 2025
9f41c28
change getTargetPath with flags to different methods
jorgee Apr 15, 2025
f43e46c
change resolved config from string to map
jorgee Apr 15, 2025
5208c38
CID to lineage rename (#5977)
jorgee Apr 15, 2025
abf0c3a
fixes render, hint of closer property name
jorgee Apr 15, 2025
ac7f675
Just blanks [ci fast]
pditommaso Apr 15, 2025
4c9f8e0
Minor changes
pditommaso Apr 16, 2025
306fbaf
Fix failing tests [ci fast]
pditommaso Apr 16, 2025
226bf65
Add support for command aliases [ci fast]
pditommaso Apr 16, 2025
38735b0
cleanup code style
bentsherman Apr 17, 2025
6d71bc9
replace nested ifs with if-guards
bentsherman Apr 17, 2025
52e3333
change default render file name to `lineage.html`
bentsherman Apr 17, 2025
baed1ad
fix typo
bentsherman Apr 17, 2025
d3bc077
Merge branch 'master' into cid-store
bentsherman Apr 17, 2025
e7f437b
cleanup whitespace
bentsherman Apr 17, 2025
52664ce
don't wrap singleton file output as a list
bentsherman Apr 17, 2025
36d5c82
fix failing test
bentsherman Apr 17, 2025
10e7e7e
remove unnecesary method in taskId
jorgee Apr 22, 2025
119d2f8
renames and small fixes
jorgee Apr 22, 2025
034665c
fix tests
jorgee Apr 22, 2025
119b1d9
Minor changes [ci fast]
pditommaso Apr 22, 2025
403c2e5
Merge branch 'master' into cid-store
pditommaso Apr 22, 2025
d186742
[ci fast] merge master
pditommaso Apr 22, 2025
303807b
Remove h2 plugin [ci fast]
pditommaso Apr 22, 2025
83f5647
Simplified config [ci fast]
pditommaso Apr 23, 2025
854a9de
Fix failing tests [ci fast]
pditommaso Apr 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -237,7 +237,7 @@ task compile {

def getRuntimeConfigs() {
def names = subprojects
.findAll { prj -> prj.name in ['nextflow','nf-commons','nf-httpfs','nf-lang'] }
.findAll { prj -> prj.name in ['nextflow','nf-commons','nf-httpfs','nf-lang','nf-lineage'] }
.collect { it.name }

FileCollection result = null
Expand All @@ -263,7 +263,7 @@ task exportClasspath {
def home = System.getProperty('user.home')
def all = getRuntimeConfigs()
def libs = all.collect { File file -> /*println file.canonicalPath.replace(home, '$HOME');*/ file.canonicalPath; }
['nextflow','nf-commons','nf-httpfs','nf-lang'].each {libs << file("modules/$it/build/libs/${it}-${version}.jar").canonicalPath }
['nextflow','nf-commons','nf-httpfs','nf-lang','nf-lineage'].each {libs << file("modules/$it/build/libs/${it}-${version}.jar").canonicalPath }
file('.launch.classpath').text = libs.unique().join(':')
}
}
Expand All @@ -276,7 +276,7 @@ ext.nexusEmail = project.findProperty('nexusEmail')
// `signing.keyId` property needs to be defined in the `gradle.properties` file
ext.enableSignArchives = project.findProperty('signing.keyId')

ext.coreProjects = projects( ':nextflow', ':nf-commons', ':nf-httpfs', ':nf-lang' )
ext.coreProjects = projects( ':nextflow', ':nf-commons', ':nf-httpfs', ':nf-lang', ':nf-lineage' )

configure(coreProjects) {
group = 'io.nextflow'
Expand Down
2 changes: 1 addition & 1 deletion modules/nextflow/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ dependencies {
api 'io.seqera:lib-trace:0.1.0'

testImplementation 'org.subethamail:subethasmtp:3.1.7'

testImplementation (project(':nf-lineage'))
// test configuration
testFixturesApi ("org.apache.groovy:groovy-test:4.0.26") { exclude group: 'org.apache.groovy' }
testFixturesApi ("org.objenesis:objenesis:3.4")
Expand Down
4 changes: 2 additions & 2 deletions modules/nextflow/src/main/groovy/nextflow/Session.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -1137,11 +1137,11 @@ class Session implements ISession {
}
}

void notifyFilePublish(Path destination, Path source=null) {
void notifyFilePublish(Path destination, Path source, Map annotations) {
def copy = new ArrayList<TraceObserver>(observers)
for( TraceObserver observer : copy ) {
try {
observer.onFilePublish(destination, source)
observer.onFilePublish(destination, source, annotations)
}
catch( Exception e ) {
log.error "Failed to invoke observer on file publish: $observer", e
Expand Down
291 changes: 291 additions & 0 deletions modules/nextflow/src/main/groovy/nextflow/cli/CmdLineage.groovy
Original file line number Diff line number Diff line change
@@ -0,0 +1,291 @@
/*
* Copyright 2013-2025, Seqera Labs
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package nextflow.cli

import java.nio.file.Paths

import com.beust.jcommander.Parameter
import com.beust.jcommander.Parameters
import groovy.transform.CompileStatic
import nextflow.config.ConfigBuilder
import nextflow.config.ConfigMap
import nextflow.exception.AbortOperationException
import nextflow.plugin.Plugins
import org.pf4j.ExtensionPoint

/**
* CID command line interface
*
* @author Paolo Di Tommaso <[email protected]>
*/
@CompileStatic
@Parameters(commandDescription = "Explore workflows lineage metadata", commandNames = ['li'])
class CmdLineage extends CmdBase implements UsageAware {

private static final String NAME = 'lineage'

interface LinCommand extends ExtensionPoint {
void log(ConfigMap config)
void describe(ConfigMap config, List<String> args)
void render(ConfigMap config, List<String> args)
void diff(ConfigMap config, List<String> args)
void find(ConfigMap config, List<String> args)
}

interface SubCmd {
String getName()
String getDescription()
void apply(List<String> args)
void usage()
}

private List<SubCmd> commands = new ArrayList<>()

private LinCommand operation

private ConfigMap config

CmdLineage() {
commands << new CmdLog()
commands << new CmdDescribe()
commands << new CmdRender()
commands << new CmdDiff()
commands << new CmdFind()
}

@Parameter(hidden = true)
List<String> args

@Override
String getName() {
return NAME
}

@Override
void run() {
if( !args ) {
usage(List.of())
return
}
// setup the plugins system and load the secrets provider
Plugins.init()
// load the config
this.config = new ConfigBuilder()
.setOptions(launcher.options)
.setBaseDir(Paths.get('.'))
.build()
// init plugins
Plugins.load(config)
// load the command operations
this.operation = Plugins.getExtension(LinCommand)
if( !operation )
throw new IllegalStateException("Unable to load lineage extensions.")
// consume the first argument
getCmd(args).apply(args.drop(1))
}

/**
* Print the command usage help
*/
void usage() {
usage(args)
}

/**
* Print the command usage help
*
* @param args The arguments as entered by the user
*/
void usage(List<String> args) {
if( !args ) {
List<String> result = []
result << this.getClass().getAnnotation(Parameters).commandDescription()
result << "Usage: nextflow $NAME <sub-command> [options]".toString()
result << ''
result << 'Commands:'
int len = 0
commands.forEach {len = it.name.size() > len ? it.name.size() : len }
commands.sort(){it.name}.each { result << " ${it.name.padRight(len)}\t${it.description}".toString() }
result << ''
println result.join('\n').toString()
}
else {
def sub = commands.find { it.name == args[0] }
if( sub )
sub.usage()
else {
throw new AbortOperationException("Unknown $NAME sub-command: ${args[0]}")
}
}
}

protected SubCmd getCmd(List<String> args) {

def cmd = commands.find { it.name == args[0] }
if( cmd ) {
return cmd
}

def matches = commands.collect{ it.name }.closest(args[0])
def msg = "Unknown cloud sub-command: ${args[0]}"
if( matches )
msg += " -- Did you mean one of these?\n" + matches.collect { " $it"}.join('\n')
throw new AbortOperationException(msg)
}

class CmdLog implements SubCmd {

@Override
String getName() {
return 'list'
}

@Override
String getDescription() {
return 'List the executions with lineage enabled'
}

@Override
void apply(List<String> args) {
if (args.size() != 0) {
println("ERROR: Incorrect number of parameters")
usage()
return
}
operation.log(config)
}

@Override
void usage() {
println description
println "Usage: nextflow $NAME $name"
}
}

class CmdDescribe implements SubCmd{

@Override
String getName() {
return 'view'
}

@Override
String getDescription() {
return 'Print the description of a Lineage ID (lid)'
}

void apply(List<String> args) {
if (args.size() != 1) {
println("ERROR: Incorrect number of parameters")
usage()
return
}

operation.describe(config, args)
}

@Override
void usage() {
println description
println "Usage: nextflow $NAME $name <lid> "
}
}

class CmdRender implements SubCmd {

@Override
String getName() { 'render' }

@Override
String getDescription() {
return 'Render the lineage graph for a workflow output'
}

void apply(List<String> args) {
if (args.size() < 1 || args.size() > 2) {
println("ERROR: Incorrect number of parameters")
usage()
return
}

operation.render(config, args)
}

@Override
void usage() {
println description
println "Usage: nextflow $NAME $name <workflow output lid> [<html output file>]"
}

}

class CmdDiff implements SubCmd {

@Override
String getName() { 'diff' }

@Override
String getDescription() {
return 'Show differences between two lineage descriptions'
}

void apply(List<String> args) {
if (args.size() != 2) {
println("ERROR: Incorrect number of parameters")
usage()
return
}
operation.diff(config, args)
}

@Override
void usage() {
println description
println "Usage: nextflow $NAME $name <lid-1> <lid-2>"
}

}

class CmdFind implements SubCmd {

@Override
String getName() { 'find' }

@Override
String getDescription() {
return 'Find lineage metadata descriptions matching with a query'
}

void apply(List<String> args) {
if (args.size() != 1) {
println("ERROR: Incorrect number of parameters")
usage()
return
}
operation.find(config, args)
}

@Override
void usage() {
println description
println "Usage: nextflow $NAME $name <query>"
}

}

}
3 changes: 2 additions & 1 deletion modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,8 @@ class CmdRun extends CmdBase implements HubOptions {
runner.session.disableJobsCancellation = getDisableJobsCancellation()

final isTowerEnabled = config.navigate('tower.enabled') as Boolean
if( isTowerEnabled || log.isTraceEnabled() )
final isDataEnabled = config.navigate("lineage.enabled") as Boolean
if( isTowerEnabled || isDataEnabled || log.isTraceEnabled() )
runner.session.resolvedConfig = ConfigBuilder.resolveConfig(scriptFile.parent, this)
// note config files are collected during the build process
// this line should be after `ConfigBuilder#build`
Expand Down
14 changes: 11 additions & 3 deletions modules/nextflow/src/main/groovy/nextflow/cli/Launcher.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@ class Launcher {
new CmdSelfUpdate(),
new CmdPlugin(),
new CmdInspect(),
new CmdLint()
new CmdLint(),
new CmdLineage()
]

if(SecretsLoader.isEnabled())
Expand All @@ -120,13 +121,20 @@ class Launcher {

options = new CliOptions()
jcommander = new JCommander(options)
allCommands.each { cmd ->
for( CmdBase cmd : allCommands ) {
cmd.launcher = this;
jcommander.addCommand(cmd.name, cmd)
jcommander.addCommand(cmd.name, cmd, aliases(cmd))
}
jcommander.setProgramName( APP_NAME )
}

private static final String[] EMPTY = new String[0]

private static String[] aliases(CmdBase cmd) {
final aliases = cmd.getClass().getAnnotation(Parameters)?.commandNames()
return aliases ?: EMPTY
}

/**
* Create the Jcommander 'interpreter' and parse the command line arguments
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ class MermaidHtmlRenderer implements DagRenderer {
file.text = template.replace('REPLACE_WITH_NETWORK_DATA', network)
}

private String readTemplate() {
static String readTemplate() {
final writer = new StringWriter()
final res = MermaidHtmlRenderer.class.getResourceAsStream('mermaid.dag.template.html')
int ch
Expand Down
Loading