I've created a tuple from the output of a channel.
    ch_groups = INPUT_CHECK_GEX.out.group_samplesheet
        .splitCsv( header:true, sep:',', strip:true )
        .map { row ->
            def keyID = row["keyid"]
            def sampleID = row["sampleid"]
            return [keyID, sampleID]
        }
        .groupTuple()
    ch_groups.view()
This is the output
[group1-group2, [sample1, sample2, sample3, sample4]]
I have another output set up as a tuple as well: SEURAT_SINGLE.out.rds.view()
[sample3, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/b1/92baee56b862a2187f1459e1e66a4d/sample3_seurat_object.rds]
[sample7, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/37/6df9873421a81170aa8156c303bb3c/sample7_seurat_object.rds]
[sample6, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/7a/ebe2243cd6dbc81c2374be9e80c24b/sample6_seurat_object.rds]
[sample1, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/65/888f0fb28a20fe1c034e8da8666eee/sample1_seurat_object.rds]
[sample5, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/78/a0ce478d03da5fb4f67b34fcd194e4/sample5_seurat_object.rds]
[sample2, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/98b2b1e045db5b0664233052e28e37/sample2_seurat_object.rds]
[sample4, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/44/5c38598986b3a48e05a4bcb5c72c73/sample4_seurat_object.rds]
I need to get a list of all the RDS files associated with each of the first outputs. For example, for [group1-group2, [sample1, sample2, sample3, sample4]] I need a list of:
/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/65/888f0fb28a20fe1c034e8da8666eee/sample1_seurat_object.rds
/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/98b2b1e045db5b0664233052e28e37/sample2_seurat_object.rds
/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/b1/92baee56b862a2187f1459e1e66a4d/sample3_seurat_object.rds
/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/44/5c38598986b3a48e05a4bcb5c72c73/sample4_seurat_object.rds
Using the suggested approach, I was able to get the desired result for one contrast. As soon as I added more contrasts, however, the output still contained only the first result.
For example, adding additional contrasts to INPUT_CHECK_GEX.out.group_samplesheet:
    ch_groups = INPUT_CHECK_GEX.out.group_samplesheet
        .splitCsv( header:true, sep:',', strip:true )
        .map { row ->
            def keyID = row["keyid"]
            def sampleID = row["sampleid"]
            return [keyID, sampleID]
        }
        .groupTuple()
    ch_groups.view()
[group1-group2, [sample1, sample2, sample3, sample4]]
[group1-group2-group3, [sample1, sample2, sample3, sample4, sample5, sample6]]
Running the suggested code then still gives only this output, ignoring the added contrast:
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/a6/02a8bc99a1a0ea3549d774145facbe/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/bf/2f9f884fe8868ee91ce077d598bd5d/sample4_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/8e/99e42901219cd3eba0981987033145/sample1_seurat_object.rds]]
I attempted to fix this with the solution below. While it brings in the second contrast, it doesn't map samples that appear in more than one contrast (i.e. sample1 is in BOTH contrasts):
    INPUT_CHECK_GEX.out.group_samplesheet
        .splitCsv( header:true, sep:',', strip:true )
        .map { row ->
            def key = row["keyid"]
            def sample = row["sampleid"]
            tuple( key, sample )
        }
        .map { key, sample -> tuple( sample, key ) }
        .join( SEURAT_SINGLE.out.rds )
        .map { sample, key, rds_file -> tuple( key, rds_file ) }
        .groupTuple()
        .view()
Output:
[group1-group2-group3, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/4c/747cbe34e3464a22c376d09be2cdb1/sample6_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/51/9bb8aad780fd14e9ed7ad9b3f3b06f/sample5_seurat_object.rds]]
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/d8/b02c8c3ab57faefe4bb60e85b03743/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/27/eb43d9f44534819f289831869270a8/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/e2/2811ac1360970134456f34b7d55518/sample4_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds]]
Expected Output:
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/d8/b02c8c3ab57faefe4bb60e85b03743/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/27/eb43d9f44534819f289831869270a8/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/e2/2811ac1360970134456f34b7d55518/sample4_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds]]
[group1-group2-group3, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/4c/747cbe34e3464a22c376d09be2cdb1/sample6_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/51/9bb8aad780fd14e9ed7ad9b3f3b06f/sample5_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/d8/b02c8c3ab57faefe4bb60e85b03743/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/27/eb43d9f44534819f289831869270a8/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/e2/2811ac1360970134456f34b7d55518/sample4_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds]]
For anyone else who finds themselves with this question, this was the solution I came up with:
    ch_groups = INPUT_CHECK_GEX.out.group_samplesheet
        .splitCsv( header:true, sep:',', strip:true )
        .map { row ->
            def key = row["keyid"]
            def sample = row["sampleid"]
            return [sample, key]
        }
        .combine(SEURAT_SINGLE.out.rds, by: 0)
        .map { sample, key, rds_file -> tuple( key, rds_file ) }
        .groupTuple()
        .view()
Gives the output:
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/a6/02a8bc99a1a0ea3549d774145facbe/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/78/e7d26a4328f99d5984cdb1acd8e4b0/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/da/ca761f3d5b389f1333736ec5ae1dfe/sample4_seurat_object.rds]]
[group1-group2-group3, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/1f/a18fc5718d3a7869da2340149254e3/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/a6/02a8bc99a1a0ea3549d774145facbe/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/98/1063e9c6b025e59238d84db688ece5/sample5_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/c1924829b9e4298540c530aa37e919/sample6_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/78/e7d26a4328f99d5984cdb1acd8e4b0/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/da/ca761f3d5b389f1333736ec5ae1dfe/sample4_seurat_object.rds]]
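As far as I can tell, the difference is that join pairs each key only once (at least in the default, non-strict mode), whereas combine with by: 0 pairs every item that shares the key. Below is a minimal sketch that should reproduce this on toy data. The sample names, group names, and file names are made up for illustration and are not the real pipeline outputs:

```nextflow
workflow {
    // Hypothetical (sample, group) pairs; sample1 belongs to two groups
    pairs_ch = Channel.of(
        ['sample1', 'groupA'],
        ['sample1', 'groupB'],
        ['sample2', 'groupB']
    )

    // Hypothetical (sample, file) pairs standing in for SEURAT_SINGLE.out.rds
    rds_ch = Channel.of(
        ['sample1', 'sample1.rds'],
        ['sample2', 'sample2.rds']
    )

    // join: the single sample1 entry in rds_ch can only be matched once,
    // so one of sample1's two group assignments is expected to be lost
    pairs_ch
        .join( rds_ch )
        .view { item -> "join:    ${item}" }

    // combine by: 0: every pairs_ch item is paired with every rds_ch item
    // that has the same first element, so both sample1 rows should survive
    pairs_ch
        .combine( rds_ch, by: 0 )
        .view { item -> "combine: ${item}" }
}
```

Saved as a standalone script (for example a hypothetical toy.nf) and run with nextflow run, the join lines should show sample1 only once while the combine lines show it once per group.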
Assuming your group samplesheet contains multiple groups each with a different number of samples, you could use a groupKey object to associate the number of samples with each group. This approach lets the groupTuple operator then stream the collected values as soon as possible. For example:
    workflow {

        INPUT_CHECK_GEX.out.group_samplesheet
            .splitCsv( header:true, sep:',', strip:true )
            .map { row ->
                def keyID = row["keyid"]
                def sampleID = row["sampleid"]
                tuple( keyID, sampleID )
            }
            .groupTuple()
            .map { group, samples ->
                tuple( groupKey(group, samples.size()), samples )
            }
            .set { groups_ch }

        groups_ch
            .transpose()
            .map { key, sample -> tuple( sample, key ) }
            .join( SEURAT_SINGLE.out.rds )
            .map { sample, key, rds_file -> tuple( key, rds_file ) }
            .groupTuple()
            .view()
    }
Expected results:
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/65/888f0fb28a20fe1c034e8da8666eee/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/98b2b1e045db5b0664233052e28e37/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/b1/92baee56b862a2187f1459e1e66a4d/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/44/5c38598986b3a48e05a4bcb5c72c73/sample4_seurat_object.rds]]
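To see the groupKey part in isolation, here is a small sketch on toy data. The group and sample names are hypothetical, and the channel is built with Channel.of rather than read from a samplesheet:

```nextflow
workflow {
    // Hypothetical (group, sample) pairs standing in for the samplesheet rows
    Channel.of(
            ['groupA', 'sample1'],
            ['groupA', 'sample2'],
            ['groupB', 'sample1']
        )
        // First pass: collect the samples of each group to learn the group size
        .groupTuple()
        // Wrap the group name in a groupKey that records that size
        .map { group, samples ->
            tuple( groupKey(group, samples.size()), samples )
        }
        // Back to one (key, sample) pair per item; each pair keeps the sized key
        .transpose()
        // Second pass: with the size known, a group can be emitted once it has
        // received that many items instead of waiting for the channel to close
        .groupTuple()
        .view { item -> "grouped: ${item}" }
}
```

In the real pipeline the steps between transpose and the second groupTuple (the join or combine with SEURAT_SINGLE.out.rds) are where this matters, since groups can then be released as their RDS files arrive.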
Note that if a sample can belong to one or more groups, simply replace the join with the combine operator. Just make sure to use the second form, which allows you to combine items that share a common matching key using the by parameter, for example:
    groups_ch
        .transpose()
        .map { key, sample -> tuple( sample, key ) }
        .combine( SEURAT_SINGLE.out.rds, by: 0 )
        .map { sample, key, rds_file -> tuple( key, rds_file ) }
        .groupTuple()
        .view()
Expected results:
[group1-group2, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/65/888f0fb28a20fe1c034e8da8666eee/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/98b2b1e045db5b0664233052e28e37/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/b1/92baee56b862a2187f1459e1e66a4d/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/44/5c38598986b3a48e05a4bcb5c72c73/sample4_seurat_object.rds]]
[group1-group2-group3, [/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/65/888f0fb28a20fe1c034e8da8666eee/sample1_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/ec/98b2b1e045db5b0664233052e28e37/sample2_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/b1/92baee56b862a2187f1459e1e66a4d/sample3_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/44/5c38598986b3a48e05a4bcb5c72c73/sample4_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/78/a0ce478d03da5fb4f67b34fcd194e4/sample5_seurat_object.rds, /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/TechDev_scRNASeq_Dev2023/work/7a/ebe2243cd6dbc81c2374be9e80c24b/sample6_seurat_object.rds]]