Concatenate column from many bed files into single bed file

I have n bed files in the format:


chr1 0 10000 4 331
chr1 10000 20000 6 154
chr1 20000 30000 3 12

I would like to take column 4 (4, 6, 3) from each bed file and output them as a single table file (csv/tsv/exact format doesn't matter), where columns 4 through 3+n are labelled with the name of each bed file and contain that file's column 4.

For example, take two bed files:

1.bed :

chr1 0 10000 4 331
chr1 10000 20000 6 154
chr1 20000 30000 3 12

2.bed :

chr1 0 10000 2 412
chr1 10000 20000 7 14
chr1 20000 30000 2 155

I would like the output to be:

chrom start end 1.bed 2.bed
chr1 0 10000 4 2
chr1 10000 20000 6 7
chr1 20000 30000 3 2

My current attempt has been to use bedops:

$ bedops --everything *.bed \
    | bedmap --echo-map - \
    | awk '(split($0, a, ";") == 3)' - \
    | sed 's/\;/\n/g' - \
    | sort-bed - \
    | uniq - \
    > answer.bed

However this produces the output:

Error: Unable to find file: 1.bed


  • Assumptions:

    • none of the input files have a header record
    • all input files have the same number of rows where ...
    • the first 3 columns are chrom, start and end and ...
    • there's at least one additional (4th) column
    • all input/output field delimiters are tabs
    • rows (from different input files) are joined based on the triple key of chrom + start + end
    • all input files have the same set of keys (ie, we don't have to worry about a key missing from some input files)
    • the input files are already sorted by key
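
    The key-related assumptions above (same rows, same keys, same order) can be checked before merging. A minimal sketch, assuming tab-delimited files; check_bed_keys is a hypothetical helper name, not part of bedops or any other tool:

```shell
# Hypothetical helper: succeed only if every file has the same
# chrom/start/end triples, in the same order, as the first file.
check_bed_keys() {
    # cut splits on tabs by default, matching the tab-delimiter assumption
    ref_keys=$(cut -f1-3 "$1") || return 2
    for f in "$@"; do
        if [ "$(cut -f1-3 "$f")" != "$ref_keys" ]; then
            echo "key mismatch: $f differs from $1" >&2
            return 1
        fi
    done
    return 0
}
```

    It could be run as check_bed_keys *.bed before the merge. It holds each file's key columns in memory, so it is only a sanity check for files of moderate size.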

    One awk idea:

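    awk '
    BEGIN  { FS=OFS="\t"
             hdr = "chrom" OFS "start" OFS "end"
           }
    FNR==1 { hdr = hdr OFS FILENAME }     # first row of each file: add the file name to the header
           { key = $1 OFS $2 OFS $3
             # the first file seeds each line with the key; later files append their column 4
             lines[FNR] = (FNR==NR ? key : lines[FNR]) OFS $4
           }
    END    { print hdr
             for (i=1;i<=FNR;i++)
                 print lines[i]
           }
    ' *.bed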


    • this single awk script replaces OP's current bedops | bedmap | awk | sed | sort-bed | uniq code
    • this assumes the *.bed files already exist and are not the output from bedops | bedmap

    This generates:

    chrom   start   end     1.bed   2.bed
    chr1    0       10000   4       2
    chr1    10000   20000   6       7
    chr1    20000   30000   3       2
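
    If the files cannot be relied on to share the same keys in the same order, the rows can be joined on the chrom + start + end key instead of the row number. A minimal sketch under that relaxed assumption; merge_bed_col4 is a hypothetical function name, and the "NA" placeholder for a key missing from a file is an arbitrary choice:

```shell
# Hypothetical helper: join column 4 of each file on chrom/start/end.
# Rows are printed in order of first appearance; a key absent from a
# file gets "NA" (an assumed placeholder) in that file's column.
merge_bed_col4() {
    awk '
    BEGIN  { FS = OFS = "\t"
             hdr = "chrom" OFS "start" OFS "end"
           }
    FNR==1 { hdr = hdr OFS FILENAME; nfiles++ }   # new file: new output column
           { key = $1 OFS $2 OFS $3
             if (!(key in seen)) { seen[key] = 1; order[++nkeys] = key }
             val[key SUBSEP nfiles] = $4          # column 4 for this key and file
           }
    END    { print hdr
             for (i = 1; i <= nkeys; i++) {
                 line = order[i]
                 for (j = 1; j <= nfiles; j++) {
                     k = order[i] SUBSEP j
                     line = line OFS ((k in val) ? val[k] : "NA")
                 }
                 print line
             }
           }
    ' "$@"
}
```

    It would be called as merge_bed_col4 *.bed > answer.tsv. It keeps every value in memory, and if the output must be sorted it could be piped through sort-bed afterwards (with the header line handled separately).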