I'm Logan who study Bioinformatics.
I have analyzed ChIP-seq data by conducting peak calling with MACS3 and annotating the peaks using HOMER. The output provides coordinates for each peak, such as those for the region CM025010.1 (Start: 40194025, End: 40194493) mentioned below. Now, my objective is to extract sequences within a ±1kb window around each peak position and subsequently generate a bed file. I would appreciate any ideas or suggestions you may have. Thank you.
PeakID (cmd=annotatePeaks.pl refGenome.fa) Chr Start End Strand Peak Score Focus Ratio/Region Size Annotation Detailed
peakCalling_q_0.01_peak_16 CM025010.1 40194025 40194493 + 311 NA NA NA NA NA
peakCalling_q_0.01_peak_50 CM025021.1 12981866 12982368 + 279 NA NA NA NA NA
peakCalling_q_0.01_peak_27 CM025012.1 6509890 6510225 + 266 NA NA NA NA NA
peakCalling_q_0.01_peak_45 CM025021.1 12517853 12518147 + 246 NA NA NA NA NA
import pandas as pd
df = pd.read_csv('original_peaks.bed', sep='\t', header=None, names=['your', 'headers', 'here', 'a', 'b', 'etc'])
window_size = 1000
df['Start'] = df['Start'] - window_size
df['End'] = df['End'] + window_size
#### Clip the extended regions to ensure they don't fall below 0
df['start'] = df['start'].clip(lower=0)
#### Save the extended regions to a new BED file
df.to_csv('extended_peaks.bed', sep='\t', header=False, index=False)