HowTos/CRAM2Fastq.md at master · buschlab/HowTos

CRAM to FASTQ conversion

Since the format only the difference between the reference and the aligned read, the reference genome is needed for the conversion to fastq. Usually, the information about the reference genome is requested from an EBA webservice during conversion. However, the webservice doesn't always respond and therefore produces empty fastq files. Hence, it is recommended to create a local reference database with SAMtools prio to conversion.

Workflow

Extract name of reference genome from cram header and download .fasta file
Create reference database with /path/to/samtools/misc/seq_cache_populate.pl
Set appropriate environment variables
Use SAMtools sort and fastq to create valid fastq files

Example script

#SBATCH --cluster=omics
#SBATCH --partition=shortterm,longterm
#SBATCH --no-requeue
#SBATCH --mem=180GB
#SBATCH -c 48
#SBATCH --time 4320


path_data=/data/lied_egypt_genome/ega_data_EGAD00001001372


ls *cram | sed 's/\_/ /g' | uniq | sort -u | awk -v path_data=$path_data '{print "samtools sort -n $SCRATCH/cram/"$1"_"$2"_"$3" | samtools fastq -1 $SCRATCH/fastq/"$1".R1.fq -2 $SCRATCH/fastq/"$1".R2.fq -; gzip $SCRATCH/fastq/"$1".R*.fq; mv $SCRATCH/fastq/"$1".R*.fq.gz "path_data"/fastq"}' >cram2fastq.sh;


export XDG_CACHE_HOME=$SCRATCH
cp -r cram $SCRATCH
cp -r hs37d5.fa $SCRATCH
mkdir $SCRATCH/fastq
cd $SCRATCH


seq_cache_populate.pl -root $SCRATCH/cache $SCRATCH/hs37d5.fa
export REF_PATH=$SCRATCH/cache/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s
export REF_CACHE=$SCRATCH/cache/%2s/%2s/%s

~/workspace/toolbox/parallelize_cmds.pl 48 1 $path_data/cram2fastq.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CRAM to FASTQ conversion

Workflow

Example script

FilesExpand file tree

CRAM2Fastq.md

Latest commit

History

CRAM2Fastq.md

File metadata and controls

CRAM to FASTQ conversion

Workflow

Example script