Sometimes sequencing reads are delivered already aligned to a reference and stored as a BAM file, instead of the usual FASTQ read files. This is okay, because the raw FASTQ files can be recreated from the BAM file. The following outlines this process. From each BAM file, we need to extract:
1. reads that mapped as pairs (both mates aligned)
2. reads that did not map as a proper pair (one or both mates unmapped)
For #1, the following command will work. This was taken from this webpage.
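The command itself did not survive in this copy of the page. What follows is a minimal sketch of one commonly used form, assuming samtools 1.x is installed. The function name and file names are placeholders, not taken from the original.

```shell
# Hypothetical helper: keep paired reads where neither the read nor its mate is unmapped.
# $1 = input BAM, $2 = output BAM (placeholder names).
extract_mapped_pairs() {
    # -f 1  : require the "paired" flag
    # -F 12 : exclude "read unmapped" (4) and "mate unmapped" (8)
    # -u    : write uncompressed BAM, which suits intermediate files
    samtools view -u -f 1 -F 12 -o "$2" "$1"
}
```

It could be called as `extract_mapped_pairs sample.bam sample_map_map.bam`. Note that this filter selects pairs in which both mates aligned; it does not additionally require the "proper pair" flag (2).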
Resolving #2 is more complicated, as there are three ways a read might not have mapped as a proper pair:
A. The first read mapped but the paired read did not.
B. The first read did not map but the paired read did.
C. Neither paired read mapped at all.
Again, flags will be used to filter the original BAM file. This information was found at this webpage.
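The three cases can be sketched as three flag filters. This is an assumed reconstruction using standard SAM flag bits (4 = read unmapped, 8 = mate unmapped, 256 = secondary alignment); the helper name and output suffixes are hypothetical.

```shell
# SAM flag bits used by the filters (values from the SAM specification).
READ_UNMAPPED=4
MATE_UNMAPPED=8
SECONDARY=256

# Hypothetical helper: split reads with at least one unmapped mate into three BAMs.
# $1 = input BAM, $2 = output prefix (placeholder names).
extract_unmapped_pairs() {
    # This read unmapped, mate mapped: -f 4 -F 264
    samtools view -u -f "$READ_UNMAPPED" -F "$((MATE_UNMAPPED | SECONDARY))" \
        -o "${2}_unmap_map.bam" "$1" &&
    # This read mapped, mate unmapped: -f 8 -F 260
    samtools view -u -f "$MATE_UNMAPPED" -F "$((READ_UNMAPPED | SECONDARY))" \
        -o "${2}_map_unmap.bam" "$1" &&
    # Neither mate mapped: -f 12 -F 256
    samtools view -u -f "$((READ_UNMAPPED | MATE_UNMAPPED))" -F "$SECONDARY" \
        -o "${2}_unmap_unmap.bam" "$1"
}
```

Each filter excludes secondary alignments so that a read is not emitted twice. Supplementary alignments (flag 2048) are not excluded in this sketch, which may matter for some aligners.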
As you might expect, you then have to merge the three files that contain pairs with at least one unmapped read.
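A minimal sketch of the merge step, assuming samtools 1.x; the helper name and file names are illustrative only.

```shell
# Hypothetical helper: merge several BAM files into one.
# $1 = output BAM, remaining arguments = input BAMs.
merge_unmapped() {
    out=$1
    shift
    # -u writes uncompressed BAM, since the result is only an intermediate file
    samtools merge -u "$out" "$@"
}
```

For example: `merge_unmapped sample_unmapped.bam sample_unmap_map.bam sample_map_unmap.bam sample_unmap_unmap.bam`.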
Next, these BAM files must be re-sorted so that they are ordered by read ID instead of by position in the reference.
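Sorting by read name might look like the sketch below. It assumes the samtools 1.x syntax (`-o` for the output file); older releases used a different argument order. The helper name is hypothetical.

```shell
# Hypothetical helper: sort a BAM by read name so that mates sit next to each other.
# $1 = input BAM, $2 = output BAM (placeholder names).
sort_by_name() {
    # -n sorts by read name (QNAME) rather than by reference coordinate
    samtools sort -n -o "$2" "$1"
}
```

It would be run once for the mapped file and once for the merged unmapped file.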
At this time, it is a good idea to check that you have the correct number of reads and no redundancy. You can summarize the original BAM file to get an idea of where you started.
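One way to get that summary is `samtools flagstat`. The sketch below, with a hypothetical helper name, pulls the total from the first line of its output, assuming the usual `N + M in total (...)` layout.

```shell
# Hypothetical helper: print the QC-passed total from the first flagstat line.
# $1 = input BAM (placeholder name).
total_reads() {
    # The first line looks like: "1000 + 0 in total (QC-passed reads + QC-failed reads)"
    samtools flagstat "$1" | awk 'NR == 1 { print $1 }'
}
```

Depending on the samtools version, this total may also include secondary and supplementary alignments, so it is worth reading the full flagstat report as well.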
Notice the total number of input reads, which is found on the first line. You want to be sure that the numbers of unmapped and mapped reads add up to this number. It is easy to check using the following commands.
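The check can be sketched with `samtools view -c`, which counts the records in a BAM file. Both helper names here are hypothetical.

```shell
# Hypothetical helper: count the records in a BAM file.
count_reads() {
    samtools view -c "$1"
}

# Hypothetical helper: succeed only if mapped + unmapped equals the total.
# $1 = total, $2 = mapped count, $3 = unmapped count.
counts_match() {
    [ "$(( $2 + $3 ))" -eq "$1" ]
}
```

A possible use, with placeholder file names: `counts_match "$total" "$(count_reads sample_mapped.sort.bam)" "$(count_reads sample_unmapped.sort.bam)" && echo "counts agree"`.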
Note that one paired read is counted as two reads here. If you sum these two numbers, they should equal the number you noted above, as they do here. If all is good, you can now extract the FASTQ reads into two paired read files, as follows.
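The extraction tool is not named in this copy of the page. The sketch below assumes `bedtools bamtofastq`, which expects name-sorted input for paired output; `samtools fastq` would be an alternative. The helper name and file naming scheme are hypothetical.

```shell
# Hypothetical helper: write paired FASTQ files from a name-sorted BAM.
# $1 = name-sorted input BAM, $2 = output prefix (placeholder names).
bam_to_paired_fastq() {
    # -fq receives the first mates, -fq2 the second mates
    bedtools bamtofastq -i "$1" -fq "${2}.1.fastq" -fq2 "${2}.2.fastq"
}
```

It would be run once on the mapped file and once on the unmapped file, giving four FASTQ files in total.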
Finally, it makes sense to combine the first reads from the mapped and unmapped files into one file, and to do the same for the paired reads.
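Since FASTQ is plain text, the files can simply be concatenated. A minimal sketch, assuming the `.1.fastq` / `.2.fastq` naming used for illustration here (the helper name is hypothetical):

```shell
# Hypothetical helper: concatenate mapped and unmapped FASTQ files per mate.
# $1 = mapped prefix, $2 = unmapped prefix, $3 = output prefix.
combine_fastq() {
    # Keep the same file order for both mates so the pairs stay in sync
    cat "${1}.1.fastq" "${2}.1.fastq" > "${3}.1.fastq" &&
    cat "${1}.2.fastq" "${2}.2.fastq" > "${3}.2.fastq"
}
```

The mapped file is listed first for both mates, so read N in the first output file still pairs with read N in the second.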
These two files should now have the same number of reads, exactly as you would have received them had they come directly from the sequencer as FASTQ. Please also note that all of the commands above can be piped together in bash.
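As one illustration of piping, the filter and the name sort for the mapped pairs could be chained without an intermediate file. This is a sketch assuming samtools 1.x, where `-` tells `samtools sort` to read from standard input; the helper name is hypothetical.

```shell
# Hypothetical helper: filter mapped pairs and name-sort them in one pipeline.
# $1 = input BAM, $2 = name-sorted output BAM (placeholder names).
mapped_pairs_sorted() {
    # Uncompressed BAM (-u) is streamed straight into the sort step
    samtools view -u -f 1 -F 12 "$1" | samtools sort -n -o "$2" -
}
```

The same pattern should extend to the unmapped filters, although the merge step still needs its three inputs as files.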