Determining your blood type from CRAM/CRAI gene sequencing data

Tested with Nebula sequencing data.

Credit: Most of this article comes from Reddit author Floedekartofler (link to original post). It was so helpful that I'm reposting it here.

This process is a bit involved since it's manual. I thought there would be open source programs available, but for some reason researchers in this field don't want to share. So instead I did it by hand. The advantage is that you don't need any command line tools. All you need is some high school biology knowledge.

Step 1: Look up the ABO gene in a genome browser

You can use the Nebula genome browser, but if, like me, you didn't want to pay for the subscription, you can use the same genome browser at https://igv.org/app/. Click tracks > local file and select both the CRAM file and the CRAM index (CRAI). Next, navigate to the ABO gene. The gene is located on chromosome 9 at position 133,250,401-133,275,201 (GRCh38/hg38 coordinates). Enter chr9:133,250,401-133,275,201 in the text box next to the search bar. You should then see the ABO gene marked in blue on the RefSeq track, and above it all your Nebula reads in the area and your genotype. If you click the cog next to the track, you can select the option to see all bases in your reads.

The ABO gene is on the reverse strand relative to the reference genome. Click the cog next to the top line of letters and select reverse to see the reverse sequence instead. Also click the "highlight cursor" button at the top, so it tells you which position you are hovering over, and turn on center line, so you can see which base you navigated to.

Recolic note: grey means your read matches the reference genome; colored bases mean your read differs from the reference. You can choose the reference under menu > Genome. Most providers use GRCh38/hg38 as the reference, but older data might use GRCh37/hg19.

screenshot 1

Turn on "cursor guide", "center line", and "three frame translate".

Type O or not type O

First navigate to position chr9:133,257,521-133,257,521. Most people with blood type O have a mutation here. The reference genome is blood type O, so if you have a G at position 133,257,523, a T at 133,257,522 and an A at 133,257,521, you are type O.

If some of your reads show an I, it means that you have an extra base here compared to the reference genome. If you click the I, it will tell you that this base is a C. This means that you are not type O.

If all of your reads have an I, you have two non-O alleles. If roughly half of them have an I you have one O and one non-O allele. If none of your reads have the I, you are homozygous for the deletion and thus blood type O.

There are other mutations that can give a type O blood type, but this one is the most common.
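
The rule of thumb above can be written down as a small Python sketch. The function name and the 20%/80% cut-offs are assumptions made for illustration; they do not come from the original post, and real read counts are noisy, so treat borderline fractions with care. You count the reads by eye in the browser and plug the two numbers in.

```python
def classify_position_261(reads_with_insertion: int, total_reads: int,
                          low: float = 0.2, high: float = 0.8) -> str:
    """Guess the allele combination at the common type O site.

    reads_with_insertion: reads that show the "I" marker at the site.
    total_reads: all reads covering the site.
    low / high: assumed cut-offs for "none", "roughly half" and "all".
    """
    if total_reads <= 0:
        raise ValueError("need at least one read covering the site")
    if not 0 <= reads_with_insertion <= total_reads:
        raise ValueError("reads_with_insertion must be between 0 and total_reads")

    fraction = reads_with_insertion / total_reads
    if fraction <= low:
        return "two O alleles"
    if fraction >= high:
        return "two non-O alleles"
    return "one O allele and one non-O allele"


if __name__ == "__main__":
    # Example: 14 of 30 reads show the insertion.
    print(classify_position_261(14, 30))
```

This only covers the common deletion at this one position; as noted above, other mutations can also produce type O.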

screenshot 2

I am heterozygous for the deletion. This means I have one copy with the deletion (an O allele) and one copy without (a non-O allele). I can see I have a non-O allele because some of my reads have an insertion. If I click the I, it tells me the read was a C, meaning the gene had a G there. I also have some reads without the I, with the sequence CAT. This is the deleted version.

Type A or B

If you are not type O, you will need to figure out whether you are type A or B. This is a bit more complex. Have a look at this paper: https://pubmed.ncbi.nlm.nih.gov/12014997/. It has a nice figure that shows which variants correspond to which blood types. Here is an overview of which positions in the gene (text above letters on figure) correspond to which positions in the genome.

I recommend that you go through these positions and note your genotype. I just drew on top of the PDF. If I had a certain mutation I circled it, and if I did not I crossed it out. At the end my blood type was pretty easy to deduce. Remember that the gene is on the reverse strand. If you enabled reverse view (as mentioned earlier), the top line in the genome browser will have the correct base. You can verify the position by looking at the three bases in the figure. Remember to read from right to left in the genome.

However, the reads are not reversed, so when you look at the reads to determine your genotype (especially important if you are heterozygous), remember to turn C into G, G into C, A into T and T into A. Also keep in mind that you have two alleles, so your genotype is the combination of two entries from the table.
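
If you would rather not swap the letters in your head, the base swap described above is a few lines of Python. This is only a convenience sketch; the function names are my own. `complement` swaps each base in place, and `reverse_complement` also flips the order, which matches reading the genome from right to left.

```python
# Translation table for the swap C<->G and A<->T (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Swap every base for its partner, keeping the order as shown in the read."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Swap every base and reverse the order (reverse-strand orientation)."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("C"))            # G
    print(reverse_complement("TAC"))  # GTA
```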

screenshot 3

Look up the yellow-highlighted letters in my table. Then enter chr9:pos-pos, with pos being the position from the table, and see if you have the mutations described.

1:    133,275,189  
53:   133,262,144  
106:  133,261,367  
188:  133,259,834  
189:  133,259,833  
190:  133,259,832  
220:  133,258,116  
261:  133,257,521. This is the type O mutation discussed earlier.  
297:  133,257,486  
318:  133,257,465  
351:  133,257,432  
454:  133,256,277  
467:  133,256,264  
498:  133,256,233  
526:  133,256,205  
529:  133,256,202  
538:  133,256,193  
542:  133,256,189  
564:  133,256,167  
579:  133,256,152  
595:  133,256,136  
641:  133,256,090  
646:  133,256,085  
657:  133,256,074  
669:  133,256,062  
681:  133,256,050  
700:  133,256,031  
703:  133,256,028  
721:  133,256,010  
729:  133,256,002  
768:  133,255,963  
771:  133,255,960  
796:  133,255,935  
802:  133,255,929  
803:  133,255,928  
829:  133,255,902  
871:  133,255,860  
893:  133,255,838  
926:  133,255,805  
927:  133,255,804  
930:  133,255,801  
1009: 133,255,722  
1054: 133,255,677  
1059: 133,255,672  
1061: 133,255,670
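
To save some typing, the table above can be turned into ready-to-paste search strings. The following Python sketch simply copies the table into a dictionary; the names `CDNA_TO_GENOMIC` and `igv_locus` are made up for this example. As a sanity check on the transcription, for every row from 454 onward the gene position and the genome position add up to the same number (133,256,731), which the last lines verify.

```python
# Gene (cDNA) position -> chr9 position, copied from the table above.
CDNA_TO_GENOMIC = {
    1: 133_275_189, 53: 133_262_144, 106: 133_261_367,
    188: 133_259_834, 189: 133_259_833, 190: 133_259_832,
    220: 133_258_116, 261: 133_257_521, 297: 133_257_486,
    318: 133_257_465, 351: 133_257_432, 454: 133_256_277,
    467: 133_256_264, 498: 133_256_233, 526: 133_256_205,
    529: 133_256_202, 538: 133_256_193, 542: 133_256_189,
    564: 133_256_167, 579: 133_256_152, 595: 133_256_136,
    641: 133_256_090, 646: 133_256_085, 657: 133_256_074,
    669: 133_256_062, 681: 133_256_050, 700: 133_256_031,
    703: 133_256_028, 721: 133_256_010, 729: 133_256_002,
    768: 133_255_963, 771: 133_255_960, 796: 133_255_935,
    802: 133_255_929, 803: 133_255_928, 829: 133_255_902,
    871: 133_255_860, 893: 133_255_838, 926: 133_255_805,
    927: 133_255_804, 930: 133_255_801, 1009: 133_255_722,
    1054: 133_255_677, 1059: 133_255_672, 1061: 133_255_670,
}


def igv_locus(cdna_pos: int) -> str:
    """Return the chr9:pos-pos string to paste into the genome browser."""
    pos = CDNA_TO_GENOMIC[cdna_pos]  # KeyError if the position is not in the table
    return f"chr9:{pos:,}-{pos:,}"


if __name__ == "__main__":
    for cdna_pos in sorted(CDNA_TO_GENOMIC):
        print(f"{cdna_pos}: {igv_locus(cdna_pos)}")

    # Transcription check: rows from 454 onward share one constant sum.
    assert all(c + g == 133_256_731
               for c, g in CDNA_TO_GENOMIC.items() if c >= 454)
```

Only positions listed in the table are supported; the sketch deliberately does not try to guess coordinates for other positions.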

Recolic Appendix: RhD pos or neg

Additionally, I tried to work out my RhD pos/neg type but didn't find much useful info, so I asked GPT. The answer seems legit and matched my known result, so I'm sharing it here for reference.

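Disclaimer: This part is AI generated. It could be inaccurate! In particular, rs8176746 and rs7853989 are catalogued in dbSNP as ABO variants on chromosome 9 (they correspond to positions 796 and 526 in the table above), not RHD variants on chromosome 1, so check every rsID and coordinate below against dbSNP before relying on it.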

| SNP         | GRCh38 coordinate   | REF base | Rh+ typical | Rh– typical |
|-------------|---------------------|----------|-------------|-------------|
| rs7853989   | chr1:25,694,681     | A        | A/A         | G/G         |
| rs8176722   | chr1:25,667,747     | C        | C/C         | T/T         |
| rs8176746   | chr1:25,681,029     | T        | T/T         | C/C         |
| rs590787    | chr1:25,688,453     | G        | G/G         | A/A         |
| RHD gene    | chr1:25,570,000-25,690,000 | Exists | Exists | Deleted     |