Finding the Closest Variants to Specific Genomic Locations

Finding the Closest Variants to Specific Genomic Locations

In the field of genomics, we often need to find the closest variants (e.g., SNPs, indels) to a set of genomic locations of interest. This task can be accomplished using various bioinformatics tools such as bedtools. In this blog post, we will walk through a step-by-step guide on how to achieve this.

Prerequisites

Before we start, make sure you have the following files:

  1. A BED file with your locations of interest. In this example, we’ll use locations_of_interest.bed
  2. A VCF file with your variants. In this example, we’ll use FinalSetVariants_referenceGenome.vcf

Step 1: Sorting the VCF File

The first issue we encountered was that the VCF file was not sorted lexicographically. bedtools requires the input files to be sorted in this manner. We can sort the VCF file using the following command:

(grep '^#' FinalSetVariants_referenceGenome.vcf; grep -v '^#' FinalSetVariants_referenceGenome.vcf | sort -k1,1 -k2,2n) > sorted_FinalSetVariants_referenceGenome.vcf

This command separates the header lines (those starting with #) from the data lines, sorts the data lines, and then concatenates the header and sorted data into a new file sorted_FinalSetVariants_referenceGenome.vcf.

Step 2: Converting VCF to BED and Finding the Closest Variants

The next step is to find the closest variants to our locations of interest. However, by default, bedtools closest outputs the entire VCF entry, which might be more information than we need. To limit the output, we can convert the VCF file to a BED format on-the-fly and assign an additional feature, the marker name, as chr_bpLocation (which is the convention we use for naming our markers). We can also add the -d option to get the distance between the location of interest and the closest variant. Here is the command:

awk 'BEGIN {OFS="\t"} {if (!/^#/) {print $1,$2-1,$2,$4"/"$5,"+",$1"_"$2}}' sorted_FinalSetVariants_referenceGenome.vcf | bedtools closest -a locations_of_interest.bed -b stdin -d

This command uses awk to read the VCF data, convert it to BED format, and write the result to the standard output. The pipe (|) then feeds this output directly into bedtools closest as the -b file. The keyword stdin is used to tell bedtools to read from the standard input.

Conclusion

With these two steps, we can efficiently find the closest variants to a set of genomic locations of interest. This approach is flexible and can be adapted to different datasets and requirements.

New research published

New research published

Have you ever wondered how the cells in the cow’s udder react to bacterial infection? In our latest study, we used a novel method to investigate the gene expression of primary bovine mammary epithelial cells (pbMECs) after challenge with Escherichia coli, a common cause of mastitis in dairy cattle. Mastitis is a costly and painful disease that affects both animal welfare and milk production. By understanding the molecular mechanisms of the immune response, we hope to find new ways to prevent and treat mastitis, and to breed cows that are more resistant to infection.

We extracted pbMECs from fresh milk samples and cultured them in the laboratory. Then we exposed them to heat-inactivated E. coli and measured the changes in gene expression at 3 and 24 hours post-challenge using RNA sequencing. We found that E. coli challenge induced a strong inflammatory response in the pbMECs, involving genes and pathways related to cytokine signaling, toll-like receptor signaling, and antigen presentation. We also identified some novel candidate genes that have not been previously reported to be associated with mastitis in cattle.

Our study demonstrates that milk-derived pbMECs can be used as a non-invasive, in vitro model to study mastitis resistance in cattle. This approach has several advantages over traditional methods, such as avoiding animal suffering, reducing costs, and enabling high-throughput screening. Our results provide new insights into the molecular basis of the udder immune response and suggest potential targets for future interventions. To learn more, you can read our full paper here: Iso-Touru et al. Veterinary Research (2024) 55:13.

Independence day

Independence day

Today is Finland’s independence day and I have to say, there was barely ever a nicer winter day on the 6th of December since I’m here. Just beautiful, “Hyvää itsenäisyyspäivää, Suomi!”