[D][B] Assess the effect of sample size rescaling in selective sweep detection

BACHELOR Assignment

Type: Bachelor CS 

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

Description:

In population genetics, a selective sweep is a characteristic pattern observed in DNA sequences when the organisms under investigation have been affected by positive selection. Detecting selective sweeps can explain why and how organisms survive in an environment and can also help in designing more effective drug treatments. In recent years, several research efforts have focused on finding selective sweeps in the SARS-CoV-2 genome, which helped in designing vaccines. Several statistical methods, with readily available software implementations, can be used to detect selective sweeps. The input is a text file with DNA data, and the output is a series of scores that correspond to different locations in the genome. Most existing methods become very slow as the number of DNA sequences increases, which makes state-of-the-art tools impractical for real biological analyses. This project will explore whether input data can be rescaled so that the tools process fewer sequences without qualitatively affecting their performance.

Tasks involved and expected duration (% over total assignment time):

  1. Use simulation software to generate DNA data (10%)
  2. Use a dedicated software tool to obtain a summarized view of the DNA data (5%)
  3. Develop new software that rescales the summarized DNA data (40%)
  4. Create a script (processing pipeline) based on the tools used in 1) and 2) and the result of 3) (5%)
  5. Use the script to test several readily available sweep-detection software tools on the DNA data before and after rescaling (10%)
  6. Draw conclusions about the feasibility of the method and whether and how the quality of the outcome is affected, and write a report in the style of a conference paper (30%)
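
To give a feel for task 3, the sketch below shows one possible way to rescale summarized data. It rests on assumptions that are not part of the assignment text: the summarized view is taken to be a list of (position, derived-allele count) pairs for n sequences, and rescaling to m sequences is done with a hypergeometric draw per site, that is, by picking m of the n sequences without replacement and counting the derived alleles among them. The function name rescale_counts and the input layout are hypothetical; the actual method is for the student to design.

```python
import random


def rescale_counts(sites, n, m, seed=0):
    """Downsample per-site derived-allele counts from n to m sequences.

    sites: iterable of (position, derived_count), with 0 <= derived_count <= n.
    Returns a list of (position, new_count) pairs. Sites that are no longer
    polymorphic in the smaller sample (count 0 or m) are dropped.
    This input layout is an assumption made for illustration only.
    """
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rescaled = []
    for position, k in sites:
        if not 0 <= k <= n:
            raise ValueError(f"count {k} at position {position} is outside 0..{n}")
        # Treat sequences 0..k-1 as carrying the derived allele, pick m of
        # the n sequences without replacement, and count the carriers.
        chosen = rng.sample(range(n), m)
        new_k = sum(1 for i in chosen if i < k)
        if 0 < new_k < m:
            rescaled.append((position, new_k))
    return rescaled


if __name__ == "__main__":
    example = [(100, 3), (250, 10), (400, 0), (900, 20)]
    print(rescale_counts(example, n=20, m=8, seed=1))
```

Because each site is drawn independently, this sketch keeps allele frequencies approximately intact but discards any association between neighbouring sites. It would therefore only be a candidate for detection methods that rely on allele frequencies; methods that rely on linkage between sites would need the same subset of sequences to be kept at every site.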

Contact: