An Undergraduate Genome Research Course Using 'Big Data'

Bennett, Evan N. ; Page, Shallee T.
In: Biochemistry and Molecular Biology Education, Vol. 50 (2022), Issue 5, pp. 450-452

The ability to analyze large data sets ("Big Data") is an increasingly important skill in modern science. In biochemistry, the increased volume and velocity of data are particularly evident in the rapid expansion of biological databases. We present a modular bioinformatics course that surveys the analysis of genomic data for advanced undergraduates. Research activities include genome scanning for endogenous retroviruses, annotating genomic sequences and a brief exploration of programming in R. A summative poster session was used to disseminate the students' work. This course is amenable to remote or online instruction. Supplemental materials provided include a schedule and outline. This article reports a session from the virtual international 2021 IUBMB/ASBMB workshop, "Teaching Science on Big Data."

Keywords: course-based undergraduate research experiences (CUREs); bioinformatics; curriculum design, development and implementation; genome research; undergraduate research; big data

Recently, ASBMB offered a virtual conference called "Teaching Science with Big Data," emphasizing the importance of incorporating manipulation of large datasets into curricula [1]. As of 2021, the NCBI database had accumulated over 2.3 × 10^8 nucleotide and protein sequences [2]. However, the generation of such vast amounts of data creates a bottleneck: the ability to analyze these data. With over half a million STEM graduates every year, undergraduates can serve as a resource for addressing this bottleneck [3]. In addition, the ability to manipulate and analyze big data is an important skill for scientists and health practitioners in training [4]. Thus, training undergraduate researchers in bioinformatics is a way to address workforce demands while providing an introduction to critical tools for future biologists.

Several papers have explored methods for teaching bioinformatics to undergraduates [5], including several from this journal [6–10]. Course-based undergraduate research experiences (CUREs) have been shown to increase learning [11], increase persistence in science majors [12, 13], and have positive impacts on appreciation for science and career aspirations [14].

At our small, primarily undergraduate institution, we established a one-semester course built from modular units so that students could gain basic hands-on skills in database searches, genome annotation, and programming on the BASH command line and in R. Three of these modules are based on consortia that support bioinformatics: HHMI SEA-PHAGES, GEP, and UNH T3 (see Acknowledgments and Supplemental Materials). Introductory content was provided on a need-to-know basis so that classes could be conducted primarily as exploratory workshops. Our goal was to provide students with broad exposure to bioinformatics and enable them to analyze big data sets from viruses, flies, and humans. Pedagogical advantages include problem-based learning, visualization of large data sets and use of bioinformatics tools. The course design is amenable to online/remote instruction.

The workflow is shown in Figure 1. For unit 1, students scanned a cross-species database for potentially oncogenic endogenous retroviruses (ERVs) bearing sequence similarity to known human ERVs (Figure S1). For unit 2, students fully annotated phage genomes, which have been submitted to GenBank [15]. For unit 3, students fully annotated Drosophila contigs. For unit 4, students assembled microbial sequencing reads and tentatively assigned a genus. For unit 5, students were introduced to the programming language R [16, 17] (available for free download [18]) and observed visualizations of pre-published data sets in R (Figure S2). Deliverables include novel endogenous retroviral relationships and assembled, annotated genomes. Student progress was monitored weekly using shared electronic notebooks.
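
As an illustration of the kind of reasoning behind the annotation units, the following minimal sketch scans a toy sequence for forward-strand open reading frames (ORFs). It is written in Python for brevity rather than in the BASH and R used in the course, and it is not the course's actual pipeline: the function name `find_orfs`, the `min_codons` threshold and the toy sequence are all assumptions made for this example. Real annotation also examines the reverse strand and relies on dedicated gene-prediction tools and manual review of evidence.

```python
# Illustrative sketch only: a forward-strand ORF scan on a toy sequence.
# find_orfs, min_codons and the toy sequence are hypothetical, not from the course.

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon. min_codons counts the start codon plus coding codons and
    excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


toy = "CCATGAAATTTGGGTAACC"  # made-up sequence containing one short ORF
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

Raising `min_codons` filters out short, likely spurious ORFs, which mirrors one of the judgment calls students make when deciding whether a predicted gene is plausible.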

[Figure 1: course workflow]

We have encountered challenges in implementation. For example, there has been significant variation in student preparation for the course. In addition, there is always a trade-off between breadth and depth of content. We typically go into more depth with annotation and BASH, whereas the R programming module is a quick overview (Supplemental Materials).

Student feedback has been largely positive. For the 2021 cohort, 100% of the students rated the overall quality of learning they did in the course as "excellent," as did 86% of the 2018 cohort. The rate at which students grasped the material varied: some commented that the pace was too fast and one commented that it was too slow. One student commented, "Every subject that was presented was explained very thoroughly which was very helpful. There were a lot of activities which made it easier to learn. The research project was a good way to incorporate everything we learned during the semester and show how much we knew." Another noted that the shared electronic notebooks were "...helpful in keeping track of data and the steps we did in each program."

It is important for instructors to have a working knowledge of NCBI databases, programming in BASH, programming in R and viral genome annotation. Students need only be familiar with basic genetics and DNA structure at the sophomore level or above.

There are abundant opportunities for other activities to extend the learning, and these projects are often scalable. For example, students could explore the ERV sequences of an animal family, compare how the sequences differ and generate hypotheses about common ancestors and rates of mutation. Many genomic or transcriptomic sequence records have been analyzed for only one variable and can be explored further. Genes of unknown function can also be explored in annotated genomes.
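
One simple way to quantify how aligned sequences differ is the proportion of mismatched sites (the p-distance). The sketch below, in Python, assumes the sequences have already been aligned to equal length by an external tool; the taxon labels and sequences are made up for illustration, and the helper names `p_distance` and `pairwise_distances` are hypothetical. It is a starting point for discussion, not a substitute for a proper evolutionary model.

```python
# Illustrative sketch only: pairwise p-distances between pre-aligned sequences.
# Taxon labels and sequences are invented toy data, not course results.
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of ungapped aligned positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return diffs / compared


def pairwise_distances(alignment):
    """Map each pair of names (in sorted order) to its p-distance."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }


toy_alignment = {
    "taxon_A": "ATGCTGTA-A",
    "taxon_B": "ATGCCGTATA",
    "taxon_C": "ATGACGTTTA",
}
for pair, dist in pairwise_distances(toy_alignment).items():
    print(pair, round(dist, 3))
```

Smaller distances suggest more recent common ancestry under the simplifying assumption of similar mutation rates, which is the kind of hypothesis students could then test with more rigorous phylogenetic tools.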

In conclusion, these big data modules provide upper-level undergraduates with the opportunity to explore a range of modern bioinformatics tools. Students gain hands-on experience with data manipulation and analysis, and with the boundaries of current knowledge. These students engage in authentic research experiences and contribute to the breadth of understanding of the vast sequencing data available.

ACKNOWLEDGMENTS

Support was provided by Franklin Pierce University College of Health and Natural Sciences. HERV searches were based on searches such as those in Villesen et al. [19]. Phage genome sequences and annotation guidance were provided by the HHMI SEA-PHAGES program [20–22]. The Drosophila annotation was done under the auspices of the Genomics Education Partnership (GEP) using contigs that they provided [23]. The command line BASH module is based on curriculum developed at the University of New Hampshire for the T3 program under the direction of W. Kelley Thomas with significant technical assistance from Joseph Sevigny and Devin Thomas (IPERT research education grant from the National Institute of General Medical Sciences at the National Institutes of Health [R25 GM125674]). The command line module was generously hosted at UNH on their teaching server, "Ron." Thanks to the 2018 and 2021 FPU Genome Research classes.

Appendix S1. Supplemental Materials: outline of topics, timing of modules, resources.

Figure S1. Human endogenous retroviruses (HERVs) identified in other species. (A) A phylogenetic tree with blue arrows indicating species that students identified with genomes containing similar endogenous retrovirus sequences. (B) A cartoon of the sequence alignment from NCBI BLAST showing regions of the HERVs found in the database. (C) A sequence alignment from NCBI BLAST showing regions of identical sequence between HERVs and sequences in a parasitic worm.

Figure S2. Screenshots of projects. (A) Genome map ("phamerator map") of annotated bacteriophage genome. Green boxes represent forward strand genes, red boxes represent reverse strand genes. Scale in kilobases. (B) Screenshot of UNH teaching server Ron displaying large datasets. (C) UCSC Genome Browser displaying the Drosophila contig analyzed. Each horizontal line displays a genome browser evidence track.

Funding information: National Institutes of Health, Grant/Award Number: R25 GM125674

REFERENCES

1. Macaulay J, Bailey C. Learning how to teach science with big data. Biochem Mol Biol Educ. 2021;49:311–2.
2. GenBank and WGS Statistics. 2021. Available from: https://www.ncbi.nlm.nih.gov/genbank/statistics/
3. U.S. Department of Education. Number of STEM Degrees Conferred, 2015. [Cited 2021 March 12]. Available from: https://nces.ed.gov/programs/digest/d16/tables/dt16_318.45.asp
4. McClaren BJ, Crellin E, Janinski M, Nisselle AE, Ng L, Metcalfe SA, et al. Preparing medical specialists for genomic medicine: continuing education should include opportunities for experiential learning. Front Genet. 2020;11:1–11.
5. Maloney M, Parker J, Leblanc M, Woodard CT, Glackin M, Hanrahan M. Bioinformatics and the undergraduate curriculum. CBE Life Sci Educ. 2010;9:172–4.
6. Vincent AT, Bourbonnais Y, Brouard JS, Deveau H, Droit A, Gagné SM, et al. Implementing a web-based introductory bioinformatics course for non-bioinformaticians that incorporates practical exercises. Biochem Mol Biol Educ. 2018;46:31–8.
7. Feig AL, Jabri E. Incorporation of bioinformatics exercises into the undergraduate biochemistry curriculum. Biochem Mol Biol Educ. 2002;30:224–31.
8. Wightman B, Hark AT. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills. Biochem Mol Biol Educ. 2012;40:310–9.
9. Furge LL, Stevens-Truss R, Moore DB, Langeland JA. Vertical and horizontal integration of bioinformatics education: a modular, interdisciplinary approach. Biochem Mol Biol Educ. 2009;37:26–36.
10. Reed KE, Richardson JM. Using microbial genome annotation as a foundation for collaborative student research. Biochem Mol Biol Educ. 2013;41:34–43.
11. Linn MC, Palmer E, Baranger A, Gerard E, Stone E. Undergraduate research experiences: impacts and opportunities. Science. 2015;347:1261757.
12. Eagan MK, Hurtado S, Chang MJ, Garcia GA, Herrera FA, Garibay JC. Making a difference in science education: the impact of undergraduate research programs. Am Educ Res J. 2013;50:683–713.
13. Auchincloss LC, Laursen SL, Branchaw JL, Eagan K, Graham M, Hanauer DI, et al. Assessment of course-based undergraduate research experiences: a meeting report. CBE Life Sci Educ. 2014;13:29–40.
14. Lopatto D. Undergraduate research experiences support science career decisions and active learning. CBE Life Sci Educ. 2007;6:297–306.
15. Ayyash IE, Bennett EN, Haynes J, Hughes JD, Niemi EI, Rogers HG, et al. Complete Genome of Mycobacteriophage Katniss. 2021. [Cited 2021 May 11]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/MZ622173.1/
16. The R Foundation. Programming in R, n.d. Available from: https://www.r-project.org/about.html
17. Pearson RK. Exploratory data analysis using R. Boca Raton, FL, USA: Chapman and Hall/CRC; 2018. p. 247–88.
18. RStudio. RStudio download. Boston, MA: RStudio, PBC; 2021. Available from: https://www.rstudio.com/products/rstudio/download/#download
19. Villesen P, Aagaard L, Wiuf C, Pedersen FS. Identification of endogenous retroviral reading frames in the human genome. Retrovirology. 2004;1:1–13.
20. Jordan TC, Burnett SH, Carson S, Caruso SM, Clase K, DeJong RJ, et al. A broadly implementable research course in phage discovery and genomics for first-year undergraduate students. MBio. 2014;5:e01051–13.
21. Cross T, Moran D, Wodarski D, Harrison M, Dunbar D. Course-based research as a catalyst for undergraduates' interest in scientific investigation: benefits of the SEA-PHAGES program. Counc Undergrad Res Q. 2013;33:21.
22. Hanauer DI, Graham MJ, Betancur L, Bobrownicki A, Cresawn SG, Garlena RA, et al. An inclusive research education community (iREC): impact of the SEA-PHAGES program on research outcomes and student learning. Proc Natl Acad Sci USA. 2017;114:13531–6.
23. Leung W, Shaffer CD, Reed LK, Smith ST, Barshop W, Dirkes W, et al. Drosophila Muller F elements maintain a distinct set of genomic properties over 40 million years of evolution. G3 Genes Genomes Genet. 2015;5:719–40.


ISSN: 1470-8175 (print); 1539-3429 (electronic)
DOI: 10.1002/bmb.21647
Descriptors: Undergraduate Students; Data Analysis; Biochemistry; Research Methodology; Courses
Indexed in: ERIC
Language: English
Peer Reviewed: Y
Page Count: 3
Sponsoring Agency: National Institutes of Health (NIH) (DHHS)
Contract Number: R25GM125674
Document Type: Journal Articles; Reports - Descriptive
Education Level: Higher Education; Postsecondary Education
Abstractor: As Provided
Entry Date: 2022
