Data Science Education

Speakers and abstracts for the data science education session

Presentations by or for educators to share perspectives on data science education and tools


Audrey Hendricks

Speaker Affiliation: Associate Professor, Department of Mathematical and Statistical Sciences, CU Denver

Title: Successful and sustainable undergraduate research in data science

Abstract: Undergraduate research is a powerful tool for exposing students to statistics and preparing them for data science careers. Scaling data science undergraduate research programs to more than a handful of students can be a challenging investment of time and resources, especially for junior faculty. Here, I discuss how I developed a successful and sustainable program as an Assistant Professor, including structuring meetings, examples of student self-assessment, progress reports, and mentorship. To increase sustainability, I built a team with varied research experience and use tiered and near-peer mentorship, which encourages a cohesive group and enables students to gain leadership experience. PhD students' experiences in such a program enable them to port this model and promote undergraduate research at their future institutions. We use technology to enable virtual chats, code sharing, and tracking of action items. Vertical integration protects faculty time and enables a larger team, which engages students from a wide array of academic backgrounds. This variety of expertise leads to greater creativity and helps students learn to communicate across disciplines, an in-demand skill for careers in academia and industry, where multidisciplinary collaboration is common.

Jan Mandel

Speaker Affiliation: Professor, Director of the Center for Computational Mathematics (CCM), Department of Mathematical and Statistical Sciences, CU Denver

Title: Alderaan High-Performance Computing Cluster

Abstract: The Alderaan cluster at the Center for Computational Mathematics is available for use by CU Denver faculty, students, and their collaborators. The cluster has a total of 2,176 AMD cores, 20 TB of memory, 4 NVIDIA A100 GPUs, and 1 PB of storage, and it is integrated with other existing clusters suitable for long-running jobs. This talk gives an overview of the cluster architecture, the available software, how to get an account, and how to access the cluster.

More information is available at https://ccm-docs.readthedocs.io/en/latest/alderaan/

Ashis Kumer Biswas

Speaker Affiliation: Assistant Professor, Department of Computer Science and Engineering, CU Denver

Title: Responsible AI principles and practices

Abstract: The fourth industrial revolution fuses Artificial Intelligence (AI) into the advancement of automation technologies from numerous disciplines, thereby affecting many aspects of people's lives and society at large. It is therefore important to design, build, and deploy AI systems responsibly to ensure fairness, inclusiveness, reliability, transparency, privacy, accountability, and an understanding of limitations. The talk illustrates responsible AI system design principles and "think-before-you-code" practices to make an impact.

Keith Guzik

Speaker Affiliation: Assistant Professor, Department of Sociology, CU Denver

Title: Introduction to critical algorithm studies

Abstract: Data science represents an exciting field of intellectual practice that promises to uncover new facts and optimize decision-making and operations in society through the computer-driven collection and analysis of mundane personal data. In this regard, data science shares an affinity with the social sciences more generally, which also look to uncover patterns of thinking and behavior. Nevertheless, the computer-driven collection and analysis of data and the policies implemented on their basis have generated a wealth of diversity, equity, and inclusion concerns. This presentation provides a brief introduction to the growing field of critical algorithm studies, which leverages critical-minded traditions of social inquiry to question data science. More than just a critique, it is intended to have us consider whether data science can serve society and our institution in meeting our DEI goals.

Troy Butler

Speaker Affiliation: Associate Professor, Department of Mathematical and Statistical Sciences, CU Denver

Title: A Computational OER Pathway through Mathematics and Statistics Curricula

Abstract: The Colorado Department of Higher Education recently awarded the Department of Mathematical and Statistical Sciences at CU Denver a two-year institutional Open Educational Resources (OER) grant for their project “OER for the Creation of Interactive Computational Notebooks and a Computational Pathway in Mathematics and Statistics.” We provide an overview of project goals and our vision for the integration of interactive and engaging OER computational resources at all levels of our curriculum. The project focuses on the incorporation of two of the most popular programming languages in data science: Python and R. These languages are incorporated into the curricula through Jupyter and R Markdown notebook environments. The goals of this project align well with the goals of the recently created Data Science Degree Program. The majority of our materials are hosted on GitHub (https://github.com/CU-Denver-MathStats-OER) and are designed to run in cloud computing environments such as Google's Colab environment (https://colab.research.google.com/). In particular, the Programming for Data Science course (MATH 1376, https://github.com/CU-Denver-MathStats-OER/Programming-for-Data-Science) is being taught in Colab in the Fall 2022 semester. The use of Colab and GitHub enables all students and educators to easily access and provide feedback on materials as they are continuously developed. The presentation itself illustrates the ability of notebooks to seamlessly weave together rich narrative text, mathematics, multimedia, and live code that actively engage students and facilitate deeper learning of concepts. We discuss the various platforms available to run notebooks, including options that address issues of equitable access, and how participants can provide real-time feedback as materials are continuously created and made available on public repositories.

Madeline Fischer & Hannah Thompson

Speaker Affiliation: Students, US Air Force Academy

Title: Predicting Authorship of the Unknown Federalist Papers

Abstract: Alexander Hamilton, James Madison, and John Jay anonymously published the Federalist Papers in 1787-88 in an effort to market the Constitution. The papers were initially published under the alias “Publius” to preserve the reputations of the authors, but over time, the authors of all but 12 of the 85 papers were discovered. The question of authorship is not unique to the Federalist Papers; thus, statisticians have developed a variety of techniques to solve this complex classification problem. Specifically, Frederick Mosteller and David Wallace, at Harvard University and the University of Chicago, created a framework for the Federalist Papers using relative word frequency in their paper “Deciding Authorship”. This word frequency framework was implemented in an inflexible linear discriminant analysis model and a flexible k-nearest neighbors (KNN) model with one neighbor to predict the authors of the remaining 12 papers. The models were programmed, trained, and validated using a variety of techniques to optimize the predictions. The inflexible model achieved 100% accuracy and the flexible KNN model achieved 99.3% accuracy. Given the model performance on the validation sets, the predictions for the papers of unknown authorship were considered reliable. Both models predicted the unknown papers to be written by Madison, except for paper numbers 50, 54, and 55, which is comparable to previous studies. Thus, the model accuracies suggest that future text-classification problems should rely on word frequency analysis to determine the best predictors for both flexible and inflexible models.
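
As a rough illustration of the word-frequency idea described in the abstract above, the sketch below computes relative word frequencies and classifies a text with a one-neighbor KNN rule. It is not the speakers' implementation: the marker-word list, the function names, and the toy training snippets are all assumptions made for the example, and the abstract does not state which predictors or distance measure were used.

```python
from collections import Counter
import math
import re

# Hypothetical marker words chosen for illustration only;
# the abstract does not list the predictors used in the study.
MARKER_WORDS = ["upon", "whilst", "on", "by", "to"]


def relative_frequencies(text, words=MARKER_WORDS):
    """Return each marker word's rate per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [1000.0 * counts[w] / total for w in words]


def predict_1nn(train, query_text):
    """Assign the label of the single nearest training text.

    Distance is Euclidean over the relative-frequency vectors
    (an assumption; other distances are possible).
    """
    query = relative_frequencies(query_text)
    best_label, best_dist = None, math.inf
    for label, text in train:
        dist = math.dist(query, relative_frequencies(text))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy, made-up snippets; these are NOT text from the Federalist Papers.
train = [
    ("Hamilton", "upon the whole it rests upon the people to decide upon the plan"),
    ("Madison", "whilst the states deliberate whilst the people wait by the door"),
]
prediction = predict_1nn(train, "whilst we wait by the hall whilst they speak")
print(prediction)
```

In a real analysis the training set would be the papers of known authorship, and the predictors would be selected and validated on held-out papers, as the abstract describes.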
