Education Session Talks

Presentations by or for educators to share perspectives on data science education and tools on our campus

Session Format: Presenters, live and remote, will give five-minute talks in the order of appearance on this page. There will be time for a few questions between talks, but please continue the conversations during the breaks and in Airmeet's social lounge for this event.


 


Peter DeWitt

Speaker Affiliation: CU Anschutz

Title: Harmonized Pediatric Traumatic Brain Injury Hackathon

Abstract:

Objectives: Elicit novel models for clinical decision support that predict hospital mortality (a binary classification) and discharge functional status score (FSS, restricted to integer values) in pediatric traumatic brain injury (TBI) patients via secondary analysis of a clinical dataset.

Materials and Methods: A de-identified set of 300 pediatric TBI patients served as the training set, and a further 88 patients formed the testing set. Training data and templates for data preparation, model training, and prediction were provided via github.com in both R and Python. Developmental and final submissions were made via tags in participant Git repositories and assessed within Docker containers.

Results: There were 27 registrations and 11 submissions. Random forest was the most common approach for modeling either outcome. Other methods included logistic regression, ridge regression, stacked generalization, and support vector machines. Within the hold-out testing set, FSS was predicted with mean squared error ranging from 3.6 to 9.5. Mortality was predicted with Matthews correlation coefficient (MCC) ranging from 0.47 to 0.91, F1 from 0.42 to 0.92, specificity from 0.93 to 1.00, and sensitivity from 0.28 to 1.00.
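For readers less familiar with the metrics reported in the Results, the following is a minimal sketch of how they can be computed from predictions. It is not the hackathon's assessment code; the function names and the toy labels are hypothetical, and the coding of 1 as "died" is an assumption made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels.

    Assumption for this sketch: 1 = hospital mortality, 0 = survival.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, F1, and MCC for binary predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


def mean_squared_error(y_true, y_pred):
    """Mean squared error, as used here for the integer-valued FSS outcome."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data, not taken from the hackathon dataset.
mortality_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
mortality_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(mortality_true, mortality_pred)

fss_true = [6, 8, 10]
fss_pred = [7, 8, 13]
fss_mse = mean_squared_error(fss_true, fss_pred)

print(metrics)
print(fss_mse)
```

Reporting MCC alongside sensitivity and specificity is useful when one class is rare: a model that almost never predicts mortality can reach high specificity while its sensitivity and MCC stay low.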

Discussion: Submissions focused on modeling rather than on the input data. Only one submission clearly checked data quality beyond missing data. Missing data were generally ignored or imputed with impossible values. The utility of the submitted models is limited as a result.

Conclusions: Tracking participant submissions via git submodules within one repository made scripted assessment and patching easy for the administrator. Data Science pedagogy must emphasize exploratory data analysis as a critical part of building useful models.

 

Presentation Mode: in-person


Jan Mandel, Amy Roberts, & Joe Malingowski

Speaker Affiliation: CU Denver

Title: A new high-performance computing cluster Alderaan

Abstract: A new cluster, Alderaan, was funded by the National Science Foundation (NSF). The cluster has 2048 compute cores with a total of 4 TB of memory, plus 128 cores on 2 GPU nodes with a total of 4 TB of memory, and 1 PB of storage. At least 10% of its capacity is dedicated to classroom use. At least 20% of cycles will be used for the Open Science Grid (OSG), and CU Denver users can also send a large number of single-core jobs to OSG. Currently, the cluster has a limited number of "friendly users" who help with testing while the software installation and configuration are being completed. The cluster should scale up to full deployment in the next few months.

Presentation Mode: remote


Haadi Jafarian & Ersin Dincelli

Speaker Affiliation: CU Denver

Title: GenCyber: Inspiring the next Generation of Cyber Stars

Abstract: In Summer 2021, we held an NSA/NSF-funded GenCyber summer camp for underrepresented minority (URM) students in Colorado. The GenCyber program provides cybersecurity camp experiences for K-12 students with the goals of providing age-appropriate cybersecurity awareness and learning opportunities, increasing interest in cybersecurity and diversity in the cybersecurity workforce, and closing the cybersecurity workforce gap. In this talk, we will share our experiences and the lessons we drew from organizing and launching a cybersecurity camp for a diverse set of students. We will briefly discuss the pedagogical methods we tried during the camp, explain which were most effective according to surveys of the participants, and explain some of the design elements we used to create a safe and secure environment for URM students to discover the cybersecurity field.

Presentation Mode: remote


Julien Langou

Speaker Affiliation: CU Denver, Mathematical and Statistical Sciences

Title: Some evidences that the more we rely on computer-aided computation (e.g. through Python) in a linear algebra course, the more equitable the course is

Abstract: Linear algebra is a mathematical topic typically taught at the university level in a one-semester course. It consists of the study of matrices and vectors. The course content is dense by itself. In addition, teaching the material requires an early introduction to principles of logic that students are, in general, not familiar with. We also teach programming skills in Python that students are, in general, not familiar with, and we use data science applications in the course. Infusing data science applications into a linear algebra course lets us engage students in activities that show direct applications of the material taught. It also reinforces the theory they have learned and stresses its relevance in practical applications. Learning becomes more concrete, which helps students' understanding. To repeat: the course covers the core content, plus principles of logic, plus programming in Python, plus data science applications. This is a lot for a semester. In this course design, we have been reducing the reliance on traditional undergraduate-level linear algebra exercises, which require painfully long series of additions, subtractions, multiplications, and divisions, by replacing them with Python-based exercises where the computer does the computation. As a result, students work at a higher level of abstraction: they apply mathematical concepts by manipulating vectors and matrices through the computer. This makes the course manageable in a semester. We will also argue that replacing hand computation with Python computation is a more equitable way to teach linear algebra.
With hand computation, we (students and instructors) are constrained by our limited human computing abilities, which for all practical purposes means 2D, 3D, and maybe 4D for the more tenacious and daring of us. Going beyond 2D and 3D examples (by constructing 4D, 5D, 6D, etc. examples) is a critical step for students. With Python, we can seamlessly immerse ourselves in high dimensions. The tedious computation is removed, but the effort of abstraction and the effort of understanding and manipulating high dimensions come to the fore. This activates students' early sense for multi-dimensional data. In that context, Python is used as a kind of big and fancy calculator: instead of punching in numbers as on a TI calculator, you punch in matrices and vectors. Finally, once students are comfortable with this level of Python usage, getting our hands dirty with data science applications is only a few steps away, and it is fun and rewarding. We have been teaching the class using Python through Google Colab.
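To make the "big and fancy calculator" idea concrete, here is a small sketch of a 6D exercise that would be tedious by hand. It is not taken from the course materials: the helper names and the choice of a cyclic shift matrix are hypothetical, and the sketch deliberately uses only plain Python lists, whereas a course would more likely rely on a numerical library (an assumption, since the abstract does not name one).

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product: one dot product per row of A."""
    return [dot(row, x) for row in A]


def matmul(A, B):
    """Matrix-matrix product: rows of A against columns of B."""
    columns = list(zip(*B))
    return [[dot(row, col) for col in columns] for row in A]


n = 6  # dimension; changing this number is all it takes to go higher

# Cyclic shift matrix: row i has a single 1 in column (i + 1) mod n.
P = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

x = [1, 2, 3, 4, 5, 6]
shifted = matvec(P, x)  # [2, 3, 4, 5, 6, 1]

# Applying the shift n times should bring every vector back to itself,
# so the n-th power of P should be the identity matrix.
power = identity
for _ in range(n):
    power = matmul(power, P)
returns_to_start = power == identity

print(shifted)
print(returns_to_start)
```

The arithmetic is delegated to the computer, while the student still has to reason about what the matrix does to a vector and why its sixth power is the identity.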

Presentation Mode: in-person


Ashis Biswas

Speaker Affiliation: CU Denver

Title: Data Science Competitions: A know-how to participate

Abstract: The availability of data science learning content is problematically skewed: material that teaches concepts far outnumbers material on what to do with the learned concepts. Most teaching materials in data science, especially machine learning and deep learning in an academic setting, struggle to engage students in applying the knowledge to everyday problems. There is a "believable gap" between completing a relevant course and applying the learned ideas to impactful real-world problem solving. Participating in competitive data science platforms such as Kaggle and DrivenData puts a participant in a position to use the concepts in a more practical way, which is both encouraging and constructive. In this talk, the audience will learn about the importance of participating in competitions, how to start participating in one of these venues, Kaggle, and how to adopt a competitive mindset to improve a submission little by little among thousands of experts worldwide and eventually succeed in a data science career.

Presentation Mode: remote


Diane Fritz

Speaker Affiliation: OpenStreetMap U.S. Board / Auraria Library

Title: OpenStreetMap: a Global Geospatial Dataset for Community and Data Science

Abstract: OpenStreetMap (OSM) is a crowd-sourced, global, geospatial dataset that most people use every day without realizing it. In addition to being the map structure behind many apps we have on our phones, OSM can be a tool for teaching students how to create spatial data or a rich source of information for analytics. This talk will give an overview of the OSM data structure, how students and their communities can contribute to OSM, and ways of accessing OSM data for research. 
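As a rough illustration of the OSM data structure mentioned above, the sketch below models two of its element types, nodes (points with coordinates) and ways (ordered lists of node references), each carrying free-form key/value tags. The class names, IDs, and coordinates are hypothetical and made up for illustration, relations are omitted for brevity, and no real OSM data or API is accessed.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a common approximation


@dataclass
class Node:
    """A point element: an ID, a latitude/longitude, and optional tags."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    """A line or area element: an ordered list of node IDs plus tags."""
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)


def haversine_m(a, b):
    """Great-circle distance in meters between two nodes."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlat = lat2 - lat1
    dlon = math.radians(b.lon - a.lon)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def way_length_m(way, nodes_by_id):
    """Sum the distances between consecutive nodes of a way."""
    points = [nodes_by_id[node_id] for node_id in way.node_ids]
    return sum(haversine_m(p, q) for p, q in zip(points, points[1:]))


# Made-up example: a short footpath described by three nodes.
nodes = [
    Node(1, 39.7430, -105.0060),
    Node(2, 39.7435, -105.0055),
    Node(3, 39.7440, -105.0055, tags={"amenity": "bench"}),
]
nodes_by_id = {node.id: node for node in nodes}
path = Way(100, [1, 2, 3], tags={"highway": "footway"})

print(round(way_length_m(path, nodes_by_id), 1), "meters")
```

Because a way stores references to nodes rather than coordinates, any analysis of geometry first has to resolve those references, which is what `way_length_m` does with the `nodes_by_id` lookup.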

Presentation Mode: in-person
