The Mobilize Introduction to Data Science (IDS) course is a dynamic, computation-based statistics and probability curriculum that teaches all students to think critically about and with data. During the yearlong course, IDS engages students with real data, introducing statistical, computational and graphical tools for reasoning about the world. IDS is a C-approved mathematics course in the University of California A-G Requirements. As a statistics course, successful completion of IDS validates Algebra II in California. IDS directly addresses the Common Core State Standards (CCSS) for High School Statistics and Probability and Practices for Modeling.
IDS immediately engages students with real data, introducing statistical, computational, and graphical tools for reasoning about the world. Through IDS lessons, students function as researchers by making truly original discoveries about the world around them. By collecting their own data using hand-held devices, and by examining data from formal sources, students learn to generate hypotheses, fit statistical and mathematical models to data, implement these models algorithmically, evaluate how well these models fit reality, and think computationally while learning to program with data. IDS students learn how to work with Participatory Sensing (collecting data through their smartphones) and R, an open-source programming language that has long been the standard for academic statisticians and analysts in industry. Using R through the RStudio interface, students learn to code and to compute with data, developing graphical and numerical summaries both to communicate findings and to generate further exploration.
William Finzer
This conveys very well what students and teachers see in the IDS course. It must be one of the very first such courses at the high school level anywhere, right? What do you find are the hurdles in developing and teaching such courses?
Amelia McNamara
LeeAnn has more up-to-date information, but I think you’re right. There have been some data science courses at private schools, but I think this is the first data science class to be deployed in public schools, particularly at such scale.
As for hurdles, one of the main ones is professional development for teachers. The grant funded us to do a lot of this, but we want the curriculum to live on once the grant is over. PD usually only lasts a week, and it’s hard to get everyone up to speed on a year-long course, particularly with the new technology.
Katie Rich
Very interesting project. Skillful and thoughtful interpretation of data is such an important skill, and it’s great to see someone tackling how to teach it.
One of the challenges I see with designing a course like this is making sure students are accessing accurate and reliable data. Do you specify appropriate data sources? If not, how do you help teachers and students make sure they are working with reliable data?
Another challenge I’d expect in designing a course like this is managing the tension between posing authentic, open-ended questions and providing enough structure so students have a place to begin their investigations. How do you support students in answering open-ended questions?
Amelia McNamara
One of the exciting pieces of the course is that students collect “participatory sensing” data with their smartphones. This data isn’t randomly sampled, so the inference you can make is limited. But it has a lot of interesting properties: the data is geotagged, so students can make maps; there is often text data; and so on. We also suggest many data sources in the curriculum, including movie data from the IMDB and the American Time Use Survey.
One of the ways we skirt the inference issue is by using randomization and the bootstrap instead of standard statistical formulas. And there is certainly a component to the course of critical thinking about data. Who collected it? Why? What might be missing?
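To make the bootstrap idea above concrete, here is a minimal sketch. It is written in Python purely for illustration; the course itself teaches R, and nothing here is taken from the IDS curriculum. The data values and the function names `bootstrap_means` and `percentile_interval` are hypothetical.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each resample drawn with replacement from sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Approximate percentile interval from a list of bootstrap statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lo_index = round(tail * len(ordered))
    hi_index = round((1 - tail) * len(ordered)) - 1
    return ordered[lo_index], ordered[hi_index]


# Hypothetical data: self-reported stress levels (1 to 5) from one class.
stress_levels = [3, 5, 4, 2, 5, 1, 4, 3, 2, 4, 5, 3]

means = bootstrap_means(stress_levels)
low, high = percentile_interval(means)

print(f"sample mean: {statistics.mean(stress_levels):.2f}")
print(f"approximate 95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

Resampling the observed values with replacement stands in for the sampling-distribution formula. The interval describes only the variability of the mean within this group of students. It does not fix the fact that the data were not randomly sampled.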
Robert Gould
To add to Amelia’s excellent response: There is a tension between posing authentic, open-ended questions and providing structure. We’ve achieved some success in this direction by emphasizing the role of the “Data Cycle”, which is what the Guidelines for Assessment and Instruction in Statistics Education (K-12) call “the statistical investigative process”. By paying attention to this cycle, it becomes clear that students and teachers must be thoughtful about (a) the questions they ask and (b) the data that might be used to answer these questions. This is quite a skill (and we’ve found that many adults struggle here, too!), and our solution has been to provide many opportunities and contexts in which to practice.
Irene Lee
Thanks for sharing your work. How are you planning on scaling up and broadening your reach? I’d love to see the Data Science curriculum. Is it available? If so, where? Do you have any plans to make it accessible as an online course?
Robert Gould
Thanks for that question, Irene. I’m the lead PI for the project. There are two components to starting your own IDS class. The first is the curriculum itself. This will appear any day now on our website mobilizingcs.org. The second is the technology suite, which is substantial. We’re currently working on providing a fee-for-service structure to support schools that do not have their own servers or that wish to “test drive” the technology. During this upcoming year we are preparing a “Mobilize in a Box” that will provide instructions for installing and maintaining the software on a server if a district does not want to go the fee-for-service route. In the meantime, this link http://www.mobilizingcs.org/technology provides documentation, source code, and a webpage that lets you do a small-class trial and play with the technology.
And, last but not least, this link https://lausd.mobilizingcs.org/#demo lets you ‘play’ with the visualization tools that illustrate the mapping and word-clouding abilities that Amelia mentioned above.
Irene Lee
Thank you! I’m looking forward to giving it a try.
Julie Steimle
This is a very interesting project, as there are a lot of career opportunities for individuals with a data science background. Do you find schools are open to offering this as a year-round course, or do you have trouble recruiting schools to participate?
Suyen Machado
Thanks for your question. By year round course, do you mean a course that is to be taken and completed in one school year? The reason I ask is because in the Los Angeles Unified School District (LAUSD), year-round is a schedule to relieve school overcrowding. At LAUSD, the course has been implemented in a variety of different schedules, and there is no reason why it couldn’t be done on a year-round schedule. To date, we have recruited 27 schools to offer the course, 34 teachers have received professional development, and approximately 2,000 students have taken the year-long IDS course. We will be recruiting 20 additional LAUSD teachers to teach IDS in 2016-2017.
Annamarie Francois
Blown away by the voices of students and teachers in IDS. Thanks!
LeeAnn Trusela
Thanks for all of the rich discussion and inquiries, which we will respond to shortly. First, I’d like to pose a broader question to ponder: If all you knew about the Intro to Data Science course was that it is an alternative to Algebra II, what sort of college readiness skills do you think high school students might develop in IDS that they might not get from an Algebra II course?
William Finzer
What do people think about the relative merits of a separate course such as IDS as opposed to integration of learning data science skills and concepts across the disciplines?
Robert Gould
Hi Bill! I have my own prejudices as PI of this project. I think there is plenty of room for both approaches. In fact, we also developed short-term data science modules for integration in Algebra I, Biology, and Exploring Computer Science courses. Still, we find that one hurdle is teacher preparation. Given that our curriculum is very technology-driven, we faced difficulties in doing substantial professional development with teachers, when the teachers knew the module was only three weeks long. The training was a huge time investment for a very short class-time return. The IDS course provides the time to develop some complex ideas (interpreting graphics, understanding the consequences of different modes of data collection) and skills (programming in R) that we couldn’t do in three weeks.
William Finzer
Hi Rob! I certainly agree that there is room for both approaches, and that a separate course circumvents some of the obstacles (such as teacher preparation) present in the integration approach.
I do hope that the integration approach does not get neglected, though. In the real world data science is done in context and data habits of mind like “Where’s the data?” are (or should be) important for all members of a work group. I know that IDS promotes this point of view, but I worry that many data science courses won’t.
Rob Gould
You’re absolutely right. There must be both. I’m somewhat surprised (and saddened), for instance, that there’s no “statistics” keyword search, even though several projects seem intent on improving data literacy and statistical thinking.
Caitlin K. Martin
This is so necessary! In some work we’ve done in science classrooms engaged in citizen science projects, we noted that many teachers were not having their students use the data collected by themselves and others to run analysis and see bigger patterns. This use of bigger data sets has not been something traditionally taught to pre-service teachers or to K12 students, and this attention to this type of learning is so great to see. Wonderful resource for educators and students alike.
Evan Korth
I love that they are creating / capturing their own data. It makes it seem more real than using someone else’s. What are some of the more interesting projects your students have come up with?
I am glad to hear you are also creating modules for use in existing subjects. In addition to Bill’s points, for now I think that will be important for getting into many schools.
Rob Gould
One of the interesting findings, and now that I type this I think it is also a partial reply to Caitlin Martin’s question above, is that it has been very difficult to get classrooms to develop their own projects. We did lots of soul searching after the first pilot to figure out why, and this is still an area of ongoing research. One reason seems to be that the participatory sensing tool has certain constraints and these can be, well, constraining. The other, though, seems to be that teachers are initially inexperienced in the statistical investigation cycle, and need practice themselves before they are able to lead a classroom towards posing interesting questions that can be addressed with data. As Caitlin mentions, teachers sometimes neglect to use the data, and we think in part it’s because they simply don’t know what to use it for! We are looking forward to reviewing classroom projects from this second year of IDS to see the extent to which classrooms struck out in new and unexpected directions, or whether they stayed close to the data-collection projects as they are in the curriculum.
Kim Kastens
This is a fascinating project; thanks for sharing. On the question of integrating data science into other courses versus running a free-standing year-long IDS course, I can see merits of either approach. As an Earth Science educator, I would be most enthusiastic about a model where the science course and IDS course ran side-by-side, with the same kids enrolled in both, and the teachers cooperating. There are great Earth & Space datasets that kids can collect themselves, but then these can be much enriched by having kids tap into the national and global data archives maintained by NOAA, NASA, the U.S. Geological Survey, the EPA and university sources. The Earth is so vast and heterogeneous that there is plenty of room for authentic discovery with data in national archives by high schoolers with the IDS skill set.
Roger Taylor
Interesting research. I use RStudio for my own analyses, but the syntax of R can be rather counter-intuitive. I’m a big fan of Hadley Wickham’s work (e.g., dplyr, lubridate, ggplot2) and was wondering if you used any specific R packages or created any of your own.
Rob Gould
Thanks! We do in fact have our own package: mobilizR. Our intent with this package was to simplify the syntax. Our slogan: “No dollar signs”. The idea being that, as often as possible, students use the formula syntax.
That said, R is still counter-intuitive. However, because an objective of the course was to teach computational thinking, we decided it was important to grab the bull by the horns and teach R.
mobilizR, by the way, is based on the awesome package mosaic, which is part of an NSF-funded project led by Randall Pruim that provides everything you need to teach R in an intro stats (college level) course.
Elizabeth McEneaney
Super exciting to bring this down to the high school level! Can you say a bit more about the “civic engagement” piece? I have taught statistics in sociology and education for a while, and I have always thought that working with quantitative data afforded many opportunities to analyze social problems. It’s the sort of thing that gets left out of “STEM” in some discussions.
Rob Gould
Thank you, Elizabeth. The “Civic engagement” happens in several ways. First, the curriculum asks students to collect data about their surroundings (where they dispose of trash, what they eat, their stress levels during the day, how they see water being used), and these give opportunities to discuss what is happening in their surroundings and how this might change. Second, the participatory sensing tool gives classrooms the means to determine what they think will be of interest, and to collect and analyze data as a classroom to address these interests. Thus, there’s lots of potential for the tool to be taken in new directions within the classroom. Finally, when data are collected, they are stamped with a location tag, and, once some privacy gates have been passed, the data can be displayed on a map. We’ve found that displaying a map with information collected by students often leads to questions involving social and civic issues.