Prepare to Dive In
Teams receive access to three core data sets for the event and a list of potential questions to consider for each data set. Because we want this event to be as “real world” as possible, the questions will only provide a starting point for your analysis. The primary goal of your analysis is to provide a thorough and insightful investigation of the relationship between various property characteristics and the water quality of lakes within the seven-county Twin Cities region.
We’re especially looking for examples of new, creative and innovative analysis to enhance the impact of your findings. Social Data Science is the primary client for this event; however, other organizations have already expressed interest in your findings. The implications of your results may influence many within this region and beyond.
Main Questions to Consider in Analysis
There are three overarching questions that should be considered in your analysis. These questions will require the use of all three data sets provided below. Additional or supplementary data sets may also be used at your discretion to answer these questions.
- How does property value influence water quality?
- What property characteristics influence water quality the most?
- Have changes in land use or land development over time positively or adversely impacted water quality?
In addition to the questions above, a list of potential questions has been provided below each data set as guidance. Consider these questions to be a diving board from which you can dive into the water data. All participants are expected to use their originality, creativity and ingenuity to provide the best findings, insights and recommendations possible.
Find the data sets below along with more background information. The data can be downloaded as text files or accessed using BigQuery. Helpful tutorials for using BigQuery are available at the bottom of this page.
Data Set 1: MetroGIS Regional Tax Parcel Dataset
- Seven Counties
- 2002–2015
- Approximately 1,500,000 properties
- Key Attributes:
  - Tax valuations
  - Dwelling Type
  - Property Ownership (public, commercial, residential, etc.)
  - Description of Use
Potential questions for consideration:
- Property ownership is an investment and the hope is that your investment will gain value over time. What locations have seen the most substantial increase / decrease in property values over time?
- The type of dwelling or type of property owner (e.g. public, commercial, residential, etc.) may adversely impact water quality. What locations have seen the most change or growth in these characteristics over time?
- Local and county governments are often responsible for developing policies and regulating land development. What locations have seen the most change in land development over time? What variations exist in how land is being developed across various local and county governments?
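As a starting point for the property value question above, here is a minimal sketch of how the change in median value per location between two years could be computed. It uses only the Python standard library. The field names `city`, `year` and `total_value` are assumptions made for illustration; check the MetroGIS data description for the actual attribute names. The sample records are made up and are not real parcel data.

```python
from collections import defaultdict
from statistics import median


def median_value_change(rows, start_year, end_year):
    """Return {location: percent change in median value} between two years.

    `rows` is an iterable of dicts with the hypothetical keys
    'city', 'year' and 'total_value'; the real files may use other names.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        values[row["city"]][int(row["year"])].append(float(row["total_value"]))

    change = {}
    for city, by_year in values.items():
        # Skip locations that are missing either year.
        if start_year not in by_year or end_year not in by_year:
            continue
        start = median(by_year[start_year])
        end = median(by_year[end_year])
        if start == 0:
            continue  # avoid dividing by a zero baseline
        change[city] = 100.0 * (end - start) / start
    return change


# Made-up toy records, for illustration only (not real parcel data).
toy_rows = [
    {"city": "A", "year": 2002, "total_value": 100000},
    {"city": "A", "year": 2002, "total_value": 200000},
    {"city": "A", "year": 2015, "total_value": 180000},
    {"city": "B", "year": 2002, "total_value": 300000},
    {"city": "B", "year": 2015, "total_value": 270000},
]
toy_change = median_value_change(toy_rows, 2002, 2015)
```

The median is used here rather than the mean because a few very high valuations can dominate an average; either choice is reasonable depending on the question being asked.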
Accessing the data:
- MetroGIS Data Description PDF
- 2002 Download | Access with BigQuery
- 2003 Download | Access with BigQuery
- 2004 Download | Access with BigQuery
- 2005 Download | Access with BigQuery
- 2006 Download | Access with BigQuery
- 2007 Download | Access with BigQuery
- 2008 Download | Access with BigQuery
- 2009 Download | Access with BigQuery
- 2010 Download | Access with BigQuery
- 2011 Download | Access with BigQuery
- 2012 Download | Access with BigQuery
- 2013 Download | Access with BigQuery
- 2014 Download | Access with BigQuery
- 2015 Download | Access with BigQuery
Data Set 2: Metropolitan Council Lake Monitoring Data
- 1980–2014
- 332 Lakes
- Key Attributes:
  - Seasonal Lake Grade
  - Physical Condition
  - Recreational Suitability
  - Secchi Depth
  - Total Phosphorus
Potential questions for consideration:
- What lakes tend to have the best / worst water quality? Are the lakes with the best / worst water quality consistent over time?
- What watersheds tend to have the best / worst water quality? Are the watersheds with the best / worst water quality consistent over time?
- One may anticipate that the water quality attributes of lakes within a watershed depend on each other. Do the data support this notion? How does time influence this dependence structure?
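One possible way to approach the consistency question above is to rank lakes within each year and count how often each lake lands at the top. The sketch below does this with mean Secchi depth, where a larger depth indicates clearer water. The field names `lake`, `year` and `secchi_depth_m` are hypothetical, as is the assumption that depth is recorded in metres; the readings are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean


def top_lakes_by_year(rows, k=1):
    """Map each year to the k lakes with the largest mean Secchi depth."""
    readings = defaultdict(lambda: defaultdict(list))
    for row in rows:
        depth = float(row["secchi_depth_m"])  # hypothetical field name
        readings[int(row["year"])][row["lake"]].append(depth)

    top = {}
    for year, by_lake in readings.items():
        ranked = sorted(by_lake, key=lambda lake: mean(by_lake[lake]), reverse=True)
        top[year] = ranked[:k]
    return top


def top_k_frequency(top):
    """Count in how many years each lake appears among the top k."""
    return Counter(lake for lakes in top.values() for lake in lakes)


# Made-up toy readings, for illustration only (not real monitoring data).
toy_readings = [
    {"lake": "X", "year": 2000, "secchi_depth_m": 3.0},
    {"lake": "X", "year": 2000, "secchi_depth_m": 2.0},
    {"lake": "Y", "year": 2000, "secchi_depth_m": 1.0},
    {"lake": "X", "year": 2001, "secchi_depth_m": 2.0},
    {"lake": "Y", "year": 2001, "secchi_depth_m": 1.5},
    {"lake": "X", "year": 2002, "secchi_depth_m": 1.0},
    {"lake": "Y", "year": 2002, "secchi_depth_m": 2.0},
]
toy_top = top_lakes_by_year(toy_readings, k=1)
toy_frequency = top_k_frequency(toy_top)
```

A lake that appears in the top group in most years is consistently clear by this measure. The same pattern could be applied to Total Phosphorus or grouped by watershed instead of by lake, with the sort direction reversed where lower values indicate better water quality.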
Accessing the data:
- Metropolitan Council Lakes Monitoring Data Description PDF
- Access the data set as text file
- Access the data with BigQuery
Data Set 3: Social Data Science Water Proximity Data
- Cross reference of Regional Tax Parcel & Lake Monitoring Data sets
- Key Attributes:
  - Distance of Tax Parcel to nearest Lake
  - Distance to nearest Lake Monitoring Station
Potential questions for consideration:
- Whether a property is contiguous to a lake, i.e. lakefront property, is likely to influence its value. What is the relationship between proximity to water and property value? What effect does location have on this relationship? Does time influence this relationship?
- What relationships exist between various property characteristics (e.g. dwelling type, ownership type, etc.) and proximity to water? Have these relationships changed over time? Do the policies / regulations of local or county governments appear to influence these relationships?
- Does proximity to water appear to influence how land is being developed over time? Does the amount of influence vary across local or county governments?
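For the proximity questions above, a simple first look is to group parcels into distance bands and compare the median value in each band. The sketch below is one way to do that with the Python standard library. The field names `distance_to_lake_m` and `total_value`, the assumption that distance is in metres, and the band edges of 100 and 500 are all illustrative choices and do not come from the data description. The sample parcels are made up.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median


def median_value_by_distance_band(rows, edges=(100.0, 500.0)):
    """Return {band index: median value}, where band 0 is closest to a lake.

    With edges (100, 500) the bands are [0, 100), [100, 500) and 500 or more.
    """
    bands = defaultdict(list)
    for row in rows:
        # bisect_right gives the number of edges at or below the distance.
        band = bisect_right(edges, float(row["distance_to_lake_m"]))
        bands[band].append(float(row["total_value"]))
    return {band: median(values) for band, values in sorted(bands.items())}


# Made-up toy parcels, for illustration only (not real proximity data).
toy_parcels = [
    {"distance_to_lake_m": 50, "total_value": 400000},
    {"distance_to_lake_m": 80, "total_value": 500000},
    {"distance_to_lake_m": 300, "total_value": 250000},
    {"distance_to_lake_m": 900, "total_value": 200000},
    {"distance_to_lake_m": 1200, "total_value": 220000},
]
toy_band_medians = median_value_by_distance_band(toy_parcels)
```

Running the same function separately per year or per county would give a rough view of how the relationship varies over time and across local governments.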
Accessing the data:
BigQuery Tutorials:
- Accessing BigQuery Tutorial (10/4/16: Updated with correct project name)
- BigQuery with Python
- BigQuery with R
If you are having issues accessing the data or descriptions, please contact jackson@minneanalytics.org.
For more information or to get involved, e-mail education@minneanalytics.org.