Welcome to the Kellylab blog

geospatial matters

Entries in data (97)

Tuesday
Jun 04, 2013

Conference wrap up: DataEdge 2013

The 2nd DataEdge Conference, organized by UC Berkeley’s I School, has wrapped, and it was a doozy. The Geospatial Innovation Facility (GIF) was a sponsor, and Kevin Koy from the GIF gave a workshop, Understanding the Natural World Through Spatial Data. Here are some of my highlights from a solid and fascinating 1.5 days. (All presentations are now available online.)

Michael Manoochehri, from Google, gave the workshop Data Just Right: A Practical Introduction to Data Science Skills. This was a terrific and useful interactive talk that asked: who, or what, is a data scientist? One early definition he offered was a person with three groups of skills: statistics, coding or an engineering approach to solving a problem, and communication. He further refined this definition with a list of practical skills for the modern data scientist:

  • Short-term skills: Have a working knowledge of R; be proficient in Python and JavaScript, for analysis and web interaction; understand SQL; know your way around a Unix shell; be familiar with distributed data platforms like Hadoop; understand the Data Pipeline: collection, processing, analysis, visualization, communication.
  • Long-term skills: Statistics: understand k-means clustering, multiple regression, and Bayesian inference; and Visualization: both the technical and communication aspects of good viz.
  • Finally: Dive into a real data set; and focus on real use cases.
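For readers who have not met k-means clustering, here is a minimal sketch of the idea in Python. It is not from the talk: the function name `kmeans`, the toy points, and the choice to seed the centroids with the first k points are all my own assumptions, made so the example stays short and deterministic. Real libraries pick starting centroids more carefully.

```python
def kmeans(points, k, iterations=10):
    """Toy k-means (Lloyd's algorithm) on 2-D points.

    Assumption for illustration: the first k points seed the centroids,
    which keeps the run deterministic but is not a robust initialization.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Two obvious groups of made-up points: one near (1, 1), one near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
```

The two steps inside the loop are the whole algorithm: assign each point to the closest center, then recompute each center as the average of its points, and repeat until nothing moves.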

Many other great points were brought up in the discussion: the data storage conundrum in science was one. We are required to make our public data available: where will we store datasets, how will we share them, and how will we pay for access to public scientific data in the future?

Kate Crawford, Principal Researcher, Microsoft Research New England, gave the keynote address entitled The Raw and the Cooked: The Mythologies of Big Data. She wove together an extremely thoughtful and informative talk about some of our misconceptions about Big Data: the “myths” of her title. She framed the talk by introducing Claude Lévi-Strauss’ influential anthropological work “The Raw and the Cooked” - a study of Amerindian mythology that presents myths as a type of speech through which a language and culture could be discovered and learned. You know you are in for a provocative talk at a Big Data conference when the keynote leads with CLS. She then presented a series of 6 myths about Big Data, illustrated simply with a few slides each. Here is a quick summary of the myths:

  1. Big Data is new: the term was first used in 1997, but the “pre-history” of Big Data originates much earlier, in 1950s climate science for example, or even earlier. What we have is new tools driving new foci.
  2. Big Data is objective: she used the example of post-Sandy tweets, and made the point that while widespread, these data are a subset of a subset. Muki Haklay makes the same point with his cautionary comment that “you are mining the outliers” (see previous post). She also pointed out that 2013 marks the point in the history of the internet when 51% of web traffic is non-human. Who are you listening to?
  3. Big Data won't discriminate: does Big Data avoid group-level prejudice? We all know the answer: people not only have different access to the internet, but because your user experience is framed by your previous use of and interaction with the web, the rich and the poor see different internets.
  4. Big Data makes cities smart: there are numerous terrific examples of smart cities (even many in the recent news), but resource allocation is not even. When smartphones are used, for example, to map potholes needing repair, repairs are concentrated in areas where cell phone use is higher: the device becomes a proxy for the need.
  5. Big Data is anonymous: Big Data has a Big Privacy problem. We all know this, especially in the health fields. I learned the new term “Health Surrogate Data”, which is information about your health that results from your interaction with the Internet. Great stuff for Google Flu Trends, for example, but still worrying. The standard law for protection in the public health field, HIPAA, is similar to “bringing a knife to a gunfight,” as she quoted Nicholas Terry.
  6. You can opt out: there are currently no clear ways to opt out. She asks: how much would you pay for privacy? And if the technological means to do so were created and made widespread, we would likely see the development of privacy as a luxury good, further differentiating internet experience based on income.

The panel discussion Digital Afterlife: What Happens to Your Data When You Die?, moderated by Jess Hemerly from Google and including Jed Brubaker from UC Irvine and Stephen Wu, a technology and intellectual property attorney, was eye-opening and engaging. Each speaker gave a presentation from their expertise: Stephen Wu gave us a primer on digital identity estate planning, and Jed Brubaker shared his research on the spaces left in social media when someone dies. Both talks were utterly fascinating, thought-provoking and unique.

And finally, Jeffrey Heer from Stanford University gave a stunning and fun talk entitled Visualization and Interactive Data Analysis, which showcased his visualization work and introduced many of us to Data Wrangler, which is awesome.

Great conference!

Thursday
May 30, 2013

Mobile Field Data Collection, Made Easy

Recommendation from GreenInfo Network's MapLines newsletter:

"Attention land trusts, weed mappers, trail maintainers and others - Are you ready for the Spring field work season?  GreenInfo recommends using this customizable, free app for collecting data in the field - Fulcrum App, which offers a free single user plan for storing up to 100 mb of data."

According to their website, with Fulcrum you build apps to your specifications, allowing you to control exactly what data is captured from the field and how. The pitch is that getting it done right the first time maintains high standards of quality and minimizes rework, QA/QC, and error correction.

Wednesday
May 29, 2013

PROBA-V satellite launched May 7

Proba-V’s first image of France

I haven't used PROBA imagery, but many colleagues in Europe rely on this sensor.

PROBA-V (the V stands for "vegetation") was launched May 7. The miniature satellite is designed to map land cover and vegetation growth across the entire planet every two days. The data can be used for alerting authorities to crop failures or monitoring the spread of deserts and deforestation.

Less than a cubic metre in volume, Proba-V is a miniaturised ESA satellite tasked with a full-scale mission: to map land cover and vegetation growth across the entire planet every two days.

Proba-V is flying a lighter but fully functional redesign of the Vegetation imaging instruments previously flown aboard France’s full-sized Spot-4 and Spot-5 satellites, which have been observing Earth since 1998.

Check it out: http://www.esa.int/Our_Activities/Technology/Proba_Missions/Proba-V_opens_its_eyes

Monday
May 06, 2013

CPAD 1.9 released today: mapping protected areas in California

CPAD, the California Protected Areas Database, is releasing a new version. This product maps lands owned in fee by public and nongovernmental organizations for open space purposes, ranging from small neighborhood parks to large wilderness areas.

CPAD 1.9 is a major update that corrects many outstanding issues with CPAD holdings data and also has many new additions, particularly for urban parks.

CPAD is produced and managed by GreenInfo Network, a 16-year-old non-profit organization that supports public interest groups and agencies with geospatial technology. CPAD data development is conducted with Esri ArcGIS products, supplemented with open source web application tools.

Find the data here.

Wednesday
Jan 30, 2013

Mapping and interactive projections with D3

D3 is a JavaScript library that brings data to life through an unending array of visualizations. Whether you've realized it or not, D3 has been driving many of the most compelling data visualizations that you have likely seen over the last year, including a popular series of election tracking tools in the New York Times.

You can find a series of examples in D3's gallery that will keep you busy for hours!

In addition to the fantastic charting tools, D3 also enables a growing list of mapping capabilities. It is really exciting to see where all this is heading. D3's developers have recently been spending a lot of time working on projection transformations. Check out these amazing interactive projection examples:

Projection Transitions

Comparing Map Projections

Adaptive Composite Map Projections (be sure to use Chrome for the text to display correctly)

Can't wait to see what the future has in store for bringing custom map projections to life in more web map applications!
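To give a feel for what a projection transition involves, here is a small Python sketch of the underlying math. To be clear, this is not D3's API and not how D3 implements its transitions. It uses the standard formulas for the equirectangular and spherical Mercator projections on a unit sphere, and it assumes a simple linear blend between the two. The function names and the sample coordinates are mine.

```python
import math


def equirectangular(lon_deg, lat_deg):
    """Plate carree: x and y are just longitude and latitude in radians."""
    return math.radians(lon_deg), math.radians(lat_deg)


def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere; undefined at the poles."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))


def blend(lon_deg, lat_deg, t):
    """Assumed linear blend: t=0 is equirectangular, t=1 is Mercator."""
    x0, y0 = equirectangular(lon_deg, lat_deg)
    x1, y1 = mercator(lon_deg, lat_deg)
    return (1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1


# Approximate coordinates for Berkeley, California, at five steps
# of the transition from equirectangular to Mercator.
frames = [blend(-122.27, 37.87, t / 4) for t in range(5)]
```

Each entry in `frames` is where the same point would sit on screen (before scaling and translating) at one step of the animation. Both projections share the same x here, so only y moves as the blend progresses.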

 

Thursday
Oct 11, 2012

Hey Sandi Toksvig! Denmark is releasing data...

From the LAStools list. Recently, the Danish government released this announcement of free access to public sector data. Among other things, it means that Danish mapping and elevation data will become free (apparently "free" as in speech as well as in beer).

Apparently, the intention is that the data should be accessible from the beginning of next year. Ole Sohn, Danish Minister for Business and Growth said:

“When the data has been released it can be used to develop completely new types of digital products, solutions, and services, which will benefit our companies as well as society at large. It is a vital part of Denmark's digital raw material that we are now releasing, which will create growth and jobs in Denmark”.

Friday
Aug 31, 2012

Bing Maps completes Global Ortho project for US

The Bing Maps team has announced the completion of the Global Ortho Project for the US. The project provides 30 cm resolution imagery for the entire US, all acquired within the last 2 years. You can access all of the imagery now through Bing Maps. It is pretty amazing to see such detail for all of the far-off places that typically don't get high resolution attention.

Find out more about the project from the Bing Maps Blog, or view the data for yourself.

Monday
Jul 23, 2012

New open datasets for City of Oakland and Alameda County

Following in the footsteps of the City and County of San Francisco's open data repository at data.sfgov.org, two new beta open data repositories have recently been released for the City of Oakland and Alameda County. This development coincides with last week's 2012 Code for Oakland hackathon, which aims to make city and county government more transparent by using apps and the web to make public access to government data easier.

The City of Oakland’s open data repository at data.openoakland.org includes crime reports at a variety of spatial scales, along with tabular and geographic data such as parcels, roads, trees, public infrastructure, and locations of new development, to name a few. It is important to note that the Oakland open data repository is currently not officially run or maintained by the City of Oakland; it is maintained by members of the community and the OpenOakland Brigade.

Alameda County’s open data repository at data.acgov.org includes Sheriff crime reports, restaurant health reports, solar generation data, public health department data, and a variety of other tabular and geographic data. Data can be viewed in a browser as an interactive table or an interactive map, or downloaded in a variety of formats. Both sites are still in their infancy, so expect more datasets to come online soon. On the same note, the Urban Strategies Council recently released a new version of their InfoAlamedaCounty webGIS data visualization and map viewer - check it out.

 Screenshot of City of Oakland Open Data: data.openoakland.org

Screenshot of Alameda County Open Data: data.acgov.org

Wednesday
Jun 20, 2012

Void-Free Global DSM

NEXTMap's World 30 digital surface model (DSM) is ready. Pricing starts as low as 1 cent per km².

ASTER, with some problematic gaps...

NEXTMap, without the gaps

Wednesday
Apr 18, 2012

New high resolution coastal elevation data for California

The California Ocean Protection Council has released statewide high resolution elevation data for coastal California and much of San Francisco Bay. LiDAR data were collected between 2009 and 2011 and cover nearly 3,800 square miles. Data can be downloaded from NOAA Coastal Services Center's Digital Coast website: