Data Friday: Data Tidbits 4
Back for more tidbits, and adding to the ever-growing “to read” list…
John Kraus was at ACS Denver, and it was good to see tweets from him about the need for open, shared data. Check his tweet stream from August 30th. My favorites were probably his quotes from Wendy Warr about the need to reward scientists for cleaning up data and to reward the citability of data. The other name he mentioned that I’d like to look up is Rudy Potenzone.
The 2nd Wolfram Data Summit has just finished in Washington, DC. I’ve seen a couple of tweets about it but will need to go digging further. Putting it on the list of things to possibly attend next year!
The Data Analytics Summit is coming up here in Chicago. It’s sponsored by Aster Data and looks primarily like a sales pitch. It’s more focused on business analytics than on academic data sets, but I’m sure there would be some interesting overlap. If anyone who’s going would be willing to chat about it afterwards, let me know!
New (to me) Datasets
On Friendfeed, Mr. Gunn was looking for a recent presentation about NYC Data Initiatives. While I couldn’t find it (let me know if you do!), I did find city information data sets from Columbia University’s Columbia Population Research Center. They include a Health Survey and a Youth Risk Survey, which I can readily see being useful in my own job, possibly for Applied Health, the College of Dentistry, and the Institute for Juvenile Research. You can also suggest a dataset to be added to the inventory.
Have a stargazer at your house or institution? Introduce them to NASA’s Planetary Data System. I really like their site: there’s a lot of very clear information about their submission process, peer review, and expectations, and you can get an RSS feed of new data sets! There’s also a Students and Educators section for teachers or school media specialists who’d like to join in the fun.
Digital Science in London (owned by Macmillan) announced a partnership with Figshare, and I’ve seen some tweets saying it was being discussed at the Wolfram Data Summit in DC. Figshare was a new name to me, and on checking out their website I found that:
“FigShare allows you to share all of your data, negative results and unpublished figures. In doing this, other researchers will not duplicate the work, but instead may publish with your previously wasted figures, or offer collaboration opportunities and feedback on preprint figures.”
Figshare requires Creative Commons licensing on everything uploaded for sharing, which could certainly affect what people are able to contribute, depending on institutional policies, grant requirements, and the laws of the country of origin. Figshare appears to be targeting individual researchers, giving them a space to put their datasets, media, extra figures, etc., in order to promote discovery and collaboration. They don’t appear to be doing much with data curation at the moment: the tags on the datasets I looked at were pretty obviously added by the people who uploaded them, and they certainly didn’t appear to be drawn from any standard list. For now most of the content focuses on the hard sciences; I assume there would be HIPAA concerns, at the very least, for any US medical datasets.
A few things trouble me about the site as it stands. Obviously it’s in beta, so I assume improvements will be made. Most of the Anatomy pages I clicked on said “This page no longer exists…”. The Systems Institute, which is identified as a primary funder, currently has a website under reconstruction. While the new partnership implies future stability, I’m concerned it may quickly lead to things living behind a paywall, particularly after looking at other Digital Science partners. And I’d just like to see more being done for discoverability. Right now things are sorted by discipline and the focus is definitely on the hard sciences; there aren’t any sections for the arts or humanities. There doesn’t seem to be anyone cleaning up or managing the metadata, or suggesting that the obviously labeled “This is my test dataset” be taken down so it isn’t cluttering up the “everything” list.
What is very nice is the easily citable nature of the uploaded datasets. If data lives in a fixed, accessible location, it becomes much easier to cite, something repeatedly described as a coming necessity if we expect researchers to share their datasets and, hopefully, to earn tenure credit and build professional reputations on them.
Who Do I Follow For Data News?
Want to Work in Data?
Assistant Professor in Informatics @ SJSU SLIS. They are looking for someone to teach and train Data Librarians.
Geographic Information Systems Librarian. This job is very focused on geographic data (obviously!). I imagine they’ll be working closely with the DataONE people.
Reference Librarian for the Physical and Mathematical Sciences, North Carolina State University Libraries. Unfortunately, the link isn’t working on the day I’m writing this; hopefully it’s back up for you on Friday. They’re specifically looking for someone who can work on numeric and spatial data services.