In conversation with the Acquisitor a week or so ago, we briefly touched on the subject of librarians sharing and storing their own data. This has been humming around in my head and while my thoughts aren’t fully formed, after an exchange with Dorothea on Thursday, I wanted to try and articulate something.
What are librarians doing to share their data?
We’re chasing after our research faculty, telling them about the data life cycle and best practices. We’re trying to figure out repositories–both institutional and subject–for storage. We’re identifying metadata. We’re talking about reuse and permutating old data sets into something we can leverage for future research. We’re educating ourselves in statistics, analysis, and visualization. We’re worrying about what exists for interdisciplinary discovery.
We’re not storing our own research data.
At least, we don’t seem to be. I’m not going down the path of linked data. I mean other data: either data gathered by librarians as part of their regular work that could be published and used by researchers, or data gathered by librarians actively doing research.
Have you done research? If so, what happened to your data at the end of your project? What about the data you’re collecting for a project right now? Did you create a Data Management Plan? Is de-identified data available for me to download? Do you remember what the labels on those spreadsheets mean? Do you have a guilty-looking pile of CD-ROMs, floppy disks, or printouts?
The one IRB internal review I read, as an oft-faulty memory serves, said that the data would be stored on a computer in a locked office for a stated period of time that was reliant on publication. I don’t recall seeing long-term storage, shared data, reuse, etc.
Having started wondering about this, I’ve realized how much work is still to come for my own research projects and maintaining or publishing that data. Presently, across my three active projects, everything is being captured in Google Spreadsheets. This is primarily because none of my projects involve person-sensitive data and because everything I’m doing involves working with people who aren’t anywhere near my zip code. Those shared, proprietary-format spreadsheets are fine for working documents, but I won’t want to rely on them as a long-term solution.
Thinking about other research that I’m aware of, keeping and sharing our data definitely could have some consequences. How often have you filled out a seemingly anonymous survey with a soupçon of frustration that is fine for behind closed doors but perhaps not for public consumption? What is our responsibility to manage the risk to our survey subjects when we ask questions that could, if the raw data and the open-ended comments were published, potentially cause employment issues? Note that this all ties back to that piece I wrote about IRB training a couple of weeks ago (also, if you haven’t read the comments, please do).
There’s also the question of de-identifying data that could have patron information tied to it. If we’re making those data sets available, what are the requirements for access? What risk to the patrons is there? As Dorothea reminded me, we do not just pay lip service to patron privacy. How do we balance that with sharing data? What happens when we go beyond charts and summaries? We’re obviously not up against some of the same restrictions that my patrons face with health data, but the PATRIOT Act and various letters and gag orders come to mind.
So we consider the risks or potential risks. Also on our plate are storage (who wants to sign up to run the Library Research Data Repository?), access and discovery, and identifying opportunities for reuse. Speaking as one who regularly suffers survey fatigue, surely librarianship is ripe for survey data reuse. Has anyone used someone else’s data? How did you cite it? Are there systematic reviews and meta-analyses that we could do, if only we had access? What kind of data would you like to be able to get your hands on from other libraries, and can we make it easily available?
That’s where I am thus far. Many, many questions to consider. What have I missed? What else is a consideration? Have you shared your data?
And perhaps most importantly:
Why aren’t we setting a better example for our faculty?