46 Data usually finds me

This document is based on my 2020 SAM talk, “I don’t go looking for Data … Data usually finds me.” Here is a link to the slides of that talk.

46.1 I don’t go looking for Data … Data usually finds me

The most interesting aspects of my work (at least to me) are those related to finding data. However, this part is also the least documented. In my case, it primarily lives in footnotes, personal statements, and appendices.

46.2 Two Major Approaches

The distinction between exploratory and confirmatory approaches can be applied to many aspects of our work, but it is particularly helpful as a framework for my experiences with data.

The exploratory approach is descriptive and data-driven. Data scientists often apply this approach. In contrast, the confirmatory approach is question-driven and aimed at testing specific hypotheses. Research scientists often apply this approach, but not always.

46.2.1 Where do you start?

Do you start with the question or with the data?

46.3 Confirmatory Approach to Archival Data

The confirmatory approach to archival data is question-driven. What types of questions? These can include questions related to theories, measures, subjects, models, replicability, or even externally motivated questions. Theory-based questions include questions like “Do smart girls delay sex?” Measurement-based questions can ask things like “Is Coding Speed from the ASVAB a decent proxy for conscientiousness?” Questions about subjects include “Where can I find Twins Raised Apart?” Modeling questions can include things like “How do I illustrate my dual mediated survival model?” Replication questions include “Can I replicate my finding in another sample?” Externally inspired questions can include things like “Can I address reviewer two’s concern about reliability of difference scores?”

These questions narrow your search; otherwise, the scope of data is overwhelming. The wonderful Kathy Shields helped me add a section to the WFU library website to get you started: guides.zsr.wfu.edu/psychology. My favorite three places to look for specific datasets are the University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR), Harvard’s Dataverse, and the US government’s Data.gov. To give you a sense of scale, ICPSR has approximately 15,000 digitized data sets. Harvard’s Dataverse has about 95,000. Data.gov has about 250,000.

46.4 Exploratory Approach

The exploratory approach to archival data is data-driven. Often you already have a data set in mind. Most of my earliest experiences fall into this category. These are also the most interesting experiences. But how does data find you? Data can find you in numerous ways, including referrals, reading, rumor, and random chance. A referral example would be when a speaker tells you about an intergenerational data set partially run by the BLS. You may also just stumble across a data set in your readings, such as when a historian uses aspects of a marriage study. A rumor may inspire you, such as observing that a control group is mentioned in the original write-up of the Terman study (1921ish). Serendipity might lead you to fly to SPSP, talk to the person sitting next to you about a study you were in…

Now, in this case, the places to start looking will be driven by your specific needs. Sometimes your specific dataset is already digitized, de-identified, and clearly located. Sometimes only parts of it are… Other times, it gets interesting…

46.5 Data Retrieval Ranges Wildly

Data retrieval ranges wildly. It can be as easy as downloading a data set. Or it can be as unwieldy as:

  • Applying for access to the “digitized” dataset on the Dataverse for an econometrics class
  • Waiting for a panel of two researchers to convene for approval
  • Learning that one researcher (E. Lowell Kelly) on the panel died in 1986 and the other (James Conley) is nowhere to be found.
  • Teaming up with a 2nd-year assistant professor (Josh Jackson) to find Conley
  • Tracking down James Connolly, who legally changed his name in 1992 (or so) from James Conley
  • Convincing Jim to share the data
  • Determining where the data are
  • Retrieving boxes from the basement of:
    • Henry A. Murray Research Archive (Part of Harvard’s Dataverse)
    • Jim Connolly’s office
    • Bentley Library (Part of Michigan’s ICPSR)

There is a silver lining to difficult data retrieval: the more hurdles between you and the data, the less likely it is that you’ll be scooped.

46.6 Bit of Context: Marriage Study

46.7 List of Data Places

Greatest Hits:

Other Places:

  • ZSR Guide
  • https://osf.io/ra8yg/

46.8 Great Papers to Get You Started

  • If you’re interested in neuroimaging,

Madan, C.R. Scan Once, Analyse Many: Using Large Open-Access Neuroimaging Datasets to Understand the Brain. Neuroinform (2021). https://doi.org/10.1007/s12021-021-09519-6