There is no one-stop platform (yet) to search for business datasets.
How can you get started then?
Note that BLS does not publish detailed information on household income – which includes wages and salaries, interest, dividends, pensions, and income from other sources. Information on household income is available from the U.S. Census Bureau.
The concepts used above are adapted from WRDS and BLS.
I don't usually recommend asking unless (a) you already have a relationship with the data provider or (b) it's your last resort. The example below is a successful ask:
For the purposes of this study, the IRS generously provided access to one of the authors all Schedules M-3 for entities filing Forms 1120, 1120S, and 1065 for fiscal years 2008–2010 at the consolidated U.S. parent level. (Lisowsky & Minnis, p.12)
Often, you should be prepared to consolidate data from various sources. This example discusses at length where the data came from, how it was consolidated, and what the final dataset looks like:
We construct a comprehensive database of corporate officers and directors, such as the chief executive officer (CEO), chief financial officer (CFO), various corporate vice presidents, and others, at public US companies from 1920 to 2011. We combine information from a number of sources.
First, we hand-collect names of corporate officers and directors, as well as financial data on their firms, from Moody's Industrial Manuals (“Moody's”) from 1920 to 1992 and also the year 1998. Second, we collect names of corporate directors and officers from Compact Disclosure during 1985-2005. Compact Disclosure derives information from firms’ public disclosure such as 10-Ks, 10-Qs, and proxy statements. Third, we supplement these two primary board and officer databases using Mergent (which took over the Industrial Manual from Moody's, 2002-2009) and Board Analyst (2005-2011) for more recent years. We gather stock price and return data from CRSP and financial statement data from Compustat or Moody's Industrial Manuals (for firm-years not covered by Compustat). Like most corporate finance research, we do not include firms in the financial (SIC 6000-6999), transportation (4000-4599), and utility (4900-4999) sectors.
To maintain comparability across the various databases and years, we focus on US firms listed on the NYSE or Amex. Our main results are similar if we include American depository receipts (ADRs) and Nasdaq firms or if we use NYSE firms only (see Online Appendix Table 1, Panel D). We have CEO and board information for nearly 80% of the NYSE/Amex firms in the CRSP database over the 1920-2011 period. All total, our database contains 86,946 firm-year observations and more than 8,500 CEO turnover events.
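The consolidation logic described in the quote (a preferred source, a fallback source for firm-years the preferred one does not cover, and sector exclusions by SIC range) can be sketched in a few lines. This is only a minimal illustration, not the authors' actual procedure: the record layout, the field names (`firm_id`, `year`, `sic`, `assets`), and all sample values are hypothetical.

```python
# Sector exclusions taken from the quoted passage:
# financial (6000-6999), transportation (4000-4599), utility (4900-4999).
EXCLUDED_SIC_RANGES = [(6000, 6999), (4000, 4599), (4900, 4999)]


def is_excluded(sic):
    """Return True if the SIC code falls in one of the excluded ranges."""
    return any(low <= sic <= high for low, high in EXCLUDED_SIC_RANGES)


def consolidate(primary, fallback):
    """Merge two lists of firm-year records keyed on (firm_id, year).

    Records from `primary` win; `fallback` only fills firm-years
    that `primary` does not cover. Excluded sectors are dropped.
    """
    merged = {}
    # Load the fallback first so that primary records overwrite it.
    for source_name, records in (("fallback", fallback), ("primary", primary)):
        for rec in records:
            key = (rec["firm_id"], rec["year"])
            merged[key] = {**rec, "source": source_name}
    return [rec for _, rec in sorted(merged.items()) if not is_excluded(rec["sic"])]


# Hypothetical sample data (illustrative values only).
database_records = [
    {"firm_id": "A", "year": 1990, "sic": 3571, "assets": 120.0},
    {"firm_id": "B", "year": 1990, "sic": 6021, "assets": 900.0},
]
hand_collected_records = [
    {"firm_id": "A", "year": 1990, "sic": 3571, "assets": 118.0},
    {"firm_id": "A", "year": 1940, "sic": 3571, "assets": 15.0},
]

panel = consolidate(database_records, hand_collected_records)
for row in panel:
    print(row["firm_id"], row["year"], row["source"], row["assets"])
```

Keeping a `source` field on every row makes it easy to report later how much of the final panel came from each source, which is the kind of detail the quoted passage documents.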
At times, you might wish to leave your documentation out of your actual paper for various reasons. In this example, it appears under supporting information, which is separate from the journal article itself:
In this Appendix, we provide a narrative account of how we progressed with our data analyses to generate the reported themes through triangulation. As we described in the Research Context and Methods section, data analyses began concurrently with data collection (Glaser and Strauss, 1967). We first examined and analyzed Uber’s pitch deck (Appendix Figure 1), and read and coded news media and extensive accounts about Uber by respected journalists and researchers (e.g., Stone, 2017; Rosenblat, 2018; Isaac, 2019). A complete list of data sources is summarized in Table 1 (Data Sources).
In this example, the original source is in Chinese (i.e., the Hurun Rich List, similar to the Forbes Billionaires list but for people in China):
From the Rich List published between 1999 and 2012, we manually collected the identities of the wealthy people or families. Based on information from firms’ IPO prospectuses, public-listing announcements, annual reports, and the Web, we then determined the public firms that the entrepreneurs of interest ultimately control. We denote as T the event year when the controlling owners are first included in the Rich List. We do not consider cases in which entrepreneurs’ rich listings precede their firms’ going public; for such cases, the firms are in the private status before T and data are not publicly available. (Wu & Ye, p.16)
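The sample rule in the quote (define T as the first year an owner appears on the list, then drop cases where that listing precedes the firm going public) could be expressed roughly as follows. This is a sketch under stated assumptions, not the authors' code: the field names, the sample records, and the treatment of the boundary case where the IPO year equals T are all assumptions made for illustration.

```python
def event_year(list_years):
    """T: the first year the owner appears on the list."""
    return min(list_years)


def keep_case(t, ipo_year):
    """Keep a case only if the listing does not precede the IPO.

    Assumption: a firm that went public in year T itself is kept.
    """
    return ipo_year <= t


# Hypothetical records (names and years are illustrative only).
owners = [
    {"name": "Owner 1", "list_years": [2005, 2003, 2007], "ipo_year": 2001},
    {"name": "Owner 2", "list_years": [2004], "ipo_year": 2008},
]

sample = []
for owner in owners:
    t = event_year(owner["list_years"])
    if keep_case(t, owner["ipo_year"]):
        sample.append({"name": owner["name"], "T": t})

print(sample)
```

Writing the rule down this explicitly also forces a decision on edge cases, such as the IPO-year boundary, that a prose description can leave open.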
This work is licensed under a Creative Commons Attribution NonCommercial 4.0 International License.