
Business Subject Guide — Datasets

This guide identifies key resources for doing business research.

Browse Data by Concept

Google Dataset Search is a Google search engine that helps researchers locate freely available online data.

There is no one-stop platform (yet) to search for business datasets.

How can you get started then?

  1. Be clear about the scope, characteristics, and attributes of the data you're looking for.
  2. Look at existing research (e.g., market research reports, journal articles) to see where the authors obtained their data.
  3. Consult your business librarian.
Note: All datasets in WRDS (Wharton Research Data Services) are licensed specifically by the School of Business. To request additional products and subscriptions, please contact the Business Librarian. Access to WRDS requires registration: current UConn faculty, staff, and students should register for a WRDS account using their UConn email address.

Company Financials

Financial Markets, Prices, Returns

Stock Prices

Analyst Estimates

Indices and Factors

Bonds and Fixed Income

Mutual Fund / Hedge Fund / ETF Returns

Derivatives / Options

Currency Exchange Rates

Marketing

Mergers & Acquisitions

Microfinance

Productivity Statistics

Real Estate

U.S. Bureau of Labor Statistics (BLS)

Note that BLS does not publish detailed information on household income, which includes wages and salaries, interest, dividends, pensions, and income from other sources. Household income data are available from the U.S. Census Bureau.

Notes

The concepts used above are adapted from WRDS and BLS.

Documenting your data sources

I don't usually recommend asking a data provider directly for access unless (a) you already have a relationship with the provider or (b) it's your last resort. The example below is a successful request:

For the purposes of this study, the IRS generously provided access to one of the authors all Schedules M-3 for entities filing Forms 1120, 1120S, and 1065 for fiscal years 2008–2010 at the consolidated U.S. parent level. (Lisowsky & Minnis, p. 12)

Lisowsky, P., & Minnis, M. (2020). The Silent Majority: Private U.S. Firms and Financial Reporting Choices. Journal of Accounting Research, 1475-679X.12306. https://doi.org/10.1111/1475-679X.12306

Often, you will need to consolidate data from various sources. This example discusses at length where the data came from, how they were consolidated, and the end result:

We construct a comprehensive database of corporate officers and directors, such as the chief executive officer (CEO), chief financial officer (CFO), various corporate vice presidents, and others, at public US companies from 1920 to 2011. We combine information from a number of sources.

First, we hand-collect names of corporate officers and directors, as well as financial data on their firms, from Moody's Industrial Manuals (“Moody's”) from 1920 to 1992 and also the year 1998. Second, we collect names of corporate directors and officers from Compact Disclosure during 1985-2005. Compact Disclosure derives information from firms’ public disclosure such as 10-Ks, 10-Qs, and proxy statements. Third, we supplement these two primary board and officer databases using Mergent (which took over the Industrial Manual from Moody's, 2002-2009) and Board Analyst (2005-2011) for more recent years. We gather stock price and return data from CRSP and financial statement data from Compustat or Moody's Industrial Manuals (for firm-years not covered by Compustat). Like most corporate finance research, we do not include firms in the financial (SIC 6000-6999), transportation (4000-4599), and utility (4900-4999) sectors.

To maintain comparability across the various databases and years, we focus on US firms listed on the NYSE or Amex. Our main results are similar if we include American depository receipts (ADRs) and Nasdaq firms or if we use NYSE firms only (see Online Appendix Table 1, Panel D). We have CEO and board information for nearly 80% of the NYSE/Amex firms in the CRSP database over the 1920-2011 period. All total, our database contains 86,946 firm-year observations and more than 8,500 CEO turnover events.

Graham, J. R., Kim, H., & Leary, M. (2020). CEO-Board Dynamics. Journal of Financial Economics, S0304405X20301227. https://doi.org/10.1016/j.jfineco.2020.04.00

At times, you might wish, for various reasons, to keep your data documentation out of the main body of your paper. In this example, it appears in the supporting information, which is published separately from the journal article itself:

In this Appendix, we provide a narrative account of how we progressed with our data analyses to generate the reported themes through triangulation. As we described in the Research Context and Methods section, data analyses began concurrently with data collection (Glaser and Strauss, 1967). We first examined and analyzed Uber’s pitch deck (Appendix Figure 1), and read and coded news media and extensive accounts about Uber by respected journalists and researchers (e.g., Stone, 2017; Rosenblat, 2018; Isaac, 2019). A complete list of data sources is summarized in Table 1 (Data Sources).

Garud, R., Kumaraswamy, A., Roberts, A., & Xu, L. (2020). Liminal movement by digital platform‐based sharing economy ventures: The case of Uber Technologies. Strategic Management Journal, smj.3148. https://doi.org/10.1002/smj.3148

In this example, the original source is in Chinese: the Hurun Rich List, a ranking similar to the Forbes Billionaires list but covering people in China.

From the Rich List published between 1999 and 2012, we manually collected the identities of the wealthy people or families. Based on information from firms’ IPO prospectuses, public-listing announcements, annual reports, and the Web, we then determined the public firms that the entrepreneurs of interest ultimately control. We denote as T the event year when the controlling owners are first included in the Rich List. We do not consider cases in which entrepreneurs’ rich listings precede their firms’ going public; for such cases, the firms are in the private status before T and data are not publicly available. (Wu & Ye, p. 16)

Wu, D., & Ye, Q. (2020). Public Attention and Auditor Behavior: The Case of Hurun Rich List in China. Journal of Accounting Research, 1475-679X.12309. https://doi.org/10.1111/1475-679X.12309

For more examples, see:
Bae, K., & Kim, D. (2020). Liquidity risk and exchange-traded fund returns, variances, and tracking errors. Journal of Financial Economics, S0304405X20301276. https://doi.org/10.1016/j.jfineco.2019.02.012 (see Section 2.1, ETF data, pp. 8-10)
Golubov, A., Lasfer, M., & Vitkova, V. (2020). Active catering to dividend clienteles: Evidence from takeovers. Journal of Financial Economics, S0304405X20300830. https://doi.org/10.1016/j.jfineco.2020.04.002 (see Section 3.1, Sample, p. 8)