
Statistics & Data

Statistical and data resources available at Saint Mary's University, including Statistics Canada, business and international sources

NEW Statistical Database

  • Statista
    Statista is an easy-to-use portal for statistics and industry reports. It covers 600 industries, 18 consumer markets, and 9 digital markets. Market and industry sectors covered include retail, consumer goods & services, media, sports & recreation, technology, transportation and logistics, and travel and tourism. Users can export data to Excel and graphs to PowerPoint, and infographics are available for download. See the Statista Quick Guide.

Statistics & Data: Overview

The terms statistics and data are often used interchangeably. Although there are some commonly understood distinctions, there are also grey areas: statistics are a kind of data, and data are used to generate statistics. In general:

Statistics often are:

  • facts or figures
  • time series
  • tables, charts, or graphs
  • used to support an argument
  • "ready to use"

Data can generally be used to:

  • calculate statistics
  • test hypotheses
  • generate custom tables
  • look at responses of individuals
  • perform analyses in SPSS, Stata, or SAS software
  • run regressions, t-tests, ANOVA, etc.

When thinking about using data, consider what type of data you may need: microdata or aggregate data. Microdata is the original, (mostly) unprocessed information*, such as the level of education attained by each member of a household, or the height and species of each tree in a park. Aggregate data is summarized and combined in some way: average education levels in a city, or the number of oak trees in a city park.

*Microdata is processed a bit to protect the privacy of participants, but respondents' answers remain substantially intact.
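To make the microdata/aggregate distinction concrete, here is a minimal Python sketch based on the park-tree example above. The records are invented for illustration and the function names are hypothetical: each row of microdata describes one tree, while the aggregate results report only counts and averages per park.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical microdata: one record per tree (the unit of observation).
# Park names, species, and heights are invented for illustration only.
trees = [
    {"park": "Park A", "species": "oak", "height_m": 12.0},
    {"park": "Park A", "species": "maple", "height_m": 9.5},
    {"park": "Park A", "species": "oak", "height_m": 15.0},
    {"park": "Park B", "species": "oak", "height_m": 10.0},
    {"park": "Park B", "species": "birch", "height_m": 7.0},
]


def count_species(records, species):
    """Aggregate: number of trees of one species in each park."""
    return dict(Counter(r["park"] for r in records if r["species"] == species))


def mean_height_by_park(records):
    """Aggregate: average tree height (metres) in each park."""
    heights = defaultdict(list)
    for r in records:
        heights[r["park"]].append(r["height_m"])
    return {park: mean(values) for park, values in heights.items()}


oak_counts = count_species(trees, "oak")
avg_heights = mean_height_by_park(trees)
print(oak_counts)  # {'Park A': 2, 'Park B': 1}
print(avg_heights)
```

Once the records are aggregated, the individual trees can no longer be recovered from the totals, which is one reason aggregate data is often easier to share but offers less flexibility for analysis.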

Content adapted with permission from Brock University Library

What's going on with these numbers?

First things first: slow down. Don't focus on the numbers in the table right away. Instead, carefully review the details around the edges: what information is given by the title or header? What are the row and column labels? Are there any footnotes or references underneath the table? All of this information can help you understand the context of the numbers that are inside the table.

Questions to ask (and answer!) when looking at numerical data or statistics:

  • What's being counted or summarized?
  • What units are being used? Thousands of dollars (CAD? USD?), individual spectators, percentage change (from what?), percentage of total, etc.
  • Who collected and/or summarized the data?
  • What questions were asked, or what sources were used, to find, solicit, compile, collect, or create the data?
  • What was the purpose of collecting the data in the first place?
  • How does all of that fit with what YOU want to do with these numbers?

Limitations to keep in mind

Many factors affect what data is collected and why, and other factors affect what can be shared with others. Common issues with published data and statistics include (1) the need to protect privacy, (2) efforts to control accuracy and precision, (3) mandated measurements (such as the census), and (4) pre-existing categories used to organize the data.

  1. Protection of Privacy:
    • Because of privacy concerns, some data may be restricted when the population being counted is so small that it would be possible to identify an individual person or business.
  2. Efforts to control accuracy and precision
    • Data may not be available when there are concerns about data collection methods, or when the accuracy of the data cannot be confirmed. Some statistical calculations must meet specific criteria to be considered valid: for example, if there are too few data points, or if the data were collected inconsistently, the resulting statistic isn't considered accurate and may not be published.
  3. Comparison over time is often not possible
    • Many surveys, including the national census, are required by law or regulation; in some cases, the specific questions and responses collected are explicitly outlined by a government agency or ruling. These regulations can change over time, so the questions asked 10 or 20 years ago might not be the same as those asked today. Consequently, comparing data over time may be complicated or impossible.
  4. Standardized categories and methods may not match your requirements
    • Standardized methods and categories are often used by many groups to more easily share and compare data sets and statistics. It is convenient to use these standards, but they might not perfectly match your specific question.
      • For example, NAICS codes are commonly used in Canada and the US to collect economic and labour statistics based on industry. Each specialized industry will have a single NAICS code that is a subset of a larger category, which is in turn part of an even larger category, etc. The hierarchical arrangement defined by NAICS might not always fit the way you would like to categorize the industry. These codes can also change over time, something to keep in mind if you are looking at statistics from different decades: NAICS 1997 had 3 codes for internet-related industries. NAICS 2012 has 57.
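Limitations 1 and 4 often interact: rolling detailed industry codes up the NAICS hierarchy can produce counts large enough to publish. The following is a minimal Python sketch, assuming a hypothetical suppression threshold of 5 and invented business counts keyed by illustrative six-digit codes; no real industry data or official confidentiality rule is implied.

```python
from collections import defaultdict

# Hypothetical threshold: cells with fewer units than this are suppressed.
# Real agencies apply their own, often more complex, confidentiality rules.
MIN_CELL = 5

# Invented counts of businesses keyed by illustrative six-digit codes.
# These are not real data.
business_counts = {
    "511210": 40,
    "511120": 3,
    "518210": 12,
    "519130": 2,
    "519190": 4,
}


def roll_up(counts, digits):
    """Sum counts at a coarser level of the NAICS hierarchy.

    NAICS codes are hierarchical: 2 digits give the sector, 3 the subsector,
    4 the industry group, 5 the industry, and 6 the most detailed,
    country-specific level, so truncating a code moves it up the hierarchy.
    """
    totals = defaultdict(int)
    for code, n in counts.items():
        totals[code[:digits]] += n
    return dict(totals)


def suppress_small_cells(table, threshold=MIN_CELL):
    """Replace counts below the threshold with None ("not published")."""
    return {key: (n if n >= threshold else None) for key, n in table.items()}


by_subsector = roll_up(business_counts, 3)
published_detail = suppress_small_cells(business_counts)
published_subsector = suppress_small_cells(by_subsector)
print(published_detail)
print(published_subsector)
```

In this toy example the 519 subsector total (6) can be published even though both of its detailed cells are suppressed. Publishing the 511 subsector total (43) alongside the 511210 count (40) would, however, reveal the suppressed 511120 cell (3); statistical agencies guard against this with additional (complementary) suppression, which this sketch does not attempt. Some NAICS sectors, such as 31-33 Manufacturing, also span several two-digit prefixes, which simple truncation ignores.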
 

Aggregate Data

  • Consist of statistics that are organized into a data structure and stored in a database or in a data file. The data structure is based on tabulations organized by time, geography, or social content.

Beyond 20/20

  • Software that allows the user to manipulate a pivot table to create and reshape a data file.
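Beyond 20/20 does this kind of reshaping interactively. As a rough analogy only (this is not how Beyond 20/20 works internally), the Python sketch below pivots invented long-format aggregate data, one row per area and year, into a wide table and then swaps its axes. The area names, years, values, and function names are all hypothetical.

```python
# Hypothetical long-format aggregate data: one row per (area, year) cell.
# Area names and values are invented for illustration only.
long_rows = [
    ("Area A", 2016, 120),
    ("Area A", 2021, 135),
    ("Area B", 2016, 80),
    ("Area B", 2021, 78),
]


def pivot(rows):
    """Reshape (row_key, column_key, value) triples into a nested-dict table."""
    table = {}
    columns = set()
    for row_key, col_key, value in rows:
        table.setdefault(row_key, {})[col_key] = value
        columns.add(col_key)
    return table, sorted(columns)


def transpose(table):
    """Swap rows and columns, like moving a dimension to the other axis."""
    flipped = {}
    for row_key, cells in table.items():
        for col_key, value in cells.items():
            flipped.setdefault(col_key, {})[row_key] = value
    return flipped


table, years = pivot(long_rows)

# Print the wide table: areas as rows, years as columns (tab-separated).
print("Area", *years, sep="\t")
for area, cells in table.items():
    print(area, *(cells.get(y, "") for y in years), sep="\t")

by_year = transpose(table)
```

Keeping data in the long (one row per cell) form makes it easy to pivot along any dimension later, which is the same flexibility a pivot-table tool provides.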

Data

  • Data are numeric files created and organized for processing and analysis. There are two types of data: aggregate data and microdata. Compared with published statistics, both offer the user more control over the variables available for analysis.

DLI

  • Data Liberation Initiative - a partnership between Statistics Canada and Canadian post-secondary academic institutions to promote and facilitate the availability of Statistics Canada and other Canadian data for educational and academic research purposes.

GeoSpatial Data

  • Data that define a geographic area and its dimensions, in the form of points, lines, polygons, or pixels.

Master files

  • "Pure" or "raw" data set, consisting of all respondent answers to survey questions. [Almost] all variables and cases are available for analysis in the master file. The master file is not available to all users.

Metadata

  • Metadata is the documentation that accompanies and assists users in the interpretation of microdata, aggregate data and geographic files.

Microdata

  • Raw or minimally processed research data (also known as survey data). Each line in the file contains information on a person (or other unit of observation). It is typically used in SPSS, Stata, R, or other statistical software.

PUMF

  • Public use microdata file - A master file that has undergone modification to minimize the possibility of disclosing a respondent's identity.

Statistics

  • Tables and cross-tabulations that have been formulated from the raw data files (e-publications, e-tables, and databases).

Statistical Analysis Software

  • Can take data from almost any type of file and use them to generate tabulated reports, charts, and plots of distribution and trends, descriptive statistics and complex statistical analyses.
    • Examples include SPSS (Statistical Package for the Social Sciences), SAS (Statistical Analysis System), and Stata.
    • Programming languages used to perform statistical analysis include R and Python.
    • Some statistical analysis can be performed in Microsoft Excel.

Source: DLI Survival Guide

This guide provides starting points for finding statistics and data. Some resources provided are limited to the use of the Saint Mary's University community. Other data sources listed here are open access and freely available online. 

Talk to us! We can help you navigate statistics and data tools, such as the Statistics Canada collections, industry and company performance data sources, and more. Drop by our Research Help Desk, or make an appointment to discuss your project and data requirements.

The Library participates in Statistics Canada's Data Liberation Initiative (DLI) and can provide access for research purposes to Public Use Microdata Files (PUMFs) from many of Statistics Canada survey programs. Researchers requiring access to StatCan PUMFs, databases and geographic files should contact the Library’s DLI representative, Joyce Thomson at 902-420-5549, or email at joyce.thomson@smu.ca


Library Information

Current Library Hours

Fall Term: Extended Hours: December 2 - December 17

Monday - Saturday: 8:00am - 1:00am

Sunday: 10:00am - 1:00am

***Circulation Services close at 11:00pm***

 
