Collecting the Data: The Nuts and Bolts

Implementation of the Patient Protection and Affordable Care Act (PPACA) now mandates certain standards for data collection in all national population health surveys. In addition, the National Research Council of the National Academies report Eliminating Health Disparities: Measurement and Data Needs (2004) recommends that hospitals, other health care providers, and health insurers collect standardized data on race and ethnicity using the Office of Management and Budget (OMB) standards as a base minimum. However, experts recognize that greater detail or granularity beyond the OMB categories may be more useful for hospitals and health care organizations in targeting improvements for diverse populations. We recognize that collecting granular data at the organizational level may create challenges for reporting or for research. The Institute of Medicine's (IOM) recent report Race, Ethnicity and Language Data: Standardization for Health Care Quality Improvement (2009) provides new recommendations to help facilitate and further standardize the collection of race, ethnicity and primary language data. We recommend that health care providers collect race, Hispanic ethnicity and granular ethnicity data separately and "roll up" or aggregate the granular ethnicities to the OMB race and Hispanic ethnicity categories as needed.


    We recommend collecting race, ethnicity and disability information directly from patients or their caregivers. This information should be collected only once and then validated periodically, preferably at two-year intervals. Repeated collection should be avoided to reduce the burden on both patients and the staff responsible for collecting the information. Once collected, the information should be stored in an electronic format when possible.

    In addition, if a patient refuses to answer questions, the registration staff should move on with the registration process and record "declined" in the field indicating that the patient did not want to answer this question. Providing information is completely voluntary, and staff should recognize when people feel uncomfortable or explicitly state that they do not want to respond to these questions.
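    The two rules above (collect once and revalidate about every two years; record "declined" when a patient does not want to answer) could be sketched in a registration system roughly as follows. This is only an illustration under stated assumptions: the DemographicEntry structure, the function names, and the 730-day approximation of "two years" are hypothetical and do not come from the Toolkit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: "two-year intervals" is approximated here as 730 days.
REVALIDATION_DAYS = 730
DECLINED = "Declined"


@dataclass
class DemographicEntry:
    """One self-reported answer (hypothetical structure for illustration)."""
    field_name: str     # e.g. "race" or "ethnicity"
    value: str          # the patient's own answer, or DECLINED
    collected_on: date  # date the answer was collected or last validated


def record_answer(field_name: str, answer: Optional[str], today: date) -> DemographicEntry:
    """Store the answer; answer=None means the patient chose not to respond."""
    value = DECLINED if answer is None else answer.strip()
    return DemographicEntry(field_name, value, today)


def needs_validation(entry: DemographicEntry, today: date) -> bool:
    """True once the stored answer is old enough to be re-confirmed."""
    return (today - entry.collected_on).days >= REVALIDATION_DAYS
```

    Storing an explicit "Declined" value, rather than leaving the field empty, lets staff distinguish a patient who was asked and chose not to answer from one who has not been asked yet, so the question is not repeated at every visit.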

    We have designed this Toolkit to serve as a resource for hospitals and health care organizations. The primary components of race and ethnicity data collection that should be considered standard practice include the following:

    1. Collect data directly from the patient or from a designated representative.
    2. Provide a rationale or reason for why this information is being collected.
    3. Depending on the capacity of your organization, decide whether you will be providing broad or granular categories. If using predefined categories, decide whether you will be using the bare minimum, such as OMB, or whether you will be providing more granular categories. (Information about both broad categories and granular categories is listed in the section below, "Which Categories to Use.")

    Hospitals, Clinics, Group Practices

    We recommend that this information be collected at the time of patient registration for hospitals, clinics, and medical group practices. This information can be collected face-to-face or over the telephone.

    Health Plans

    For health plans and insurers, we recommend that this information be collected at the time of enrollment, if possible. We realize that this may pose a challenge as some employers prohibit asking this information of their employees. America's Health Insurance Plans (AHIP) has developed a toolkit, "Tools to Address Disparities in Health: Data as Building Blocks for Change—A Data Collection Toolkit for Health Insurance Plans/Health Care Organizations (PDF)."


    Always provide a rationale for why you are asking patients/enrollees to provide information about their race/ethnicity. Research shows that patients are most comfortable providing this information when told why it is being collected and how it will be used. We recommend that health care organizations and health plans collect this information for quality monitoring purposes. Below is a sample rationale, which is easy to communicate and focuses on data collection for quality monitoring.


    "We want to make sure that all our patients get the best care possible. We would like you to tell us your racial/ethnic background so that we can review the treatment that all patients receive and make sure that everyone gets the highest quality of care."

    In addition, it is important to state that the information is confidential:

    "The only people who see this information are registration staff, administrators for the hospital, and the people involved in quality improvement and oversight, and the confidentiality of what you say is protected by law."


    Provided below are the PPACA requirements and CDC Race and Ethnicity Code Sets (granular categories that can be rolled up into the PPACA-recommended OMB categories for reporting or research purposes). As indicated, hospitals can choose to present patients/enrollees with a list of either broad or granular categories allowing patients/enrollees to self-identify their racial/ethnic background.

    Broad Categories (OMB)

    PPACA Standards (2011)

    The PPACA includes requirements for the enhanced collection and reporting of data on race, ethnicity, sex, primary language, and disability status. For detailed information about the PPACA standards, go here.

    The revised PPACA standards include separate race and ethnicity questions. See below for specific PPACA recommendations.

    First ask questions about ethnicity.

    Rolling Up Granular Categories into Broad Categories

    Field research by HRET has shown that some health care organizations have only one field (for race) and do not have a separate field for ethnicity. Although collapsing data should be avoided, where race, ethnicity and granular ethnicity cannot be captured separately you should follow the IOM (2009) recommendations for granular ethnicity.

    Granular Categories

    In addition to collecting data in the PPACA race and ethnicity categories, organizations should also collect granular ethnicity data using categories that are representative of the population served. The IOM Subcommittee on Standardized Collection of Race/Ethnicity Data for Healthcare Quality Improvement recommends that granular ethnicity categories be selected from a national standard set based on ancestry (e.g., Centers for Disease Control and Prevention [CDC]/Health Level 7 [HL7] Race and Ethnicity Code Set 1.0).

    Not all organizations collecting granular ethnicity data will need to include the entire national standard set of categories in their databases or on their data collection instruments. Rather, organizations should select categories from the set that are applicable to their service population. Whenever a limited list of categories is offered to respondents, the list should include an open-ended response option of "Other, please specify:__" so that each individual who desires to do so can self-identify.

    When respondents do not self-identify as one of the PPACA race or ethnicity categories and provide only a granular ethnicity response, a process for rolling the granular ethnicity categories up to the PPACA categories should be used. Ethnicities that do not correspond to a single PPACA race category should be categorized as "no determinate OMB classification".
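    A roll-up process of this kind might look like the sketch below. It is a minimal illustration, not the IOM's or CDC's mapping: the GRANULAR_TO_OMB_RACE table is a deliberately tiny, hypothetical example, and the function name roll_up_race is an assumption. A real implementation would draw its mapping from the national standard set the organization has adopted.

```python
from typing import Dict, Optional

NO_DETERMINATE = "No determinate OMB classification"

# Hypothetical, intentionally incomplete mapping for illustration only.
# A production table should come from the adopted national standard set.
GRANULAR_TO_OMB_RACE: Dict[str, str] = {
    "Japanese": "Asian",
    "Korean": "Asian",
    "Vietnamese": "Asian",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
}


def roll_up_race(reported_race: Optional[str],
                 granular_ethnicity: Optional[str]) -> Optional[str]:
    """Return the broad race category to use for reporting.

    A self-identified race always takes precedence. The roll-up is used
    only when the respondent gave a granular ethnicity and no race.
    """
    if reported_race:
        return reported_race
    if not granular_ethnicity:
        return None  # nothing was reported, so there is nothing to roll up
    # Simplification: any ethnicity absent from the table is treated as
    # having no single corresponding race category.
    return GRANULAR_TO_OMB_RACE.get(granular_ethnicity.strip(), NO_DETERMINATE)
```

    The granular response itself should still be stored as given; the rolled-up value is a derived field for reporting or research, so the detail the patient supplied is not lost.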

    Centers for Disease Control Race and Ethnicity Code Set

    The U.S. Centers for Disease Control and Prevention (CDC) has prepared a code set for use in coding race and ethnicity data. This code set is based on current federal standards for classifying data on race and ethnicity, specifically the minimum race and ethnicity categories defined by the OMB described above and a more detailed set of race and ethnicity categories maintained by the U.S. Bureau of the Census. The code set can be applied in both electronic and paper-based record systems.

    Within the table, each race and ethnicity concept is assigned a unique identifier, which can be used in electronic interchange of race and ethnicity data. The hierarchical code is an alphanumeric code that places each discrete concept in a hierarchical position with reference to other related concepts. For example, Costa Rican, Guatemalan, and Honduran are all ethnicity concepts whose hierarchical codes place them at the same level relative to the concept Central American, which is the same hierarchical level as Spaniard within the broader concept Hispanic or Latino.

    In contrast to the unique identifier, the hierarchical code can change over time to accommodate the insertion of new concepts. For more information, see the two links below.

    Granular Code Set I (PDF)
    Granular Code Set II (PPT)
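    To illustrate the difference between the two kinds of code, the sketch below stores each concept with a stable unique identifier and a separate hierarchical code, then walks up the hierarchy. Every identifier and code shown is an invented placeholder, not a value from the CDC code set, and the dotted layout of the hierarchical code is an assumption made for the example.

```python
from typing import Dict, List, NamedTuple, Optional


class Concept(NamedTuple):
    unique_id: str          # stable key; use this for data interchange
    hierarchical_code: str  # position in the hierarchy; may change over time
    name: str


# Placeholder values only: these are NOT real CDC identifiers or codes.
CONCEPTS = [
    Concept("ID-001", "E1", "Hispanic or Latino"),
    Concept("ID-002", "E1.01", "Spaniard"),
    Concept("ID-003", "E1.02", "Central American"),
    Concept("ID-004", "E1.02.001", "Costa Rican"),
    Concept("ID-005", "E1.02.002", "Guatemalan"),
    Concept("ID-006", "E1.02.003", "Honduran"),
]

BY_HIERARCHICAL_CODE: Dict[str, Concept] = {c.hierarchical_code: c for c in CONCEPTS}


def parent_of(concept: Concept) -> Optional[Concept]:
    """Find the parent by dropping the last dotted segment of the code."""
    head, sep, _ = concept.hierarchical_code.rpartition(".")
    return BY_HIERARCHICAL_CODE.get(head) if sep else None


def ancestors(concept: Concept) -> List[str]:
    """Names of all broader concepts, nearest first."""
    names = []
    current = parent_of(concept)
    while current is not None:
        names.append(current.name)
        current = parent_of(current)
    return names
```

    Because the hierarchical code can change when new concepts are inserted, stored patient records would reference the unique identifier, with the hierarchical code looked up only when a roll-up is needed.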

    IOM Subcommittee Proposed Template of Granular Ethnicity Categories

    The IOM subcommittee has also created a template listing granular ethnicity categories from multiple sources including the CDC/HL7 list. Some of the granular ethnicities included in the template have already been assigned permanent five-digit unique numerical codes by CDC/HL7. Others still require permanent five-digit unique numerical codes.

    IOM Subcommittee Proposed Template of Granular Ethnicity Categories

    Language Categories

    To simplify the collection of language data, most organizations should develop a list of common languages used by their service population, accompanied by an open-ended response option for those whose language does not appear on the list.

    Locally relevant language categories should be selected from a national standard set such as that available from the Census list or IOM report. A sample list is as follows:

    • Arabic
    • Armenian
    • Chinese
    • French
    • French Creole
    • German
    • Greek
    • Gujarathi
    • Hebrew
    • Hindi
    • Hungarian
    • Italian
    • Japanese
    • Korean
    • Laotian
    • Miao Hmong
    • Mon-Khmer Cambodian
    • Other native North American languages
    • Persian
    • Polish
    • Portuguese
    • Portuguese Creole
    • Russian
    • Scandinavian languages
    • Serbo-Croatian
    • Spanish
    • Tagalog
    • Thai
    • Urdu
    • Vietnamese
    • Yiddish
    • Availability of Sign Language or other auxiliary aids or services
    • Other, please specify:___
    • Do not know
    • Unavailable/Unknown
    • Declined
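    One way to combine a short, locally relevant list with the open-ended option is sketched below. The four languages in LOCAL_LANGUAGES stand in for an imaginary service population, and the function name code_language and the choice to treat a blank response as "Unavailable/Unknown" are assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical short list for an imaginary service population.
LOCAL_LANGUAGES = ("Spanish", "Vietnamese", "Tagalog", "Korean")
SPECIAL_RESPONSES = ("Do not know", "Unavailable/Unknown", "Declined")
OTHER = "Other, please specify"


def code_language(response: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (category, free_text) for a spoken-language response.

    Listed options are matched case-insensitively. Anything else is kept
    verbatim under the open-ended option so the patient's answer is not lost.
    """
    if response is None or not response.strip():
        # Assumption: a blank response is recorded as unknown.
        return ("Unavailable/Unknown", None)
    text = response.strip()
    for option in LOCAL_LANGUAGES + SPECIAL_RESPONSES:
        if text.casefold() == option.casefold():
            return (option, None)
    return (OTHER, text)
```

    Reviewing the free-text answers from time to time shows which languages are common enough to be added to the local list.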

    IOM Subcommittee Template of Spoken Language Categories and Coding (Table I-1 in Appendix I of IOM Report)