Evaluation Resources Frequently Asked Questions
The information below is meant to provide guidance for those thinking about evaluation and is based on questions received from the training community. It is not intended to apply to every type of evaluation need, and it is not a comprehensive list of all questions related to evaluation. Rather, it can be a starting point for those considering different elements of evaluation. Additional information will be added periodically. Relevant resources are listed after some questions; these are free resources unless otherwise noted. Please contact us if you have any other questions. Visit our Evaluation Resources page for a listing of more resources.
NIGMS did not create the content on the external sites linked below, is not responsible for their availability or content, and does not endorse, warrant, or guarantee the products, services, or information described or offered at these sites.
Survey Development
What are different types of evaluation?
Evaluation determines the extent to which a program has achieved its goals or outcomes. It may use an assessment as a tool to measure aspects of the evaluation or include research questions that the team wants to answer. There are several types of evaluation, and choosing the best method for your program requires understanding the differences between them.
Is there a "sweet spot" for the number of survey questions to get optimal response rates?
Survey length varies depending on the needs of the program or evaluation; however, surveys should not be longer than necessary, to reduce the burden on participants and to increase the likelihood of completion.
In addition to considering the number of questions to include in a survey, consider how long the survey will take to complete. For example, a survey with many straightforward, binary-type questions may take less time to complete than one with fewer multiple-choice or short-answer questions. An overly lengthy survey may lead to lower response rates. The following resources may be of use in constructing survey questions.
Relevant Resources:
- (Paid/subscription-based resource) Designing Quality Survey Questions, by Sheila Robinson. (2024). Los Angeles: SAGE.
- Organizational Research: Determining Appropriate Sample Size in Survey Research [PDF]. Bartlett JE II, Kotrlik JW, Higgins CC. (2001). Information Technology, Learning, and Performance Journal, Vol. 19, No. 1.
- Six Rules of Thumb for Determining Sample Size and Statistical Power [PDF]. The Abdul Latif Jameel Poverty Action Lab (J-PAL). (2018).
Should incentives be provided for survey completion?
Programs must consider institutional review board (IRB) protocols when designing incentives, as some IRB protocols include guidelines on survey incentives.
Response rates may be improved with incentives; however, offering large incentives could be considered coercive. Generally, it is important to acknowledge participants for the time they spend completing the survey while not making them feel obligated to participate because of monetary rewards they would not otherwise have received.
Relevant Resource:
- Incentives for Survey Participation: When Are They "Coercive"? Singer E, Bossarte RM. (2006). American Journal of Preventive Medicine, 31(5), 411–418. https://doi.org/10.1016/j.amepre.2006.07.013
How should evaluation teams prepare for an open response survey?
Open-response, or open-ended, questions on a survey allow respondents to explain their answers, provide examples, and expand on their thinking. Such questions should be clear in wording and design. Survey developers should avoid multi-part questions, which can complicate analysis, and instead keep each question to a single topic.
Survey developers should also consider the additional time that will be needed to analyze open-response surveys. Responses can be qualitatively coded either manually or using licensed software such as NVivo or ATLAS.ti. It is helpful to develop a codebook in which codes are linked to the various metrics and constructs of a program. Open-response questions can unveil prominent themes and topics.
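As a minimal sketch, the example below tallies manually assigned codes against a simple codebook to surface prominent themes. The codes, constructs, and responses are hypothetical; in practice, coding is typically done by trained analysts, with or without software such as NVivo or ATLAS.ti.

```python
# Illustrative only: tallying manually assigned qualitative codes against a
# codebook. All codes, constructs, and responses below are hypothetical.
from collections import Counter

# Codebook linking each code to the program construct it measures
codebook = {
    "MENTOR_ACCESS": "Mentoring",
    "MENTOR_FIT": "Mentoring",
    "SKILL_WRITING": "Professional development",
    "SKILL_PRESENTING": "Professional development",
}

# Each open response with the codes an analyst assigned to it
coded_responses = [
    {"response": "My mentor met with me weekly.", "codes": ["MENTOR_ACCESS"]},
    {"response": "The writing workshop helped my specific aims.", "codes": ["SKILL_WRITING"]},
    {"response": "I'd like a mentor closer to my research area.", "codes": ["MENTOR_FIT"]},
]

# Tally how often each construct appears to surface prominent themes
construct_counts = Counter(
    codebook[code]
    for item in coded_responses
    for code in item["codes"]
)
for construct, count in construct_counts.most_common():
    print(f"{construct}: {count}")
```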
Sample Sizes
How do you work with small sample sizes?
Small sample sizes may create difficulties and limitations in program evaluation efforts. Accounting for a small sample size during the planning phase can inform statistical analyses, reporting, and overall approaches to addressing imbalance. Before implementing a survey, consider the minimum sample size needed to answer the questions of interest, and determine whether the survey population will yield the necessary number of responses.
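As an illustration only, the sketch below shows one common way to estimate the minimum number of responses needed when estimating a proportion: Cochran's formula with a finite population correction, an approach discussed in sample size resources such as Bartlett et al. (2001). The population size, confidence level, and margin of error shown are hypothetical and should be replaced with values appropriate for your program.

```python
# Illustrative only: estimating a minimum survey sample size with Cochran's
# formula and a finite population correction. All numbers are hypothetical.
import math

def minimum_sample_size(population: int, margin_of_error: float = 0.05,
                        z_score: float = 1.96, proportion: float = 0.5) -> int:
    """Return the minimum sample size for estimating a proportion.

    population      -- total number of people eligible to take the survey
    margin_of_error -- acceptable error (e.g., 0.05 for +/-5%)
    z_score         -- z value for the desired confidence level (1.96 ~ 95%)
    proportion      -- expected proportion; 0.5 is the most conservative choice
    """
    # Cochran's formula for an effectively infinite population
    n0 = (z_score ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    # Finite population correction for small trainee cohorts
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Example: a hypothetical program with 120 eligible trainees and a +/-5% margin of error
print(minimum_sample_size(population=120))  # roughly 92 responses needed
```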
How do you work with large sample sizes?
Although a larger sample size may more accurately reflect the intended population, larger samples may also require more data cleaning and stratification to identify the criteria of interest.
Personnel
Can an external evaluator be hired?
Hiring an external evaluator may streamline the evaluation effort. Consider the costs of hiring an evaluator, and what may be allowable through your grant funds versus from university or institutional resources. Evaluation is considered an "allowable cost" for many grants, and funds within budget may be used to defray the cost of the evaluation. Institutional support is expected to contribute toward the cost of evaluation. Please consult the relevant notice of funding opportunity for guidelines and contact your grants management specialist and/or program director with questions.
Programs can be very complex. How can an external evaluator understand the nuances?
To conduct an evaluation, it is important that the evaluator has knowledge of the program goals, allowing for an effective mapping of the evaluation to the goals and topics of importance. The program goals should be clearly defined and measurable to facilitate their use in an evaluation. Program teams are expected to meet with the evaluator(s) to discuss goals and to ensure the evaluator has a thorough understanding of the program. It is imperative that the evaluator/evaluation team protect sensitive information.
Must an external evaluator be from outside the program's institution?
An evaluator external to the program may bring greater expertise and less bias to an evaluation; however, it is not always necessary to hire an external evaluator. Program teams may consider working with evaluators internal to their institution who are not involved in implementing the program; such evaluators are "external" to the program while remaining local to the institutional environment.
Programs should also consider costs of working with external evaluators and talk to their grants management specialists and/or program directors with questions.
Can evaluation be done entirely on-site?
Formative evaluations can be done on-site. Conducting a formative evaluation is a way to gather data that can be used, for example, when applying for grants, conducting an institutional self-assessment, or refining aspects of the program. However, working with external evaluators* to assess program outcomes helps lend independence to the findings and avoid potential bias (and the appearance of bias) in evaluations.
*This can be a person who is external to the program being evaluated while still being on-site. For example, staff from a different college at the same institution.
Rubrics and Metrics
Is there a repository of tools (e.g., research designs and rubrics) that program implementers can use as a resource and customize for specific goals, rather than having to develop these tools themselves?
NIGMS does not provide rubrics for evaluation because every program is unique in its goals and implementation techniques. However, a wide variety of resources and validated instruments exist that programs can use, if appropriate. The measures and rubrics needed to aid the evaluation will vary depending on the goals of the program.
General evaluation tools can be found on the Better Evaluation website.
What are factors to consider when developing a scale?
There are several factors to consider when designing a survey, and selecting appropriate survey scales is an important part of the process. Different measures will require different scales. A common scale for use in surveys is the Likert scale. When possible, consider using validated measures and their associated scales. Consistent scales throughout a survey may lead to more straightforward analyses. When constructing your own measures, discuss the scales and different options (e.g., including an N/A or neutral option, scales with odd numbers of responses) with your evaluator/evaluation team. The resources below outline considerations for various point scales, and a brief illustrative sketch follows them.
- Question and Questionnaire Design [PDF].
- Likert-Type Scale Items [PDF]
- The Impact of "No Opinion" Response Options on Data Quality: Non-Attitude Reduction or an Invitation to Satisfice?
https://academic.oup.com/poq/article/66/3/371/1836194
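The sketch below is illustrative only. It assumes a hypothetical 5-point Likert item with an N/A option and shows one way to keep the scale consistent across items and to exclude N/A responses when summarizing; the item text and data are invented.

```python
# Illustrative only: a consistent 5-point Likert scale with an N/A option.
# The item wording and responses below are hypothetical.
from statistics import mean

# One consistent 5-point scale used for every item in the survey
SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
    "Not applicable": None,  # N/A is recorded but excluded from scoring
}

# Hypothetical responses to the item "The mentoring I received met my needs."
responses = ["Agree", "Strongly agree", "Not applicable",
             "Neither agree nor disagree", "Agree"]

# Convert labels to scores, dropping N/A responses before summarizing
scores = [SCALE[r] for r in responses if SCALE[r] is not None]
print(f"n = {len(scores)}, mean = {mean(scores):.2f}")
```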
Program Related/NIH
Is the goal of program evaluation to compare one's own program with others, or is the goal to compare outcomes within one's program or organization?
The goals and associated evaluations of each program vary. Each program is unique and creates training and mentoring activities for different populations; thus, direct comparison between programs may be problematic.
Evaluations help program directors and institutions measure progress toward their goals, as well as find potential areas for improvement. In addition, including evaluation data is useful for reporting and when applying for future grant awards.
Data from evaluations may also be used in manuscripts if program staff are interested in writing about their outcomes, recommended practices, or novel program ideas.
Note: programs must receive the proper clearance through their institutional review board (IRB) for evaluations, particularly for those that may result in the public release of data.
Is it critical to use common measures?
It is important to use common measures when comparing across cohorts and years.
Common measures can prove useful if an evaluator is interested in looking at longitudinal or cross-site analyses. Using the same measures over time can help to measure progress in a more standard way. However, evaluators should not feel the need to maintain common measures if the measures are outdated or no longer relevant (e.g., a question about a seminar that is no longer offered).
Developing common measures takes time and discussion between the evaluation team and the implementation team and should be undertaken before evaluation begins.
Is there a way to ensure that data can be used, shared, and compared across institutions?
There are many factors to consider when planning to share data across institutions. The team should first become familiar with the institutional review board (IRB) policies at their institution and at any partner/collaborating institutions with whom they want to share or compare data. Depending on local regulations and standards, the teams may be able to apply for a blanket IRB agreement among the participating institutions.
An NIH-funded study being conducted at more than one U.S. site involving non-exempt human subjects research may be subject to the NIH Single IRB policy and/or the revised Common Rule (rCR) cooperative research provision (§46.114). For more information, visit: https://grants.nih.gov/policy/humansubjects/single-irb-policy-multi-site-research.htm
If multiple institutions plan to share data, the teams should develop guidelines for proper use, storage, and access to the shared data through a Data Sharing Agreement (or similar document).
Relevant Resources:
- Single IRB for Multi-Site or Cooperative Research: https://grants.nih.gov/policy/humansubjects/single-irb-policy-multi-site-research.htm
Can NIH provide guidance about what to measure, and what agency expectations for evaluation exist?
NIGMS does not provide guidelines or rubrics for use in evaluations, both because all programs are unique and to encourage creativity and independence in implementation. Programs should develop their evaluations based on their own needs and interests. Program goals can be used to guide evaluation questions. Some examples of evaluation standards, effective practices, and measures can be found in the resources on this page; however, these resources were developed by outside sources and are not endorsed by NIGMS.
Does NIH consider training program evaluation a form of Human Subjects Research?
No. Training grants prepare individuals for careers in the biomedical research workforce by developing and implementing evidence-informed educational practices, including didactic, research, mentoring, and career development elements. While funded programs are expected to conduct ongoing program evaluations and assessments to monitor the effectiveness of the training and mentoring activities, training grant funds are not intended to support Human Subjects Research (see additional information on Human Subjects Research from NIH and HHS).
If an investigator wishes to conduct Human Subjects Research involving the trainees supported by the training program as research study participants, they must follow appropriate institutional policies (e.g., obtaining IRB approvals, consenting study participants).
Applicants are encouraged to reach out to the Scientific/Research Contact listed in the funding announcement if they have any questions.
Are there consistent taxonomy examples for alumni career outcomes?
Taxonomies of trainee pathways may vary depending on the population and reporting needs. Evaluators can reference literature in their field to learn taxonomy standards. It is suggested that evaluation teams define the terms they will use at the beginning of the evaluation and that they apply a clear and consistent taxonomy throughout survey administration and data analysis; an illustrative sketch follows the resource below.
Relevant Resources:
- Evolution of a Functional Taxonomy of Career Pathways for Biomedical Trainees. Mathur A, Brandt P, Chalkley R, Daniel L, Labosky P, Stayart C, Meyers F. Journal of Clinical and Translational Science. 2018 Apr;2(2):63-65. https://doi.org/10.1017/cts.2018.22
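The sketch below is a minimal illustration of applying a consistent career-outcome taxonomy. The sectors, categories, and job titles are hypothetical; programs would define their own terms (or adapt a published taxonomy such as Mathur et al., 2018) before analysis.

```python
# Illustrative only: mapping free-text job titles reported by alumni onto a
# consistent (hypothetical) career-outcome taxonomy, then tallying by sector.
from collections import Counter

# Mapping from normalized job titles to (sector, career type) categories
TAXONOMY = {
    "assistant professor": ("Academia", "Faculty"),
    "postdoctoral fellow": ("Academia", "Postdoctoral training"),
    "research scientist": ("For-profit", "Research"),
    "regulatory affairs specialist": ("Government", "Science-related, non-research"),
}

# Hypothetical alumni-reported titles
alumni_titles = [
    "Postdoctoral Fellow",
    "Assistant Professor",
    "Research Scientist",
    "Postdoctoral Fellow",
]

# Normalize titles and tally outcomes by sector so terms stay consistent
outcomes = Counter(TAXONOMY[title.lower()][0] for title in alumni_titles)
print(outcomes)  # Counter({'Academia': 3, 'For-profit': 1})
```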
Are there ways to address a lack of alignment between the proposal and what comes out in the assessment?
Sometimes, external forces necessitate changes to proposed plans, and program teams can address these changes thoughtfully. Changes to implementation and unexpected training outcomes can be described in progress reports and proposal renewals. A well-planned evaluation and subsequent analysis can help to determine why outcomes differ from expectations. If program goals are not being met, a well-designed evaluation should help to determine where to make refinements.
Are there universal assessment guidelines or frameworks that can be customized?
NIGMS does not provide guidelines or rubrics for use in evaluations because each program is unique in terms of goals, context, student populations, etc. Teams interested in developing an evaluation can reference existing evaluation tools, such as those on this site, as examples. The examples included on this site are not endorsed by NIGMS; rather, they are provided as references and resources that teams can use when developing their evaluations.