Booting out bias: How to derisk advanced analytics models in the public sector

Article

The thoughtful use of advanced analytics is transforming the efficacy and efficiency of public-sector agencies’ most critical work. For example, in the United States, it has helped the Internal Revenue Service (IRS) identify potentially fraudulent tax returns, helped the city of Las Vegas create targeted restaurant inspections by gathering information about public-health concerns from social media, and supported NASA’s Jet Propulsion Laboratory in generating weight-reducing designs for space exploration.

There are many other applications that could help agencies improve outcomes and streamline processes to make the best use of limited resources. Artificial intelligence (AI) and machine-learning (ML) models could help customs officials identify containers that might hold dangerous materials, bank regulators identify emerging risks in the banking system, and workforce agencies match job seekers to available jobs, to name a few. The possibilities will only grow as technology advances—but so could the risks.

While any model can be poorly designed or misused, AI and ML models carry additional risks that stem from the complexity of their algorithms, including degradation of performance over time and a lack of transparency about how outcomes are produced. But perhaps the most pernicious risks associated with advanced analytics in the public sector are bias and discrimination, particularly against vulnerable sections of the community. Examples are not hard to find: advanced analytics models have been shown to recommend harsher sentences for people of color, erroneously accuse low-income and immigrant families of fraud, and award lower grades to students from less privileged neighborhoods. This may help explain why many public agencies remain hesitant to deploy advanced analytics at scale. In fact, a recent report showed that 45 percent of US agencies were still only experimenting with advanced analytics and that just 12 percent were using highly sophisticated techniques.1

This low adoption rate doesn’t necessarily equate to low risk, however. It can mean that data scientists work without systems for formal peer review or oversight of the models in use, or that agency leaders lack visibility into what AI work is being done within the organization and the risks it carries.

Part of the solution lies in better model risk management. Here, we outline a best-practice approach to developing and monitoring algorithms that can help public-sector agencies harness the power of advanced analytics to deliver better public services while mitigating bias and other forms of unfair treatment.

Where bias lies

Fair treatment is core to the mission of public agencies but can be hard to preserve if decisions are based on algorithms built upon biased data sets. Bias might creep in because the training data includes biased human decisions or reflects historical or social inequities. Or it can stem from flawed data sampling—the under- or overrepresentation of certain groups of people, for example. Unless models are developed carefully, they risk amplifying bias and disparities related to race, ethnicity, and socioeconomic status.
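
To make the sampling concern concrete, the short Python sketch below compares how often each group appears in a training sample against a population benchmark and flags material gaps. It is illustrative only: the group labels, benchmark shares, and 5 percent tolerance are hypothetical values an agency would replace with its own.

```python
from collections import Counter

def representation_gaps(sample_groups, population_shares, tolerance=0.05):
    """Flag groups whose share of the training sample differs from the
    population benchmark by more than `tolerance` (absolute difference)."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps

# Hypothetical data: group labels and benchmark shares are illustrative.
sample = ["A"] * 700 + ["B"] * 270 + ["C"] * 30
benchmark = {"A": 0.60, "B": 0.30, "C": 0.10}
print(representation_gaps(sample, benchmark))
# Group A is overrepresented (0.70 vs. 0.60); group C is underrepresented (0.03 vs. 0.10).
```

A check like this catches only one source of bias; biased labels in otherwise representative data require outcome-level tests of the kind discussed below.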

Lack of transparency can also exacerbate the problem. AI and ML techniques can make it hard to track how the underlying data drives model outputs, making it difficult to spot bias or unfairness. That presents a special challenge in many public-sector applications, where transparency is a legal requirement. Even where it isn’t, the public expects agencies to be able to explain the technologies they use, what drives the outcomes, and the oversight in place; failing to do so can deepen mistrust of the technology.

Despite these high risks, model-risk-management infrastructure in many public-sector agencies remains at a nascent stage compared with other high-risk sectors. In the financial sector, for example, US regulators have been applying risk-management standards for at least two decades and today place strong emphasis on AI-related risk,2 driving companies to invest heavily and develop new tools to manage it.3

Such investments have demonstrably reduced AI bias. For example, a bank’s validation team, reviewing an ML consumer credit model, discovered that while the model was good at predicting credit risk, the reasons it gave for granting or denying credit were unstable, so that similar customers received different explanations for a credit denial. Retraining the model with different modeling techniques fixed the problem. And in the travel industry, a company’s model-management process discovered that an advanced analytics model offering customers targeted promotions discriminated against older people by giving them less valuable promotions. The development team fixed the problem and introduced procedures to test continuously for bias in production.
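
A continuous production check of that kind can be simple. The sketch below is a minimal, hypothetical illustration (the field names, age bands, and 1.25 disparity threshold are assumptions, not details from the cases above): it compares the average outcome a model assigns to each group and raises an alert when the gap between the best- and worst-treated groups exceeds a set ratio.

```python
import statistics

def mean_outcome_by_group(records, group_key, outcome_key):
    """Average model outcome (e.g., promotion value) per group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[outcome_key])
    return {g: statistics.mean(values) for g, values in groups.items()}

def disparity_alert(records, group_key, outcome_key, max_ratio=1.25):
    """Alert when the best-treated group's mean outcome exceeds the
    worst-treated group's by more than `max_ratio`."""
    means = mean_outcome_by_group(records, group_key, outcome_key)
    worst, best = min(means.values()), max(means.values())
    return {"means": means, "alert": worst > 0 and best / worst > max_ratio}

# Hypothetical production log: field names and values are illustrative.
log = [
    {"age_band": "18-39", "promo_value": 42.0},
    {"age_band": "18-39", "promo_value": 38.0},
    {"age_band": "65+", "promo_value": 21.0},
    {"age_band": "65+", "promo_value": 25.0},
]
print(disparity_alert(log, "age_band", "promo_value"))
# Mean values are 40.0 vs. 23.0; the ratio (about 1.74) exceeds 1.25, so the alert fires.
```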

Legislation and guidelines are emerging that will undoubtedly help strengthen model risk management in the public sector too.4 Rather than wait for such developments, however, public-sector leaders can take important steps now to address advanced-analytics model risk effectively.

A path forward

Six key actions will help mitigate the risk of bias, along with other risks associated with advanced analytics models.

1. Make someone responsible. A senior leader should be made responsible for model risk management. In financial institutions and other regulated industries, the responsibility typically sits with the chief risk officer (CRO). In federal agencies that lack robust enterprise-risk-management structures, it could sit with the chief information officer (CIO), chief data officer (CDO), or the senior-level executive responsible for governance and oversight of technology. It remains the responsibility of the most senior agency leaders, however, to help translate agency mission and values, such as equity and diversity, into guidance for AI risk-management leaders. This may require some in-depth discussions. For instance, if the aim is to avoid recruitment gender bias, would an algorithm be deemed fair if it selected an equal number of men and women for interview from resumes, ensured an equal chance of success in an interview, or ensured an equal number of men and women were hired?
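
Those three standards correspond to distinct, and sometimes mutually incompatible, statistical definitions of fairness. The Python sketch below, using hypothetical applicant records and field names, computes all three for each group so that leaders and data scientists can see which definition a model actually satisfies; it is a sketch of the idea, not a production fairness test.

```python
def group_rate(records, group, numerator, denominator=lambda r: True):
    """Rate of `numerator` among records of `group` that satisfy `denominator`."""
    pool = [r for r in records if r["gender"] == group and denominator(r)]
    if not pool:
        return None
    return sum(1 for r in pool if numerator(r)) / len(pool)

def fairness_report(records, groups=("M", "F")):
    """Compare, per group, the three standards raised in the text:
    1. interview selection rate (parity at the screening stage),
    2. hire rate among those interviewed (equal chance of success), and
    3. overall hire rate (parity in final outcomes)."""
    report = {}
    for g in groups:
        report[g] = {
            "interview_rate": group_rate(records, g, lambda r: r["interviewed"]),
            "hire_rate_given_interview": group_rate(
                records, g, lambda r: r["hired"], lambda r: r["interviewed"]),
            "overall_hire_rate": group_rate(records, g, lambda r: r["hired"]),
        }
    return report

# Hypothetical applicant records; field names and values are illustrative.
applicants = [
    {"gender": "M", "interviewed": True, "hired": True},
    {"gender": "M", "interviewed": True, "hired": False},
    {"gender": "M", "interviewed": False, "hired": False},
    {"gender": "F", "interviewed": True, "hired": False},
    {"gender": "F", "interviewed": True, "hired": True},
    {"gender": "F", "interviewed": False, "hired": False},
]
print(fairness_report(applicants))
```

In practice, a model can satisfy one of these definitions while violating the others, which is why senior leaders, not data scientists alone, should decide which standard best reflects the agency’s values.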

2. Develop and communicate a clear set of analytical practices and standards. Every agency should establish a clear set of analytical practices and standards that are codified, widely communicated, and adhered to. These could include clarity about the specific problem a model seeks to address, a rigorous peer review process, and an empirical review of outcomes to detect any unintended bias. One good practice to counter bias is to ensure diversity on analytics teams. Bias in training data and model outputs is harder to spot if no one in the room has the relevant life experience that would alert them to issues. For other good practices, see sidebar, “Key analytical practices and standards.”

3. Build a model-risk-management infrastructure. A guiding principle of good model risk management is to continuously challenge models—not as a remedial exercise, but from the very outset as they are built and implemented.

An effective model-governance program may include actions such as the following:

  • Clearly state what the organization defines as a model within the scope of the governance program. This both clarifies model governance and provides an opportunity to consider the risk management required for other decision-making processes, such as manual ones, that don’t qualify as models.
  • Create and maintain an inventory of models in use across the agency.
  • Develop and maintain a standard model workflow to ensure the widespread adoption of best practices in data science, bias awareness, and bias reduction.
  • Develop an approach for assessing the materiality of the model and the potential to cause harm in order to establish the level of oversight required. Some models may require an audit, as shown below.
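
The sketch below offers one hypothetical way to implement that last point: each entry in the model inventory is mapped to an oversight tier based on whether it drives automated decisions about individuals and how many people it affects. The fields, thresholds, and tier definitions are illustrative assumptions, not a prescribed standard.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    """One entry in the agency's model inventory (illustrative fields)."""
    name: str
    affects_individuals: bool   # do outputs drive decisions about people?
    automated_decision: bool    # is the decision made without human review?
    population_size: int        # roughly how many people are affected per year

def oversight_tier(m: ModelRecord) -> str:
    """Map a model to an oversight tier; criteria and thresholds are hypothetical."""
    if m.affects_individuals and m.automated_decision:
        return "tier 1: independent validation and external audit"
    if m.affects_individuals or m.population_size > 100_000:
        return "tier 2: internal validation and periodic bias review"
    return "tier 3: standard peer review"

inventory = [
    ModelRecord("benefit-eligibility screener", True, True, 250_000),
    ModelRecord("facility-maintenance forecaster", False, True, 0),
]
for m in inventory:
    print(m.name, "->", oversight_tier(m))
```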

4. Consider creating algorithm review panels. Just as agencies use acquisition review boards to assess acquisitions over a certain spend threshold, an algorithm review panel could be used to oversee AI and ML projects that carry particularly high risks. The panel could include both technical and nontechnical leaders, and its role could include assessing the model’s potential impact on equity, stepping back to consider not only whether the intended model outcome is fair but also whether other stakeholders would view the actual outcome the same way. Taking the earlier example, would an AI recruiting model that strove to consider an equal number of resumes from men and women still be deemed fair by stakeholders if the numbers of men and women eventually hired were unbalanced? The panel could, in effect, take collective accountability for the algorithm, removing the burden from any single person.

For some particularly high-stakes applications, it might be appropriate to engage outside academic and industry experts to audit models for bias. Such audits can help reveal hidden problems in systems, create transparency, and increase public trust in the government’s use of advanced models.

5. Consider appointing an ombudsman. An analytics ombudsman could act as a point of contact and spokesperson for external stakeholders who want to raise issues. This construct is used at the IRS, where the Office of the Taxpayer Advocate, an independent organization, helps ensure taxpayers are treated fairly, handling appeals on their behalf and reporting problems they encounter.

6. Strategize at the enterprise level. Advanced analytics can be more effective when embraced agency-wide rather than by just a handful of advocates. A centralized effort to systematically identify and prioritize the highest-impact use cases, support them with funding, and communicate progress, lessons learned, and professional standards across the entire agency can improve effectiveness. Greater visibility can also encourage better peer review, promote the spread of best practices, and build momentum and demand for advanced analytics solutions.


Advanced analytics techniques provide an opportunity to transform public services. By applying practices to minimize the risk of bias and other forms of unfair treatment, leaders can help empower their institutions to adopt mission-enhancing AI and ML approaches while increasing public confidence in the government’s use of analytics to improve outcomes for all. That’s a prize worth winning.
