
Designing the Trust Layer for Brazil's First Government-Scale AI Audit Platform

Year: 2017
Category: ML · Interaction Design · Gov-Tech · Trust UX
Client: Brazilian Federal Court of Accounts (TCU)

Project Overview

Sole designer embedded at Brazil's Federal Court of Accounts. I designed the input architecture and verifiable output interface for an ML-assisted audit platform that cut Audit Instruction writing time by 63% and compressed the federal audit cycle from 5.5 years to roughly 2 years.

Adapted the TCU's formal Audit Instruction templates into structured, validated input fields the ML could read consistently. Then designed a verifiable output interface — every AI-generated clause linked to its source law, article, and case precedent — so auditors could inspect and correct before accepting.

Reader brief

Brazilian Federal Court of Accounts (TCU) is not just another case study

Sole designer embedded at TCU. Designed the input architecture and verifiable output interface for an ML-assisted audit platform. 63% writing time reduction, 64% audit cycle compression, 90% rejection rate drop, 100% voluntary adoption. A shipped ML system from 2017 that proved upstream trust design years before the LLM era.

Phase 01 of 09 · Executive / Recruiter Summary

Role, status, decision value, and impact in one scan.

Status: Shipped government-scale AI/ML platform, designed and launched in 2017-2018.

Daniel's exact role: Sole Product Designer embedded at TCU, responsible for discovery, workflow architecture, structured input design, verification and correction surfaces, and auditor validation with live cases.

Why it matters for the target audience: This proves I was designing AI trust problems at institutional scale before "AI trust UX" became a hiring category. The LLM era changed the interface pattern, not the core problem: users must act on outputs they cannot fully verify unless the product makes verification possible.

Strongest outcome: 63% reduction in Audit Instruction writing time, 64% audit-cycle compression, 90% downstream rejection reduction, and 100% voluntary pilot adoption.

Impact caveat: These metrics come from the pilot and post-launch reporting in the project data. Read them as project outcomes, not as proof that Daniel's design work alone caused them.

Phase 02 of 09 · The Setup

Investigators acting as transcription clerks.

Brazil's Federal Court of Accounts (TCU) runs Special Accountability Investigations: formal federal procedures for suspected misuse of public funds. In 2017, these averaged 5.5 years from case opening to final judgment. That is longer than most of the government programs being investigated.

The investigators doing this work were highly trained legal analysts. Their actual mandate was to examine corruption, trace misappropriated funds, and issue binding legal Audit Instructions. Instead, they were spending months as document clerks — manually transcribing case data from scanned PDFs, cross-referencing three government databases by hand, and writing complex legal instructions in unvalidated Word documents.

90% of those instructions were rejected downstream by the Federal Public Ministry for data errors and missing citations. Each rejection restarted weeks of work.

I ran discovery with two groups. The Secex-TCE pilot team — the auditors who would use the system — worked directly inside the TCU's workspace, on live active cases. The CGU (Brazil's internal federal control authority) handled cases upstream before they escalated to TCU. Running sessions with both revealed something critical: the upstream handoff was broken long before any Audit Instruction was ever opened. Cases arrived at TCU incomplete, inconsistent, or missing documentation they'd need months later.

The problem wasn't speed. It was trust — and the trust problem started upstream.

Phase 03 of 09 · How We Got There

Excavating what people had stopped seeing.

I didn't start with interface sketches. I started by getting in the room.

Laís (PO) and I ran multiple workshop series across both groups. We sat at auditors' desks. Watched them open active cases. Timed where they stopped, hesitated, switched windows, or reached for a Post-it. We used facilitated techniques — 5 Hats analysis, Lightning Decision Jams, dot-voting — not because the methods were magic, but because people who live inside a broken system for years stop seeing the breaks. Structured provocation makes the invisible visible.

By the end of the discovery series, we had catalogued over 60 distinct pain points. Three patterns cut across everything:

The transcription trap. Auditors spent the majority of writing time copying data they already knew existed in government databases — they just couldn't access it from within their workflow.

The legal citation minefield. Audit Instructions required precise citations to specific articles of multiple federal regulatory frameworks. A wrong article number, even with correct intent, triggered rejection.

The invisible queue. Cases were processed chronologically. A R$20,000 irregularity got the same queue position as a R$25M one.

These three patterns became the real design brief. Every product decision we made traces back to one of them. The workshop process also produced something harder to quantify: by the time we were building prototypes, the auditors were bringing their own live cases to test against. That shift — from skeptical subjects to invested co-designers — is why we hit 100% voluntary adoption at launch.

Phase 04 of 09 · The Template as Design Material

What structure does an expert use — and can we make it machine-legible?

The Audit Instruction template (the Modelo Esquemático, the TCU's formal legal scaffold for accountability case write-ups) already existed; I did not design it. My contribution was adapting it.

Working session by session with the pilot auditors, I decomposed how expert legal minds actually structure their reasoning — what they decide first, what depends on what, where they need flexibility and where precision matters. Then I re-encoded that reasoning into structured, validated input fields the ML could read consistently.

The key design question wasn't "how do we display AI output?" It was: what structure does a human expert use when they think about this problem, and can we make that structure machine-legible without stripping the auditor's judgment?

This is the move that changed everything downstream. Clean, structured inputs meant the ML's outputs were more predictable. More predictable outputs meant auditors could verify them faster. Faster verification meant trust.

Phase 05 of 09 · Design Decisions

The AI should draft. The human should always decide.

Four interaction decisions defined the platform. Each one removes a specific source of distrust — from the data layer upward to the correction surface.

Trust Architecture Note
If inputs are shaped before they reach the model → Then outputs are more consistent and legally defensible → As a result, downstream rejection rates fall and human corrective effort collapses.

If AI suggestions are verifiable at the clause level → Then expert users calibrate trust incrementally → As a result, voluntary adoption follows without enforcement.

If structured authoring replaces free-form prose → Then compliance errors surface at creation time, not review time → As a result, rework cycles disappear from the workflow.

Decision 01

Database Integration + Structured Entry

Auditors spent most of their writing time transcribing data they knew existed in government databases (SIAFI, CADIN, SICONV) but couldn't access from within their workflow. Manual entry introduced errors caught only weeks later in Federal Public Ministry review.

Government databases auto-populate case fields. Remaining manual inputs validate in real time — required fields, format checks, citation format. Errors surface at the point of entry, not downstream.

Auditors trusted the baseline data because it came from the authoritative source, not a colleague's copy-paste. Cognitive load shifted from data-entry vigilance to legal analysis.

90% drop in Federal Public Ministry rejections caused by data errors and missing citations.
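A minimal sketch of the entry-time validation described in Decision 01, in Python. Everything here is an assumption for illustration: the field names, the `prefill` and `validate` helpers, the case-number format, and the citation pattern are hypothetical, and the database snapshot is a plain dict standing in for SIAFI, CADIN, or SICONV rather than any real integration. It only shows the shape of the idea: authoritative data is copied in, and the remaining manual fields are checked the moment they are entered.

```python
import re
from dataclasses import dataclass

# Hypothetical citation format, e.g. "Lei 8.443/1992, art. 8".
# The platform's real validation rules are not documented in this case study.
CITATION_RE = re.compile(r"^Lei \d{1,2}\.\d{3}/\d{4}, art\. \d+$")

# Hypothetical field names for an Audit Instruction entry.
REQUIRED_FIELDS = ("case_id", "responsible_party", "amount_brl", "legal_basis")


@dataclass
class FieldError:
    field: str
    message: str


def prefill(case_id: str, snapshot: dict) -> dict:
    """Copy authoritative fields from a database snapshot.

    `snapshot` is a plain dict standing in for SIAFI/CADIN/SICONV lookups.
    """
    return {"case_id": case_id, **snapshot.get(case_id, {})}


def validate(entry: dict) -> list:
    """Return every problem with its field, so the form can flag it inline."""
    errors = []
    for name in REQUIRED_FIELDS:
        if entry.get(name) in (None, ""):
            errors.append(FieldError(name, "required"))

    amount = entry.get("amount_brl")
    if amount not in (None, "") and (not isinstance(amount, (int, float)) or amount <= 0):
        errors.append(FieldError("amount_brl", "must be a positive number"))

    basis = entry.get("legal_basis")
    if basis and not CITATION_RE.match(basis):
        errors.append(FieldError("legal_basis", "citation format not recognised"))
    return errors


# Usage: authoritative data is prefilled; one manual field has a format slip.
snapshot = {"TC-001/2017": {"responsible_party": "Municipality X", "amount_brl": 25_000_000}}
entry = prefill("TC-001/2017", snapshot)
entry["legal_basis"] = "Lei 8.443/1992, art 8"  # missing period after "art"
print([(e.field, e.message) for e in validate(entry)])
# [('legal_basis', 'citation format not recognised')]
```

Returning every error at once, each tied to its field, is what lets a form flag problems inline at the point of entry instead of leaving them for a reviewer weeks later.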

Decision 02

Impact-Based Prioritisation

Cases were processed chronologically. A R$20,000 irregularity got the same queue position as a R$25M one, creating invisible priority debt across the entire Secretariat.

A dashboard surfaces cases by fiscal exposure and program scope, not arrival date. Auditors see at a glance which cases have the highest public accountability stakes.

Auditors stopped deciding "what to work on today" — that decision was made by the system. Analytical focus went to the cases where it mattered most.

Pilot team throughput on high-value cases increased while administrative noise reduced.
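One way to express the impact-based queue from Decision 02, sketched in Python under stated assumptions: `Case`, `prioritise`, and the `program_scope` measure are hypothetical, and the real ranking rules are not documented in this case study. The sort orders by fiscal exposure first, then program scope, and uses arrival date only to break ties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    case_id: str
    opened: date          # arrival date: the old, purely chronological sort key
    exposure_brl: float   # fiscal exposure of the suspected irregularity
    program_scope: int    # hypothetical scope measure, e.g. municipalities covered


def prioritise(cases: list) -> list:
    """Highest exposure first, then wider scope; arrival date only breaks ties."""
    return sorted(cases, key=lambda c: (-c.exposure_brl, -c.program_scope, c.opened))


queue = [
    Case("A", date(2017, 1, 10), 20_000, 1),
    Case("B", date(2017, 3, 2), 25_000_000, 12),
    Case("C", date(2017, 2, 5), 25_000_000, 40),
]
print([c.case_id for c in prioritise(queue)])  # ['C', 'B', 'A']
```

Sorted chronologically, the same queue would read A, C, B, putting the R$20,000 case ahead of both R$25M cases.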

Decision 03

Structured Authoring with Right-to-Correct

Senior auditors would not sign their names to AI-generated legal text they couldn't verify. During prototype testing (Adobe XD), Secretariat Directors froze — the prototype couldn't render the citation verification flow interactively, so they couldn't inspect the reasoning behind the draft. An unverifiable output was professionally untouchable.

Every AI-generated clause is a collapsible reference block. Expand it: see the source law, the specific article, the case precedent. The auditor verifies before accepting — clause by clause, not document-wholesale. Rich text areas preserved for analytical nuance the ML can't capture.

The same Director who froze in prototype testing went on to validate the live system. Same skepticism, different fidelity, different outcome.

63% reduction in Audit Instruction writing time. 100% voluntary pilot adoption — no enforcement required.
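A hedged sketch of the clause-level review model from Decision 03. `Clause`, `Source`, `ReviewState`, and the rule that sources must be expanded before acceptance are assumptions made for illustration, not the platform's actual data model; the citation in the usage lines is illustrative only. What it demonstrates is that acceptance is gated per clause, and that correction is a first-class review state rather than a rejection.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewState(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    CORRECTED = "corrected"


@dataclass
class Source:
    law: str
    article: str
    precedent: Optional[str] = None


@dataclass
class Clause:
    text: str
    sources: list                      # Source objects behind this clause
    state: ReviewState = ReviewState.PENDING
    inspected: bool = False            # has the auditor expanded the reference block?

    def expand(self) -> list:
        """Open the collapsible reference block and mark the sources as seen."""
        self.inspected = True
        return self.sources

    def accept(self) -> None:
        """Accept only if the clause is traceable and its sources were inspected."""
        if not self.sources:
            raise ValueError("clause has no traceable source")
        if not self.inspected:
            raise PermissionError("expand and inspect sources before accepting")
        self.state = ReviewState.ACCEPTED

    def correct(self, new_text: str) -> None:
        """Right-to-correct: the auditor's rewrite replaces the draft text."""
        self.text = new_text
        self.state = ReviewState.CORRECTED


def ready_to_sign(clauses: list) -> bool:
    """A document is signable only when no clause is still pending review."""
    return all(c.state is not ReviewState.PENDING for c in clauses)


# Usage (the citation is illustrative only)
clause = Clause("Draft clause text", [Source("Lei 8.443/1992", "art. 8")])
try:
    clause.accept()
except PermissionError as err:
    print(err)  # expand and inspect sources before accepting
clause.expand()
clause.accept()
print(ready_to_sign([clause]))  # True
```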

Decision 04

Ambient Workflow Visibility

Case status lived in private spreadsheets and corridor conversations. Managers had no pipeline visibility without interrupting investigators. Auditors context-switched constantly to give status updates.

A shared state machine dashboard shows case status, next steps, and blockers in real time. The audit trail logs every state change with timestamps and actor names.

Auditors stopped managing projects and returned to managing evidence. The system tracked state; the investigators tracked facts.

Managers gained frictionless pipeline visibility. Supervisors could identify bottlenecks without waiting for status meetings.
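The shared workflow in Decision 04 can be sketched as a small state machine with an audit trail. The state names, the `TRANSITIONS` table, and `CaseWorkflow` are hypothetical; the case study does not publish the real state model. Each accepted transition records a UTC timestamp, the actor, and the before and after states, while an illegal transition raises and leaves both state and log untouched.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical state model: allowed next states for each state.
TRANSITIONS = {
    "opened": {"drafting"},
    "drafting": {"in_review"},
    "in_review": {"drafting", "issued"},  # a reviewer can send the draft back
    "issued": set(),                      # terminal state
}


@dataclass(frozen=True)
class LogEntry:
    at: datetime
    actor: str
    from_state: str
    to_state: str


@dataclass
class CaseWorkflow:
    case_id: str
    state: str = "opened"
    log: list = field(default_factory=list)  # audit trail; only move() appends to it

    def move(self, to_state: str, actor: str) -> None:
        """Apply a transition, or raise without touching state or log."""
        if to_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {to_state} is not allowed")
        self.log.append(LogEntry(datetime.now(timezone.utc), actor, self.state, to_state))
        self.state = to_state

    def next_steps(self) -> set:
        """What a dashboard would show as the possible next steps."""
        return TRANSITIONS[self.state]


wf = CaseWorkflow("TC-001/2017")
wf.move("drafting", "auditor.a")
wf.move("in_review", "auditor.a")
print(wf.state, sorted(wf.next_steps()))  # in_review ['drafting', 'issued']
```

Because status is derived from the workflow object itself, a manager can read it without asking anyone, which is the point of the ambient-visibility decision.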


Phase 06 of 09 · Key Decisions

The design choices that made the system usable by skeptical experts.

Structured inputs before generated outputs. I adapted the formal Audit Instruction template into validated fields so the ML received more predictable, machine-legible inputs.

Clause-level verification instead of document-level trust. Every generated legal clause had to expose its source law, article, or precedent before an auditor could accept it.

Right-to-correct as architecture. The workflow preserved expert judgment through editable clauses, rich text, and explicit review states instead of forcing an approve/reject binary.

Priority by public accountability risk. The queue surfaced fiscal exposure and program scope, not just chronological order.

Live-case validation. Workshops moved from abstract exercises to auditors testing the system against active cases, which changed skepticism into adoption pressure.

Phase 07 of 09 · Roads Not Taken

What I rejected because it would have weakened trust, adoption, or feasibility.

A free-form AI draft generator. It would have looked impressive in a demo but failed the legal verification burden of real audit work.

A black-box confidence score. Confidence without inspectable evidence would have shifted risk onto auditors without giving them real control.

Chronological queue plus manual escalation. That preserved the existing priority debt and required humans to remember which cases mattered most.

End-of-process validation. Catching errors after weeks of work was the failure mode we were replacing.

Static Word output as the main deliverable. It would have recreated the old document workflow instead of turning the system into a shared, stateful workspace.

Phase 08 of 09 · Boundary, Caveats, and What I Would Improve Now

The honest version of the proof.

Shipped/prototype boundary: e-TCE shipped into production at TCU. It is not a concept case study.

Technology boundary: This was a 2017-2018 ML-assisted audit platform, not a modern LLM product. The relevance is the trust architecture: structured inputs, verifiable outputs, correction rights, and expert adoption.

Impact caveat: The outcome metrics belong to the shipped pilot/platform context and should remain framed as project/pilot outcomes, not solo-designer causal claims.

What I would improve now: Add richer trust-calibration telemetry, expose source confidence and uncertainty states more explicitly, instrument correction patterns, document governance decisions, and create a public technical appendix showing the state model, validation rules, and handoff logic.

Phase 09 of 09 · The Operating Way

Trust calibrated, not assumed.

During the final validation session, the Secretariat Director — a highly traditional and deeply skeptical legal authority — sat down to test the platform. He entered an active case's parameters into the input console. The AI generated the draft.

I watched him read it once, knit his brows, and open the source citation drawer. He cross-verified the references. He read it a second time.

Then he said: "This draft is better structured than what most of our junior auditors write after three weeks of work. It has captured the exact legal nuance I was looking for. This will turn months of my life into a single afternoon."

This was trust calibrated — not through blind faith, but through a transparent, inspectable, and correctable architecture.

The platform shipped into production. It was recognised by the Ministry as one of the year's landmark projects and added to the TCU institutional roadmap for expansion across all Secretariats. Post-launch data from the pilot Secex-TCE team: 63% reduction in Audit Instruction writing time · 64% compression of the audit-to-judgment cycle (5.5 years → ~2 years) · 90% drop in downstream rejections.
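As a quick consistency check, the 64% compression follows from the before and after durations, taking the approximate 2-year figure at face value:

```latex
\frac{5.5 - 2}{5.5} = \frac{3.5}{5.5} \approx 0.636 \approx 64\%
```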

What this project proved

Input shaping beats output fixing. The cleanest way to fix an AI's output is to build an interface that structures how it receives the input.

Verifiability is not a feature; it's the architecture.

For high-stakes AI work, interactive fidelity of the trust mechanism is non-negotiable in prototype testing. A static mockup of an unverifiable AI output is functionally indistinguishable from a broken one.

The same upstream instinct — structure what the model receives, make what it produces inspectable and correctable — is what I apply to LLM interface design today.

Measurable impact
63%
Reduction in Audit Instruction writing time

Achieved by reshaping ML inputs upstream

64%
Compression of the audit-to-judgment cycle

5.5 years → ~2 years

90%
Drop in downstream rejections

Errors caught at entry, not in review

100%
Voluntary pilot adoption

No enforcement required
