The Challenge

In the era of Generative AI, traditional vendor vetting hasn't become obsolete; it has become insufficient. Unlike a cloud storage provider that simply holds your documents, an AI model has the potential to consume them. This guide outlines the specific, heightened due diligence framework law firms must apply in 2026.

For the last decade, "due diligence" meant asking if a vendor encrypted data at rest and had a disaster recovery plan. AI demands those same controls, plus an entirely new layer of scrutiny regarding data provenance, model behavior, and permanence.

While enterprise-grade tools (unlike their consumer counterparts) generally default to "no training," the nuance lies in the details: context windows, retrieval databases (RAG), and third-party subprocessor chains.

Why AI Due Diligence is Different

Standard SaaS diligence focuses on availability and access. AI diligence must also focus on inference and integrity.

  • The "Black Box" Problem: In standard software, code is deterministic (Input A = Output B). In AI, outputs are probabilistic. You cannot fully audit the code, so you must audit the controls and policies surrounding it.
  • The Subprocessor Chain: Most legal AI startups are "wrappers" around major models (OpenAI, Anthropic, Google). You aren't just vetting the startup; you are vetting their API agreement with Big Tech.
  • The "Unlearning" Challenge: If a model is accidentally trained on sensitive client data, removing that specific data point is technically difficult and often unreliable. Prevention is the only viable strategy.

The 21 Questions That Matter (The Master Checklist)

Do not let a sales rep dazzle you with a slide deck. Send this questionnaire to their CISO or CTO.

Category A: Data Privacy & Training (The "Headline" Risks)

  1. Does your model train on our data? Note: Most enterprise APIs do not by default, but you need a binding "No Training" covenant in the contract, not just a verbal assurance.
  2. Does the "No Training" policy apply to all subprocessors?
  3. What is your specific retention window for prompts and outputs? Ideal: 0 days for content; <30 days for abuse logs.
  4. Do you use a Vector Database (RAG) to store our documents for retrieval? If so, how is that database segregated (e.g., logical separation vs. distinct encryption keys)? A sketch of what segregation can look like follows this list.
  5. Can we opt out of "Abuse Monitoring" human review? Enterprise agreements often allow this; consumer terms do not.
  6. How do you handle "Right to be Forgotten" requests if data is embedded in vector stores?
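
To make question 4 concrete, here is a minimal, hypothetical sketch (not any vendor's actual API) of what "logical separation" in a RAG vector store can mean: every document chunk lives in a tenant-scoped namespace tied to a tenant-specific key ID, and there is no query path that crosses tenant boundaries.

```python
# Hypothetical illustration only: a toy vector store where each tenant's chunks
# live in their own namespace, referenced to a tenant-specific key in a KMS.
from dataclasses import dataclass, field


@dataclass
class TenantNamespace:
    tenant_id: str
    encryption_key_id: str  # reference to a tenant-specific key held in a KMS
    chunks: list[tuple[list[float], str]] = field(default_factory=list)


class SegregatedVectorStore:
    def __init__(self) -> None:
        self._namespaces: dict[str, TenantNamespace] = {}

    def add_chunk(self, tenant_id: str, key_id: str,
                  embedding: list[float], text: str) -> None:
        ns = self._namespaces.setdefault(tenant_id, TenantNamespace(tenant_id, key_id))
        ns.chunks.append((embedding, text))  # in production, text would be encrypted under key_id

    def query(self, tenant_id: str, embedding: list[float], top_k: int = 3) -> list[str]:
        # tenant_id is mandatory: there is deliberately no call that searches all tenants.
        ns = self._namespaces.get(tenant_id)
        if ns is None:
            return []
        ranked = sorted(ns.chunks, key=lambda c: _cosine(c[0], embedding), reverse=True)
        return [text for _, text in ranked[:top_k]]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0
```

The point to pin down with the vendor is which mechanism they actually implement: a shared index with tenant filters, separate indexes per tenant, or separate encryption keys per tenant.
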
Category B: Security Basics (The "Boring" Essentials)

Never skip these just because it's "AI".

  1. Do you support SSO (Single Sign-On) with MFA enforcement?
  2. Do you offer SCIM provisioning for automatic offboarding of staff?
  3. Do you provide admin-accessible audit logs? We need to see who prompted what; a sketch of reviewing an exported log follows this list.
  4. Do you hold a current SOC 2 Type II or ISO 27001 certification? Note: For local/client-side tools that don't process data on their servers, this question is less relevant. Instead ask about update signing, release integrity, and supply chain security.
  5. Do you conduct regular penetration testing?
  6. Have you performed specific "AI Red Teaming" (testing for prompt injection and jailbreaks)?
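
As an illustration of what question 3 should enable, the following sketch reads a vendor's exported audit log and flags prompt activity by accounts that should already have been offboarded. The JSON Lines schema here is an assumption for illustration, not any vendor's real export format.

```python
# Hypothetical audit-log review: the field names ("user", "action", "timestamp")
# are assumptions about the export format, not a real vendor schema.
import json
from pathlib import Path

DEPROVISIONED = {"former.associate@firm.example"}  # accounts removed via SCIM offboarding


def review_audit_export(path: str) -> None:
    for line in Path(path).read_text().splitlines():
        event = json.loads(line)
        user = event.get("user", "?")
        action = event.get("action", "?")   # e.g. "prompt.submitted", "document.uploaded"
        when = event.get("timestamp", "?")
        print(f"{when}  {user:<30}  {action}")
        if user in DEPROVISIONED:
            print(f"  !! activity by deprovisioned account: {user}")


# review_audit_export("vendor_audit_export.jsonl")
```
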
Category C: Accuracy & Legal Utility

Unique to Legal AI.

  1. What measures do you take to minimize hallucinations?
  2. Does your tool provide citations to uploaded documents for every factual claim? A sketch of an acceptance check for this follows the list.
  3. What is your policy on "unsupervised legal conclusions"? Does the UI warn users to verify output?
  4. How often is the underlying model updated?
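
A rough acceptance test for question 2, assuming the tool can return its answer as a list of claims with the document IDs each claim relies on (that structure is our assumption, not a standard output format): any claim without a citation gets routed to manual verification.

```python
# Hypothetical structure: a "claim" paired with the uploaded documents cited for it.
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    citations: list[str]  # IDs of uploaded documents supporting the claim


def uncited_claims(answer: list[Claim]) -> list[Claim]:
    return [c for c in answer if not c.citations]


answer = [
    Claim("The lease terminates on 30 June 2027.", ["exhibit_a.pdf"]),
    Claim("The tenant waived its right to renew.", []),  # no support cited
]
for claim in uncited_claims(answer):
    print("VERIFY MANUALLY:", claim.text)
```
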
Category D: Legal, Compliance & eDiscovery

  1. Will you sign a Data Processing Addendum (DPA) that explicitly forbids training?
  2. Do you indemnify us against IP/Copyright claims if the AI generates infringing content?
  3. Can we export prompts and outputs for eDiscovery or Legal Hold purposes? A sketch of preserving such an export follows this list.
  4. Are you the "Provider" or "Deployer" under the EU AI Act? If a Provider, do you publish the required training data summary?
  5. Where are your servers located? Note: Data residency checks for GDPR/CCPA.
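
On the receiving end of question 3, a preserved export is more defensible if each record is hashed at the time of collection. The sketch below assumes the vendor hands back prompt/output pairs as JSON records; the structure and paths are illustrative.

```python
# Illustrative preservation step: write each exported record to disk with a
# SHA-256 digest so the legal-hold copy can be verified later.
import hashlib
import json
from pathlib import Path


def preserve_for_legal_hold(records: list[dict], out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for i, record in enumerate(records):
        blob = json.dumps(record, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()
        (out / f"record_{i:06d}.json").write_bytes(blob)
        manifest.append({"file": f"record_{i:06d}.json", "sha256": digest})
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))


# preserve_for_legal_hold(exported_records, "legal_hold/matter_1234")
```
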

Red Flags and "Non-Answers"

Watch out for evasive responses that conflate consumer and enterprise policies.

  • "Do you train on our data?"
    Red flag answer: "We don't use your data to improve our services for other customers."
    What it really means: They might still use it to train a model specific to you, creating a permanent copy of the data on their infrastructure.

  • "Do you store prompts?"
    Red flag answer: "We use industry-standard encryption at rest."
    What it really means: They are dodging the retention question. Stored data, even encrypted, is discoverable.

  • "Who sees our data?"
    Red flag answer: "Access is limited to authorized personnel."
    What it really means: "Authorized personnel" often includes low-wage contractors reviewing logs for safety.

  • "Are you SOC 2 compliant?"
    Red flag answer: "Our cloud provider (AWS/Azure) is SOC 2 compliant."
    What it really means: The vendor itself has no certification; they are relying on their host's security.

Sample Contract Language to Request

Standard terms often favor the vendor. Push for these clauses:

Recommended Contract Clauses
  • No Training Covenant: "Vendor represents and warrants that Customer Data (Inputs and Outputs) shall not be used to train, fine-tune, or materially improve any Artificial Intelligence models, regardless of whether such models are proprietary to Vendor or third-party providers."
  • Data Segregation: "Customer Data stored in vector databases or retrieval systems must be logically segregated from other tenants and encrypted with a unique key or tenant-specific salt."
  • Audit Rights: "Customer shall have the right to request audit logs detailing the access and deletion of Customer Data upon termination."

Internal Governance: Who Signs Off?

AI procurement requires a multidisciplinary sign-off.

  1. CISO / IT Director: Verifies the controls (SSO, SOC 2, Pen Tests).
  2. General Counsel / Ethics Partner: Verifies the obligations (Client consent requirements per ABA Op. 512, Privilege risks).
  3. Practice Group Leader: Verifies the utility (Does this tool actually improve work product, or does it just add risk?).

The Vendor Scorecard Template

Copy this framework for your internal evaluation.

Vendor Evaluation Scorecard

Vendor: ___________________
Model Backbone: ___________________ (e.g., Azure OpenAI / Anthropic)

  • No-Training Guarantee: [ ] Pass / [ ] Fail. Must be explicit in DPA.
  • Retention Policy: [ ] Pass / [ ] Fail. 0 days for content preferred; logs must be limited.
  • Human Access: [ ] Pass / [ ] Fail. Opt-out of human review required.
  • Subprocessors: [ ] Pass / [ ] Fail. List provided; no "Consumer" APIs used.
  • Security Basics: [ ] Pass / [ ] Fail. SSO, MFA, SOC 2 Type II verified.
  • Accuracy Controls: [ ] Pass / [ ] Fail. Citations/Grounding features present.
  • Indemnification: [ ] Pass / [ ] Fail. Covers IP/Copyright claims.

Final Recommendation: [ ] Approve   [ ] Reject   [ ] Approve with Custom DPA
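
If you track vendor reviews programmatically, the scorecard maps naturally onto a small data structure. The field names below are our own shorthand, not a standard schema; adapt them to your intake process.

```python
# A minimal, firm-internal representation of the scorecard above (names are ours).
from dataclasses import dataclass, field


@dataclass
class VendorScorecard:
    vendor: str
    model_backbone: str  # e.g. "Azure OpenAI", "Anthropic"
    criteria: dict[str, bool] = field(default_factory=lambda: {
        "no_training_guarantee": False,   # must be explicit in the DPA
        "retention_policy": False,        # 0-day content retention preferred
        "human_access_opt_out": False,
        "subprocessor_list_provided": False,
        "security_basics": False,         # SSO, MFA, SOC 2 Type II verified
        "accuracy_controls": False,       # citations / grounding present
        "indemnification": False,         # covers IP/copyright claims
    })

    def recommendation(self) -> str:
        return "Approve" if all(self.criteria.values()) else "Reject or require a custom DPA"


card = VendorScorecard(vendor="ExampleVendor", model_backbone="Azure OpenAI")
card.criteria["no_training_guarantee"] = True
print(card.recommendation())
```
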

FAQ

Is a SOC 2 report enough?

No. SOC 2 measures security controls (firewalls, access policies), not data usage. A vendor can be SOC 2 compliant and still train on your data if their policy allows it. You need both the cert and the contract.

What is the EU AI Act "Transparency" requirement?

If a vendor provides a General Purpose AI (GPAI) model, they may be required to publish a summary of the content used for training. Ask if they comply with the EU's template for this summary.

Do we need a DPIA?

Under GDPR, a Data Protection Impact Assessment (DPIA) is required if processing is "likely to result in a high risk" to individuals. Processing sensitive client data via AI often meets this threshold.

Can we just redact names and use a cheaper tool?

Redaction is risky. Contextual clues (e.g., "A $5B merger in Cupertino") can reveal identities to an AI. It is safer to use a secure vendor than to rely on perfect manual redaction.
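
To see why, consider a naive redaction pass: the obvious names are removed, but the contextual details remain. The client names and deal facts below are invented for illustration.

```python
# Toy example of why name-only redaction fails: context still identifies the client.
import re

KNOWN_NAMES = ["Acme Fruit Co.", "Jane Orchard"]  # invented parties


def naive_redact(text: str) -> str:
    for name in KNOWN_NAMES:
        text = re.sub(re.escape(name), "[REDACTED]", text)
    return text


print(naive_redact(
    "Acme Fruit Co. has engaged us on a $5B merger. Deal counsel, led by Jane Orchard, "
    "will meet at the client's Cupertino headquarters next week."
))
# The names are gone, but "$5B merger" plus "Cupertino headquarters" still narrows
# the client's identity to a handful of candidates.
```
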

A Note on "Local" Alternatives

If the due diligence process for cloud AI proves too complex or risky for certain highly sensitive matters, consider Local / Client-Side AI.

Tools like inCamera run entirely on your own infrastructure (or laptop), meaning prompts and documents are never sent to the vendor's servers. This shifts the security model entirely: data handling compliance (SOC 2 Type II for data custody) becomes less relevant when data never leaves your machine.

The relevant questions for local tools are different: supply chain integrity (Is the update pipeline secure? Are releases signed?), minimal telemetry (What data, if any, is sent for licensing or analytics?), and update delivery (Can a compromised update inject code?).
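
One practical spot-check on the supply chain question, for macOS desktop tools, is to verify the code signature and Gatekeeper assessment before approving an update firm-wide. The sketch below shells out to Apple's standard codesign and spctl utilities; the application path is a placeholder, not a real product.

```python
# Verify a macOS app bundle's signature and Gatekeeper status (macOS only).
import subprocess

APP = "/Applications/ExampleLegalAI.app"  # placeholder path


def check_signature(app_path: str) -> None:
    for cmd in (
        ["codesign", "--verify", "--deep", "--strict", app_path],
        ["spctl", "--assess", "--verbose", app_path],
    ):
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "OK" if result.returncode == 0 else "FAILED"
        print(status, " ".join(cmd))
        if result.stderr.strip():
            print("   ", result.stderr.strip())


# check_signature(APP)
```
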

Verification: The Allowlist Firewall Test

For skeptical reviewers, this test is often more convincing than packet captures:

  1. Temporarily block all outbound traffic except: AI provider domains, Apple notarization endpoints (if applicable), and the vendor's update/licensing endpoints.
  2. Confirm the app still performs AI calls normally even though it cannot reach any other servers for content, in particular the vendor's own infrastructure.
  3. On macOS, tools like Little Snitch or LuLu can create this ruleset with exportable logs for your records.
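
A short script can also document which endpoints were actually reachable during the test, giving you a record to file alongside the firewall logs. The hostnames below are illustrative placeholders; substitute your own allowlist and the vendor domains you expect to be blocked.

```python
# Record which hosts are reachable while the allowlist firewall rules are active.
import datetime
import socket

ALLOWED = ["api.openai.com", "api.anthropic.com"]      # AI provider endpoints (examples)
SHOULD_BE_BLOCKED = ["analytics.example-vendor.com"]   # anything not on the allowlist


def reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Allowlist check run:", datetime.datetime.now().isoformat())
for host in ALLOWED:
    print(f"  allowed    {host:<35} reachable={reachable(host)}")
for host in SHOULD_BE_BLOCKED:
    print(f"  blocked?   {host:<35} reachable={reachable(host)}  (should be False)")
```
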

Skip the Complex Vetting Process

inCamera's Zero Data Retention architecture means your prompts never touch third-party servers. No subprocessor chains. No abuse monitoring logs. No complex DPAs.