You're spending too much on AI.

Not because compute is expensive. Because most teams are using frontier models on work that doesn't need them. Green AI is the engineering discipline for fixing that — six open principles for auditing, redesigning, and right-sizing AI workflows. Use them. Improve them. Build with them.

Open source · Applied · Current

Six principles for building AI that costs less and wastes less.

The principles are open source. The playbooks make them usable. The model guidance keeps the framework grounded in what is actually cost effective to run now.

Frontier LLM: expensive per call; high output variability; model quality outside your control; risk of hallucination.

Local LLM: lower cost, still non-trivial; slower inference; data stays on-prem; hallucination risk remains.

Pure Deterministic: near-zero inference cost; no output variation; fully auditable; fast.
01

Default to Determinism

If it can be deterministic, make it deterministic.

Most AI workflows contain large amounts of work that never needed AI in the first place: parsing, formatting, routing, lookup, and conditional logic. Code handles those tasks more cheaply, more quickly, and with identical output for identical input.

Before routing work to a model, ask whether a function can do it. If the answer is yes, write the function. Reserve inference for the edge cases that actually require reasoning.
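A minimal sketch of that test in Python, assuming a document-extraction task: the names here (parse_invoice_total, call_model, extract_total) are hypothetical and the model call is stubbed. The deterministic branch handles the common case; inference is the explicit, last-resort fallback.

```python
import re

def parse_invoice_total(text: str) -> float | None:
    """Deterministic path: extract a labeled total like 'Total: $1,234.56'."""
    match = re.search(r"total[:\s]*\$?([\d,]+\.\d{2})", text, re.IGNORECASE)
    if match:
        return float(match.group(1).replace(",", ""))
    return None  # Signal that this document is an edge case.

def call_model(text: str) -> float:
    """Hypothetical model fallback; in practice this wraps your inference API."""
    raise NotImplementedError("Reserved for the edge cases the regex cannot handle.")

def extract_total(text: str) -> float:
    # Code first: identical input, identical output, near-zero cost.
    total = parse_invoice_total(text)
    if total is not None:
        return total
    # Inference only for the residue that genuinely requires reasoning.
    return call_model(text)

print(extract_total("Invoice 41\nTotal: $1,234.56"))  # 1234.56
```

Because the deterministic branch is plain code, identical input yields identical output at near-zero cost, and the model only ever sees the residue.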

Benchmark: in the greenchemistry.ai rebuild, 90% of the original workflow became deterministic Python. Cost dropped from $5.00 per run to under $0.005.

See proof in practice
02

Use the Minimum Sufficient Model

Match capability to task, ruthlessly.

The frontier model is rarely the right default. Structured extraction, classification, constrained generation, and predictable formatting often perform better on smaller models when speed, consistency, and cost are the real metrics.

The McLaren principle applies: you do not put a tow hitch on a Formula One car to take the garbage to the dump. Match the vehicle to the job, and keep frontier capability for the small share of work that genuinely needs it.

Starting hypothesis: most enterprise workflows need frontier capability on less than 15% of their tasks.
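One low-ceremony way to enforce that match is a routing table, sketched below; the task types and tier labels are illustrative placeholders, not recommendations of specific models.

```python
# Illustrative task-to-tier map: capability matched to the task, not the demo.
MODEL_TIERS = {
    "extraction": "small",               # structured field pulling
    "classification": "small",           # label routing
    "templated_generation": "mid",       # constrained, predictable output
    "open_ended_reasoning": "frontier",  # the minority of work that earns the price
}

def pick_model(task_type: str) -> str:
    """Default down-tier; escalation requires a named task type."""
    return MODEL_TIERS.get(task_type, "small")

assert pick_model("classification") == "small"
assert pick_model("open_ended_reasoning") == "frontier"
```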

See the case studies
03

Audit Before You Automate

Map workflows before redesigning them.

AI systems often get built for the workflow as people imagine it rather than the workflow as it actually exists. Real audits expose redundant steps, hidden deterministic work, and unnecessary inference that was only there because nobody diagrammed the system first.

Use a determinism scorecard from 0 to 4: score 0 is fully deterministic work that should immediately become code; score 4 is a genuine frontier task that needs explicit justification.
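A sketch of that scorecard as code, so audits yield comparable numbers across teams; the one-line description of each level is an assumption layered on the rubric above.

```python
from enum import IntEnum

class DeterminismScore(IntEnum):
    """Audit rubric: lower scores should become code first."""
    PURE_CODE = 0    # fully deterministic; rewrite as software immediately
    MOSTLY_CODE = 1  # deterministic with trivial exceptions
    HYBRID = 2       # deterministic skeleton, model only on edge cases
    SMALL_MODEL = 3  # needs inference, but a sub-frontier model suffices
    FRONTIER = 4     # genuine frontier task; requires explicit justification

def low_score_share(scores: list[DeterminismScore]) -> float:
    """Fraction of audited steps at score 0 or 1."""
    return sum(s <= DeterminismScore.MOSTLY_CODE for s in scores) / len(scores)
```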

Typical finding: initial audits place 60–80% of organizational AI usage at score 0 or 1.

04

Measure Inference Cost as a First-Class Metric

Architecture-level commitment, not a cleanup task.

AI costs spiral when teams treat inference cost the way old cloud teams treated egress: a number they will optimize later. Later never comes. The workflows harden first and the waste becomes cultural.

Instrument cost and latency at the point of call. Track cost per workflow run, not just cost per API request. Review the trend alongside classification updates so cost discipline compounds over time.
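A minimal metering sketch: the per-1K-token prices below are placeholders for your provider's real rates, and token counts are assumed to come from your client's usage data. The unit that matters is the workflow run, so the meter accumulates across calls.

```python
import time
from dataclasses import dataclass

# Placeholder per-1K-token prices; substitute your provider's actual rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

@dataclass
class RunMeter:
    """Accumulates cost and latency per workflow run, not per API request."""
    calls: int = 0
    cost_usd: float = 0.0
    latency_s: float = 0.0

    def record(self, input_tokens: int, output_tokens: int, started: float) -> None:
        self.calls += 1
        self.cost_usd += (input_tokens / 1000) * PRICE_PER_1K["input"]
        self.cost_usd += (output_tokens / 1000) * PRICE_PER_1K["output"]
        self.latency_s += time.monotonic() - started

meter = RunMeter()
start = time.monotonic()
# ... make the model call here, then log the token usage it reports ...
meter.record(input_tokens=850, output_tokens=120, started=start)
print(f"run so far: {meter.calls} calls, ${meter.cost_usd:.4f}, {meter.latency_s:.2f}s")
```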

05

Local-First Where Possible

Lower cost and lower exposure, when the workload fits.

Local inference eliminates per-token API costs for qualifying workloads and keeps sensitive data inside the organizational perimeter. For structured, high-volume tasks, that is often both the cheaper and safer architecture.

Apple Silicon is a strong current example because unified memory, efficient matrix hardware, and low idle power make intermittent inference practical. These are engineering claims, not brand claims.

Constraint: local-first is powerful, but only when the workload is sufficiently repetitive and can be handled by sub-frontier models.
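A sketch of the local call path, assuming a locally hosted server that speaks the common OpenAI-style chat-completions API (llama.cpp's server and Ollama both expose one); the URL, port, and model name below are placeholders.

```python
import requests

LOCAL_URL = "http://localhost:8080/v1/chat/completions"  # placeholder host/port

def classify_locally(text: str) -> str:
    """High-volume structured task on a local model: no per-token API fee,
    and the text never leaves the organizational perimeter."""
    resp = requests.post(
        LOCAL_URL,
        json={
            "model": "local-small",  # placeholder model name
            "messages": [
                {"role": "system",
                 "content": "Reply with exactly one label: invoice, contract, or other."},
                {"role": "user", "content": text},
            ],
            "temperature": 0,  # minimize output variation for repeatable tasks
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```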

06

Auditable by Design

Deterministic outputs are auditable outputs.

If a workflow requires auditability, it cannot rely on a raw stochastic call as its core mechanism. The architecture has to be designed around determinism from the beginning.

That makes this more than a cost discipline. It is also a compliance, reliability, and governance discipline. The more deterministic code you extract, the more of the workflow becomes inspectable by default.
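One concrete shape for "inspectable by default," sketched under the assumption that each extracted step is a pure function; the record format and names are illustrative. Log the exact inputs, output, and code version, and the step can be replayed and verified byte for byte.

```python
import hashlib
import json

def audit_record(step: str, code_version: str, inputs: dict, output) -> dict:
    """Deterministic step -> replayable record: same inputs, same hash, every time."""
    payload = {"step": step, "version": code_version,
               "inputs": inputs, "output": output}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return {**payload, "sha256": digest}

rec = audit_record("parse_totals", "v1.2.0", {"doc_id": "A17"}, 1234.56)
# A re-run with the same inputs must reproduce rec["sha256"] exactly;
# a raw stochastic call at this step could offer no such guarantee.
```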

Convergence: the same architectural choices that make AI cheaper also make it easier to defend.

These are not marginal gains. They change unit economics.

The same pattern keeps showing up across technical domains: remove waste, move deterministic work back into software, and stop paying frontier-model prices for jobs that do not need frontier-model architecture.

greenchemistry.ai · Scientific computing
Before: $5.00 / run → After: < $0.01 / run

Cost per process analysis fell by more than 99% once most of the workflow was rebuilt as deterministic software and only the true reasoning edge cases stayed on model.

CrowdTamers · Marketing and content operations
Before: 8-day turnaround → After: 2-day turnaround

Workflow redesign cut overhead by 60% in four months, supported 30% top-line growth, and removed expensive inference from repeatable production work.

AI detection platform · High-volume text analysis
Before: $0.50 / million words → After: < $0.002 / million words

Model right-sizing and workload redesign cut cost by orders of magnitude while reducing false positives and increasing throughput.

VC fund · Founder screening and operations
Before: 2 weeks → After: 1 day

First-pass founder review dropped from two weeks to one day, while operational overhead fell by 45% in two months.

Public framework · Low ceremony

Check out the open source materials.