
n8n vs Make vs Vapi: the real AI automation stack

A working comparison of n8n, Make, and Vapi for AI automation builders. What each tool actually does, where each one breaks in production, and which to pick per job.

People love asking what my stack is, and when I tell them, half the time they are disappointed. They wanted a secret. A tool nobody has heard of, a clever combination that explains why my agents ship and theirs don't. What I actually use is n8n, Make, Vapi, OpenAI, Claude, a Postgres database, and a Google Sheet I am embarrassed about. That is the whole list. It is also not the point.

The point is that the stack is the boring part of this job. People argue about n8n vs Make vs Zapier the way junior developers argue about Vim vs VS Code. The question feels like it matters because the tools are the thing you can hold in your hand, but the actual quality of your work lives somewhere else. I've built the same agent in all three. It worked in all three. It broke in all three. The difference between the builds was about six hours of reshuffling nodes; the difference between whether the client paid me was about whether I understood what the agent was supposed to do on a bad day.

Still, people want the answer, and the answer is useful if I give it honestly. So here is the honest version. What each tool actually does, where each one breaks, and which breakage hurts the most.

§ 01

The three jobs of an automation

Before the tools, the frame. Every AI automation you are ever going to build does one of three jobs, and the tool you pick is downstream of which job you are doing.

  1. Route. Something happens, a form is filled, a call ends, a row is updated, and a decision needs to be made about what happens next. No long reasoning, no generation, just: if A then B, if C then D, sometimes check with an LLM for the tiebreak. Routing is the job most people underestimate because it is unglamorous, and it is also where the vast majority of real client value lives. A plumber who needs his incoming voicemails triaged into "emergency," "schedule," and "spam" does not need an agent. He needs a router. Do not sell him an agent.
  2. Generate. Take some context and produce a piece of output a human would otherwise write. An email reply, a CRM note, a summary of a sales call, a draft of a quote. Generation is where LLMs earn their keep, and also where the failure mode is quietest: a bad generation usually looks fine until someone reads it carefully.
  3. Act. Take the context, decide what to do, and go do it in a real system on the customer's behalf. Book the meeting. Send the Slack message. Update the CRM. Acting is where demos look the best and production looks the worst, because acting is the job where the last ten percent of engineering actually matters and most of the work is in tool calls and retries and state.

Most of the "AI agent" content you see online is about job three. Most of the money I make is job one and job two, because job one and job two ship in a week and job three ships in six, and a client who gets something working in a week is a client who renews.
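The routing job is simple enough to sketch in plain code before it ever touches a canvas. Here is a minimal sketch of the plumber's voicemail router, with made-up keyword lists standing in for whatever real classification you would use: deterministic rules first, and the leftover branch is where an LLM tiebreak would go.

```javascript
// Minimal voicemail router: deterministic rules first, LLM tiebreak
// reserved for the ambiguous remainder. Keyword lists are invented.
const EMERGENCY_WORDS = ["burst", "flooding", "leak", "no heat"];
const SPAM_WORDS = ["warranty", "free quote", "limited offer"];

function routeVoicemail(transcript) {
  const text = transcript.toLowerCase();
  if (EMERGENCY_WORDS.some((w) => text.includes(w))) return "emergency";
  if (SPAM_WORDS.some((w) => text.includes(w))) return "spam";
  // Everything else is treated as a scheduling request; in production
  // this is the branch where you might ask an LLM for the tiebreak.
  return "schedule";
}

routeVoicemail("My basement is flooding, please call back"); // → "emergency"
```

Notice how little of this needs a model at all. That is the point of the router: the LLM is the fallback, not the front door.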

§ 02

n8n is for the job you are going to maintain

n8n is the tool I reach for when I know the client is going to live with this thing for a year. It is self-hosted if you want, it has real version control, you can write actual JavaScript in the Code node when the visual nodes run out, and the community nodes cover almost every API you will ever need.¹ Where n8n breaks is exactly where you would expect: the flakiness of long-running workflows.

Specifically: a workflow that takes twenty-five minutes to run, because it is waiting on an LLM that is waiting on a tool call that is waiting on a CRM API, will sometimes die in the middle, and the error will be some unhelpful thing about a worker timeout that does not tell you at which step the workflow failed. I have lost real time to this, a workflow that was mostly reliable in testing and mysterious in production. The fix was to split the workflow into three smaller workflows that called each other, each with its own retry and its own checkpoint in Postgres. This is not an n8n problem exactly; it is a distributed-systems problem that n8n's happy path hides from you until it doesn't.
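The split-and-checkpoint pattern is simple to state: every sub-workflow records what it finished before handing off, so a retry resumes instead of re-running. Here is a sketch of the idea, with an in-memory Map standing in for the Postgres checkpoint table; the names and shapes are hypothetical, not n8n's API.

```javascript
// Checkpoint pattern: each step records its result keyed by (runId, step),
// so a retry after a mid-run crash skips finished work instead of
// repeating it. `store` stands in for a Postgres table.
async function runWithCheckpoints(runId, steps, store) {
  const results = {};
  for (const [name, fn] of steps) {
    const saved = store.get(`${runId}:${name}`);
    if (saved !== undefined) {
      results[name] = saved; // already completed on a previous attempt
      continue;
    }
    const out = await fn(results);
    store.set(`${runId}:${name}`, out); // checkpoint before moving on
    results[name] = out;
  }
  return results;
}
```

The design choice that matters is checkpointing *between* the LLM call, the tool call, and the CRM write, because those are the boundaries where a twenty-five-minute run actually dies.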

n8n is also the tool most likely to teach you that the visual layer is a lie. You will spend two hours dragging boxes around to do something you could have done in eight lines of JavaScript inside a single Code node, and once you accept that the Code node is where the real work happens, n8n becomes about three times faster. The visual nodes are for the parts that are genuinely trivial. The Code node is for everything else.
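For a sense of scale, here is the kind of plain JavaScript that replaces a couple of hours of box-dragging: dedupe leads by email, keep the most recent record, flag the ones missing a phone number. This is the logic only; n8n's item-wrapping conventions in the Code node are omitted, and the field names are made up.

```javascript
// Dedupe leads by email, keeping the most recently updated record,
// and flag each survivor with whether it has a phone number.
// In an n8n Code node this would be wrapped in the node's item format.
function cleanLeads(leads) {
  const byEmail = new Map();
  for (const lead of leads) {
    const prev = byEmail.get(lead.email);
    if (!prev || lead.updatedAt > prev.updatedAt) byEmail.set(lead.email, lead);
  }
  return [...byEmail.values()].map((l) => ({ ...l, hasPhone: Boolean(l.phone) }));
}
```

Try drawing that with a Remove Duplicates node, a Sort node, a Merge node, and a Set node, and you will understand why the Code node is where the real work happens.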

§ 03

Make is for the job you are going to hand off

Make (formerly Integromat) is prettier, friendlier, and more forgiving. It is also the tool I reach for when I know the person maintaining the automation after me is not going to be a developer. The visual layer in Make is actually usable as a specification: a non-technical operator can open it, trace what happened on a given run, and make a small change without breaking everything.

Where Make breaks is that the pretty visual layer does not scale. At about twenty modules in a single scenario, the canvas turns into spaghetti and you are going to spend more time finding the module you want to edit than editing it. And Make's error handling is, charitably, limited. You can set up retries and fallback routes, but the moment you need anything more complex (exponential backoff, dead-letter queues, circuit breakers), you are fighting the platform. At that point you need to either decompose the scenario into smaller scenarios or move to n8n.²

Pricing on Make adds up faster than you expect, because the billing unit is "operations" and every module call is an operation. If your scenario does multiple API calls per trigger and the trigger fires frequently, you can blow through your monthly operation limit in under three weeks without noticing. That is not Make's fault; it is what happens when you skip the arithmetic upfront. Do the arithmetic.
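The arithmetic is one line. A sketch with invented numbers:

```javascript
// Back-of-envelope Make billing: every module call is one operation,
// so operations scale with triggers times modules. Numbers are invented.
function monthlyOperations(triggersPerDay, modulesPerRun) {
  return triggersPerDay * modulesPerRun * 30; // ~30 days per month
}

// 200 form submissions a day through a 12-module scenario:
monthlyOperations(200, 12); // → 72000 operations a month
```

Run that against the plan tier you were about to quote the client, before you quote the client.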

§ 04

Vapi is for voice, and voice is a different animal

Vapi is the tool I reach for when the job is a voice agent on a phone line, inbound, outbound, or both. It wraps the three things you need for voice (a speech-to-text, an LLM, and a text-to-speech) into one runtime, lets you define tools the LLM can call, and gives you a webhook for server-side logic. On paper, it sounds like n8n for phones. In practice, it is a different category of engineering altogether, because the constraints of voice are unforgiving in ways that text automation is not.

The big one is latency. A voice agent has about eight hundred milliseconds of silence before a human starts to feel like they are talking to a robot, and every tool call you make eats into that budget. If you are pulling an appointment list from a CRM while the caller is waiting, and the CRM takes two seconds, you have already lost. The fix is pre-fetching: pull the data you are going to need before the caller asks, and hold it in the context. This is counterintuitive if you came from text automation, where pulling-on-demand is the default pattern. In voice, pulling-on-demand is a user-experience bug.
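The pre-fetch pattern looks roughly like this: fire the slow lookup the moment the call starts, and have the tool call await the already-in-flight promise instead of paying the full latency live. The handler names and shapes here are hypothetical, not Vapi's actual webhook API.

```javascript
// Pre-fetch pattern for voice: start the slow CRM lookup at call start,
// without awaiting it, so the later tool call awaits an in-flight
// (or already-resolved) promise instead of a cold fetch.
const prefetchCache = new Map();

function onCallStart(callId, fetchAppointments) {
  // Fire the fetch immediately; deliberately not awaited here.
  prefetchCache.set(callId, fetchAppointments());
}

async function onToolCall(callId, fetchAppointments) {
  // Use the in-flight promise if we have one; fall back to a live fetch.
  const inFlight = prefetchCache.get(callId);
  return inFlight !== undefined ? await inFlight : await fetchAppointments();
}
```

The two seconds the CRM takes still happen; they just happen while the agent is saying hello, which is time the caller was going to give you anyway.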

The second place Vapi breaks is interruption handling. Real humans interrupt each other constantly. Vapi has gotten much better at this in the last year, but the model still occasionally keeps talking over the caller, or stops mid-sentence and never picks back up, or gets confused when the caller says "no, wait" in the middle of a sentence. These bugs are specific to the particular combination of STT, LLM, and TTS you are using, and the fix is usually a combination of prompt engineering and Vapi configuration that you can only figure out by listening to recordings of real calls. Which means: if you are selling a voice agent to a client, you are also committing to listening to recordings of their calls for at least the first week, and probably longer. I do not mean that as a complaint; I mean that if you do not do it, the agent is going to embarrass you.³

The third place Vapi breaks is around transfers. Handing a live call off to a human is a surprisingly hard problem: the caller is already emotional, the human on the other end needs context, and every second of dead air makes the handoff feel worse. I have a whole workflow for this that I will not fit in this post, but the summary is: warm transfers with a pre-recorded bridge message, and a CRM note written by the agent before the handoff, so the human who picks up is not starting from zero.

§ 05

Where I use what

Because I know someone is going to ask: here is how I actually pick, in practice.

Client wants a lead-qualification form that drafts a personalized reply? Make. Ten modules, done in a day, handed off to their ops person in an hour.

Client wants a nightly workflow that reads a thousand rows from a CRM, enriches them with an LLM, writes them back, and emails a summary? n8n. Too many rows for Make to price well, and the error-handling needs are real.

Client wants a 24/7 inbound receptionist that books appointments into an existing Google Calendar? Vapi, with an n8n workflow behind it for the actual calendar writes. I do not let Vapi write to Google Calendar directly, because when Vapi's tool call times out in the middle of a write, you get a ghost booking that the caller thinks was confirmed and the calendar never saw. That bug has bitten me. I have since written it into my CLAUDE.md.
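The guard that fixes the ghost booking is: never confirm on a timeout, always verify. A sketch of the n8n-side write, with a stand-in calendar interface whose `create` and `exists` methods are hypothetical, not the real Google Calendar API:

```javascript
// Ghost-booking guard: never tell the caller "confirmed" just because
// the write was attempted. On any ambiguous failure (timeout, network),
// verify against the calendar before confirming or giving up.
async function safeBook(calendar, booking) {
  const key = `${booking.caller}:${booking.slot}`; // idempotency key
  try {
    await calendar.create(key, booking);
    return "confirmed";
  } catch (err) {
    // The write may or may not have landed. Check before answering.
    const exists = await calendar.exists(key);
    return exists ? "confirmed" : "failed"; // only confirm what we can see
  }
}
```

The idempotency key also means a retry of the same booking cannot double-book the slot, which is the other half of the ghost-booking family.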

§ 06

The real answer

If you are reading this looking for the real answer to "what stack should I use," here it is. The stack is not the bottleneck. For any automation I have ever built, the tool mattered less than the observability, which mattered less than the prompt, which mattered less than whether I had understood the job in the first place. You can build a great agent in all of these tools. You can build a disaster in all of them too. The difference is not the tool. The difference is you.

So: pick one. Go deep. Learn where it breaks. Do not spend another month comparison-shopping. The month you spend picking the perfect tool is the month you could have shipped your first paying client, and the client does not care what's on your canvas.

¹ The community-node ecosystem is simultaneously n8n's greatest strength and its most common source of "why is this broken on Tuesday" bugs. A community node is a Node module maintained by one person who may or may not still be awake.

² I've had the same scenario run in both Make and n8n for a full week, just to see which one broke first. Make broke at the canvas-complexity ceiling. n8n broke at the long-running-workflow ceiling. Both broke. The honest answer is that the ceilings are real, and the work is picking which one is farther from where you are going.

³ If you are building a voice agent and you have not listened to ten hours of your own agent's calls, you do not actually know what it does. You have opinions about what it does. The difference between an opinion and a fact is a recording.