The Quiet Death of REST

Two summers ago a friend of mine, a backend lead at a mid-sized logistics outfit in the Quad Cities, called on a Sunday afternoon to say their dispatch service had been down for three hours. He sounded tired in the way people sound when they have already decided to fix the larger problem and are only working through the local one.

The proximate cause was a JSON payload that had grown, over eighteen months of feature requests, from about four kilobytes to just over a megabyte per call. The dispatch service made that call once per truck every fifteen seconds, against an internal REST endpoint that returned the full driver-and-load object whether you wanted it or not.

They had nine hundred trucks. The math was already bad before a marketing team somewhere added a new field for preferred radio stations. He fixed the immediate outage by reverting a deploy. Then he spent the next six months replacing the internal HTTP-and-JSON traffic with gRPC, putting a GraphQL gateway in front of the mobile and web clients, and keeping exactly one REST surface alive: the public partner API documented on their developer portal.

When I asked him why he had kept any REST at all, he said, “Because that is the part our customers integrate against, and that is the part the press release is about.” Then he laughed once, the way he does, and hung up.

I have spent the better part of two years asking the same question of engineers at maybe thirty different shops, most of them between fifty and five hundred people, a few of them larger. The pattern is so consistent now that I have stopped treating it as an opinion. gRPC inside the wall. GraphQL at the edge, where humans and their browsers actually live. REST kept around for the partner contract: the developer relations page, the conference talk, the third-party integrator who wants to curl a thing and see JSON come back.

The thing is, almost nobody puts it in those terms publicly. They will write a blog post about a single migration, or a talk about why they chose gRPC for a specific service, but the larger picture, the three-layer hybrid, is something they describe in private and call obvious. So I want to write down what is actually happening, with numbers where I have them and with the specific failures that pushed people in this direction. I am doing this not because REST is dead in any literal sense (it is not), but because the role it plays has narrowed so much that calling a system “RESTful” now describes the marketing surface, not the architecture.

## The failure mode nobody wanted to talk about

The Quad Cities story is not unusual. What is unusual is that my friend was willing to put numbers on it. At the time of the outage, their dispatch payload was 1.04 MB uncompressed, gzipped down to about 180 KB on the wire. The service polled every fifteen seconds per truck.

Nine hundred trucks means sixty calls per second sustained, which means roughly 11 MB per second of compressed traffic, or about 60 MB per second uncompressed once it hit application memory and got deserialized into Java objects. That is not a number that kills a system on its own. What killed it was the GC pressure from deserializing nine hundred large JSON documents per fifteen-second window through a Jackson pipeline that was not configured for streaming.
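The load arithmetic is easy to reproduce. Here is a minimal sketch that plugs in the figures from the outage, using decimal units (1 MB = 1,000 KB). The function name is purely illustrative.

```python
def polling_load(trucks: int, interval_s: float, payload_mb: float, gzipped_kb: float):
    """Sustained load from N clients each polling one endpoint on a fixed interval."""
    calls_per_s = trucks / interval_s
    wire_mb_per_s = calls_per_s * gzipped_kb / 1000   # compressed bytes on the network
    heap_mb_per_s = calls_per_s * payload_mb          # JSON text to parse and materialize
    return calls_per_s, wire_mb_per_s, heap_mb_per_s


calls, wire, heap = polling_load(trucks=900, interval_s=15, payload_mb=1.04, gzipped_kb=180)
print(f"{calls:.0f} calls/s, {wire:.1f} MB/s on the wire, {heap:.1f} MB/s to deserialize")
# 60 calls/s, 10.8 MB/s on the wire, 62.4 MB/s to deserialize
```

None of these rates looks alarming by itself. The arithmetic also says nothing about garbage collection, which is where the actual failure lived.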

The old-generation heap filled up. Full GCs started running for two to four seconds at a time, the load balancer started failing health checks, and the cluster began rolling restarts that made everything worse.

After the migration to gRPC with protobuf, the same logical call was 31 KB on the wire, with no gzip step, deserialized into pre-allocated message objects with a predictable memory layout.

The polling pattern was replaced, almost as an afterthought, by a server-streaming RPC that pushed updates only when the underlying load actually changed. Average traffic dropped by something like a factor of fifty. CPU on the dispatch tier dropped by sixty percent.
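Moving from polling to a server-streaming RPC mostly changes who decides when data gets sent. The sketch below shows the push-on-change idea without any library. It assumes the server can observe successive snapshots of a load, and the dict shape is hypothetical. In Python's gRPC bindings a server-streaming handler is typically written as a generator that yields response messages, so the same shape carries over, but nothing here uses the gRPC API itself.

```python
from typing import Iterable, Iterator, Optional


def changed_only(snapshots: Iterable[dict]) -> Iterator[dict]:
    """Yield a snapshot only when it differs from the last one sent.

    Polling re-sends the full object on every tick; this sends nothing
    until the underlying state actually changes.
    """
    last: Optional[dict] = None
    for snap in snapshots:
        if snap != last:
            last = snap
            yield snap


ticks = [
    {"truck": 17, "load": "A", "eta_min": 40},
    {"truck": 17, "load": "A", "eta_min": 40},   # unchanged: nothing sent
    {"truck": 17, "load": "A", "eta_min": 35},
    {"truck": 17, "load": "A", "eta_min": 35},   # unchanged: nothing sent
]
sent = list(changed_only(ticks))
print(f"{len(ticks)} polls vs {len(sent)} pushes")  # 4 polls vs 2 pushes
```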

They moved from a sixteen-node cluster to a six-node cluster, and the six-node cluster ran cooler than the sixteen-node one had. He told me all this at a coffee shop on Linn Street in March, and what I keep coming back to is something he said near the end: “We did not migrate because gRPC was fashionable. We migrated because JSON over HTTP/1.1 was lying to us about what our system was doing.”

The lie, as he described it, was that everything looked fine in the dashboards. Latencies were acceptable. Error rates were low.

The system was, by every conventional measure, healthy. And then it fell over, because the actual cost of the protocol was hidden inside the runtime, in places nobody had instrumented.

I have now heard variants of this story from a team in Des Moines running insurance claims processing, from a small SaaS company in Madison that does scheduling for medical practices, from a payments outfit in Kansas City, and from a logistics company in Memphis that I am told not to name. The specifics vary. The shape does not. The cost of JSON over HTTP/1.1 between internal services scales linearly with payload size and call frequency, in ways that are not obvious until you hit the wall, and then the wall is sudden.

## What gRPC actually does differently, and where the numbers come from

I want to be careful here, because there is a kind of writing about protocols that is little more than folklore, and I have read too much of it. So let me stick to what I have measured myself, or what teams have shown me on their own dashboards.

gRPC runs on HTTP/2, which gives you multiplexing, header compression, and persistent connections by default. Protobuf, the serialization format, encodes fields by tag number instead of by name, uses variable-length integer encoding, and skips fields that are not set. For a payload with a lot of structure and a lot of optional fields, the combined effect is usually a five-to-ten-times reduction in wire size against the equivalent JSON, and a much larger reduction in CPU spent on serialization and deserialization.

I ran my own small benchmark on this in January, mostly to make sure I was not repeating numbers I could not defend. I took a payload representing a delivery manifest, with about forty fields, six of them nested objects and three of them arrays of between two and twenty items. JSON encoding produced a 4.2 KB document; protobuf produced 890 bytes. Encoding the JSON in Go with the standard encoding/json package took 38 microseconds on my laptop. Encoding the protobuf took 4 microseconds. Decoding showed similar ratios.
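To see where the size difference comes from, here is a hand-rolled encoder for two protobuf wire types. It is written only for illustration; real code should use generated classes. Each field goes on the wire as a varint key, `(field_number << 3) | wire_type`, followed by its value. Field names never appear, and unset fields are simply absent. The field numbers and the JSON names are hypothetical.

```python
import json


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_message(fields: dict) -> bytes:
    """Encode {field_number: value}; ints use wire type 0, strings wire type 2.

    Fields whose value is None are treated as unset and skipped entirely.
    """
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value is None:
            continue
        if isinstance(value, int):
            out += encode_varint((number << 3) | 0) + encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2) + encode_varint(len(data)) + data
    return bytes(out)


proto = encode_message({1: 150, 2: "IA", 3: None})    # field 3 is unset
text = json.dumps({"truck_id": 150, "depot": "IA", "hazmat_class": None}, separators=(",", ":"))
print(proto.hex(), len(proto), len(text))   # 08960112024941 7 49
```

Encoding 150 in field 1 as `08 96 01` is the same worked example the protobuf encoding guide uses. Most of the gap comes from the names: JSON repeats every field name in every document, while protobuf replaces it with a key that fits in one byte for field numbers 1 through 15.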

These numbers are not surprising to anyone who has done this kind of measurement, and they are not the most important part of the story. The most important part is that the ratio holds, roughly, across implementations and languages, and that it compounds when you have services calling services calling services.

Consider a request that traverses six internal hops, which is not at all unusual in a moderately decomposed system. Suppose each hop serializes and deserializes a 4 KB JSON document, and each of those steps takes around forty microseconds. You have then spent nearly half a millisecond on serialization overhead alone, plus the wire time, plus the connection setup if you are not pooling well. The same six-hop request in gRPC spends maybe fifty microseconds total on serialization, runs over already-open HTTP/2 streams, and finishes before the JSON version has gotten through its third hop.

The shop in Memphis, the one I am not supposed to name, showed me a flame graph of their order-fulfillment path before and after their migration. Before: a recognizable spine of JSON parsing dominating every service-to-service boundary, with the actual business logic appearing as thin slivers between thick bands of Jackson method calls. After: the business logic was the spine, and the protobuf decoding was barely visible. Their p99 latency on the fulfillment path dropped from 340 ms to 90 ms. Their infrastructure bill, on the services involved, dropped by something like forty percent. The migration took two engineers about four months.

There is also the matter of contracts. Protobuf schemas are, for all their syntactic quirks, real schemas. You generate code from them.

The generated code is the same on the client and the server, modulo language. If a field is added, old clients ignore it. If a field is removed, you mark its tag number as reserved, and the protobuf compiler will refuse to let you reuse it.
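The "old clients ignore it" behavior falls out of the wire format. Every field carries its number and wire type, so a reader can skip a field it has never heard of without knowing what it means. Below is a minimal decoder sketch that handles only varint and length-delimited fields. The message bytes use a hypothetical field 1 (an integer) and a newer hypothetical field 2 (a string) that the old reader does not know about. Many protobuf runtimes keep unknown fields so they survive re-serialization; this sketch just drops them.

```python
def decode_varint(buf: bytes, pos: int) -> tuple:
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(buf: bytes, known_fields: set) -> dict:
    """Decode wire types 0 and 2, keeping only field numbers this reader knows."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is outside this sketch")
        if number in known_fields:   # unknown numbers are skipped, not errors
            fields[number] = value
    return fields


# Written by a newer server: field 1 = 150, field 2 = "IA" (added later).
newer_message = bytes.fromhex("08960112024941")
print(decode_known(newer_message, known_fields={1}))      # {1: 150}
print(decode_known(newer_message, known_fields={1, 2}))   # {1: 150, 2: b'IA'}
```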

This sounds like a small thing if you have not lived through the alternative. If you have, it is not a small thing. By the alternative I mean debugging a JSON API where the server started returning a field as a string instead of a number, and three different clients broke in three different ways on three different release cycles. It is most of the reason internal teams adopt gRPC even when their payloads are small enough that the wire savings would not justify it on their own.
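The string-instead-of-a-number failure is worth seeing concretely, because the clients really do fail in different ways. Here is a minimal sketch with three hypothetical consumers of a hypothetical `weight_kg` field, after the server starts sending `"42"` instead of `42`.

```python
import json

OLD_BODY = '{"weight_kg": 42}'
NEW_BODY = '{"weight_kg": "42"}'   # same field, now serialized as a string


def strict_client(body: str) -> int:
    """Validates types: fails loudly on the new payload."""
    weight = json.loads(body)["weight_kg"]
    if not isinstance(weight, int):
        raise TypeError(f"weight_kg: expected int, got {type(weight).__name__}")
    return weight + 10


def coercing_client(body: str) -> int:
    """Coerces with int(): keeps working, masking the contract change."""
    return int(json.loads(body)["weight_kg"]) + 10


def naive_client(body: str):
    """Trusts the type: silently produces a wrong value ("42" * 2 == "4242")."""
    return json.loads(body)["weight_kg"] * 2


for client in (strict_client, coercing_client, naive_client):
    try:
        print(client.__name__, client(OLD_BODY), client(NEW_BODY))
    except TypeError as exc:
        print(client.__name__, "raised", exc)
```

With a generated protobuf type, the same change would have to appear as an edit to the schema, which is where the review happens.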

The part that catches me, having watched this play out across so many shops, is how unromantic the decision usually is. Nobody is excited about protobuf. Nobody is talking it up on the conference circuit. Teams use it because it stops a particular kind of pain. One part is the pain of not knowing what shape your data will be in when it arrives at the next service. The other is the pain of paying CPU and memory costs for a serialization format that was designed for humans to read in 2005.

## The case against gRPC at the edge

If gRPC is so good, the obvious question is why so few of the serious shops I know expose it to browsers, mobile apps, or third-party partners. The answer is that the cost-benefit shifts completely once you cross the edge. Browsers do not speak gRPC natively. There is gRPC-Web, which is a translation layer, and it works, but it requires a proxy, it loses some of the streaming semantics, and the tooling is uneven. Mobile clients can use gRPC directly, and some teams do, but the build pipeline adds a step and the protobuf-generated code adds binary size. Most product teams would rather spend that complexity budget on something the user can actually see.

Third-party partners, the people who integrate against your public API, are not going to install protoc and figure out your build system in order to call your endpoint. They want to curl something and get JSON back. They want to copy a code snippet from your documentation into their language of choice and have it work. They want to be able to inspect responses in their browser’s developer tools.

The whole apparatus of REST plus JSON, for all its inefficiencies inside a data center, is almost perfectly designed for the externally facing developer experience. It is human-readable and debuggable with tools that everyone already has. It can be taught in a thirty-minute screen share, and its conventions, while loose, are widely understood.

So the real question at the edge is not gRPC versus REST. It is what you give your own product engineers, your web team and your mobile team, when they need to talk to your backend. And here REST has been losing ground for a different reason, which is GraphQL.

## Why GraphQL won the edge, slowly and then suddenly

I was skeptical of GraphQL for a long time. Some of this is temperamental. I do not trust technologies that arrive with too much enthusiasm attached to them, and around 2017 there was an enthusiasm problem. But the pattern I have been watching for the last three or four years is not about enthusiasm. It is about a very specific set of problems that REST handles poorly at the edge, and that GraphQL handles, if not gracefully, then at least adequately.

The core problem is this. A modern product surface, whether it is a web app or a mobile app, needs data from many places to render a single screen. To render its initial view, a user profile page might need the user’s basic info, their recent activity, their notification settings, their friend list, their permissions, and a feature flag check. In REST, this is six round trips, or one custom endpoint built specifically for this page, or some combination of the two with caching in front. Most teams eventually go the custom-endpoint route. When they do, they end up with a proliferation of endpoints named things like /api/v2/user-profile-page-data. That is REST in the sense that it uses HTTP verbs and returns JSON, but it is no longer RESTful in any architectural sense. It is RPC with a URL scheme.

GraphQL formalizes what those teams were already doing. You define a schema of types and their relationships. The client asks for exactly the fields it needs, structured the way it wants them. The server resolves each field, often by calling out to the underlying services that own that data. The client gets back one response, shaped to its request, and the round-trip count drops to one.

The scheduling company in Madison I mentioned earlier showed me their numbers on this. Before GraphQL, their mobile app’s home screen made fourteen REST calls on cold load, with a total wire time of about 2.1 seconds on a typical LTE connection. After GraphQL, it made one call, with a wire time of about 380 ms. The payload was slightly larger than any single one of the previous fourteen calls but smaller than their sum. The GraphQL response only included the fields the home screen actually needed, whereas the underlying REST endpoints had been returning their full standard payloads regardless.

There is a real cost to GraphQL, and I want to be honest about it. The server has to do more work. Each field resolver might trigger its own database query or service call. If you are not careful, you end up with the N+1 problem at industrial scale, where a single client query fans out into hundreds of internal calls. Solving this requires dataloaders, query batching, careful caching, and a kind of operational discipline that small teams do not always have.
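The N+1 shape, and the dataloader fix, can be shown without any GraphQL library. In this sketch, `fetch_users` is a hypothetical batch endpoint on a backend service. The naive resolver calls it once per friend, while the loader collects keys first and fetches them in one call. Real dataloader implementations usually batch automatically, for example per event-loop tick. The explicit `dispatch()` step here is a simplification of that.

```python
from typing import Callable, Dict, List

calls: List[List[int]] = []   # records every backend call, for comparison


def fetch_users(ids: List[int]) -> Dict[int, dict]:
    """Hypothetical batch RPC to a user service."""
    calls.append(list(ids))
    return {i: {"id": i, "name": f"user{i}"} for i in ids}


class BatchLoader:
    """Collect keys during one resolution pass, then fetch them in a single call."""

    def __init__(self, batch_fn: Callable[[List[int]], Dict[int, dict]]):
        self._batch_fn = batch_fn
        self._pending: List[int] = []
        self._cache: Dict[int, dict] = {}

    def load(self, key: int) -> None:
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)   # deduplicated

    def dispatch(self) -> None:
        if self._pending:
            self._cache.update(self._batch_fn(self._pending))
            self._pending = []

    def get(self, key: int) -> dict:
        return self._cache[key]


def resolve_friends_naive(friend_ids: List[int]) -> List[dict]:
    return [fetch_users([i])[i] for i in friend_ids]   # one call per friend


def resolve_friends_batched(friend_ids: List[int]) -> List[dict]:
    loader = BatchLoader(fetch_users)
    for i in friend_ids:
        loader.load(i)          # phase 1: collect keys
    loader.dispatch()           # phase 2: one backend call
    return [loader.get(i) for i in friend_ids]


friends = [3, 8, 5, 8]
resolve_friends_naive(friends)
print(len(calls))   # 4
calls.clear()
resolve_friends_batched(friends)
print(calls)        # [[3, 8, 5]]
```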

I have seen shops adopt GraphQL and regret it because they did not have the engineering bench to operate it well. I have also seen shops adopt GraphQL and never look back, because their problem was overwhelmingly an edge-shaping problem and GraphQL solved it. GraphQL is winning where the edge is complicated. That means consumer products with many client surfaces and B2B SaaS with rich dashboards. It also means anywhere the client team and the backend team are organizationally separate and the client team is tired of asking for new endpoints. In those settings, GraphQL functions as a contract between the client and a stable backend, mediated by a gateway team that owns the schema.

The gateway team is often the most strategically important team in the company that nobody outside the engineering org has heard of.

The hybrid pattern I am describing, then, looks like this. Internal services speak gRPC to each other. A GraphQL gateway sits at the edge, translating client queries into fan-out calls against those internal services. The gateway is, in practice, the only thing in the system that has to know how to do both. Parallel to the GraphQL gateway, there is a REST surface, often much smaller, that exposes a defined set of resources for partners and for public consumption. The REST surface is sometimes built on top of the same gRPC services. Sometimes it sits on top of the GraphQL gateway itself, using a layer that translates REST calls into predefined GraphQL queries. Either way, it is a thin shell, and it is maintained as a product, not as the architecture.

## REST as a product surface, not an architecture

When I started reporting on this, I expected to find that REST was being abandoned outright. What I found instead was that REST had been reassigned to a different role. The people maintaining it also had a clearer sense of why it existed than the people who had built it ten years ago.

The payments company in Kansas City is a useful example. Their public REST API is documented to a level of polish that I have rarely seen, with examples in seven languages, a sandbox environment, a CLI tool, and a changelog that goes back four years. Internally, almost nothing they do is REST anymore. Their service mesh is gRPC. Their web dashboard is GraphQL. The public REST API is a separate codebase, maintained by a small team that thinks of it as a product. They version it carefully, they deprecate fields slowly, and they treat any breaking change as a customer-facing event.

I asked their engineering director why they kept the REST API at all, instead of just exposing GraphQL externally. She gave me an answer I have heard, in different words, from almost everyone in this position. “Our partners are not engineers at large tech companies. They are at insurance brokerages, small banks, billing departments. They want a URL and a JSON response. If we gave them GraphQL, half of them would never integrate, and the other half would integrate badly and blame us when it broke.” Then she added, almost as an aside, “Also, GraphQL is hard to rate-limit, and we need to rate-limit.”

This last point comes up more than you would expect. REST endpoints, with their predictable URL patterns and uniform request shapes, are easy to put behind a rate limiter, a WAF, a cache.
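Her rate-limiting point is concrete. Every GraphQL request hits the same URL, so a limiter has to estimate cost from the query itself before admitting it. Below is a minimal sketch of a static worst-case estimate. It assumes the query has already been parsed into a hypothetical nested structure, `{field: (page_size, children)}`. Here `page_size` is the requested list size (for example, from a `first:` argument), or `None` for a single object. The cost rule and the budget are illustrative, not any particular library's.

```python
from typing import Dict, Optional, Tuple

# field name -> (requested page size, or None for a single object; child selection)
Selection = Dict[str, Tuple[Optional[int], dict]]


def estimate_cost(selection: Selection) -> int:
    """Worst-case count of fields a query could resolve.

    Every field costs 1, and a list field multiplies the cost of everything
    beneath it by its page size.
    """
    total = 0
    for page_size, children in selection.values():
        multiplier = 1 if page_size is None else page_size
        total += multiplier * (1 + estimate_cost(children))
    return total


def admit(selection: Selection, budget: int = 1_000) -> bool:
    """Reject a query up front if its worst case exceeds the caller's budget."""
    return estimate_cost(selection) <= budget


leaf = (None, {})
profile = {"me": (None, {"name": leaf, "email": leaf})}
fan_out = {"users": (100, {"name": leaf, "friends": (50, {"name": leaf})})}

print(estimate_cost(profile), admit(profile))   # 3 True
print(estimate_cost(fan_out), admit(fan_out))   # 10200 False
```

Both queries are a single POST to the same URL, so a per-request limiter of the kind that works for REST would count them identically.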

GraphQL queries can vary arbitrarily in cost, and bounding that cost at the gateway is a real engineering problem.

So REST persists at the partner edge for reasons that have very little to do with its architectural virtues and everything to do with its position in the wider ecosystem of tooling and convention. It is what people know. It is what tools support. It is what auditors can read. It is what a sales engineer can demonstrate in a meeting without firing up a special client. To put it less politely, REST is the format you use when you cannot assume anything about the person on the other end except that they know HTTP and have heard of JSON.

This is not nothing. It is, in fact, a substantial role, and it explains why REST is not going to disappear in any timeframe that matters. But it is a much smaller role than REST played in 2015, when “we’re a RESTful API company” was a real architectural statement and not a marketing one.

## The gateway problem, which is the actual problem

The cleanest way to describe the hybrid pattern is to draw it: gRPC inside, GraphQL at the human edge, REST at the partner edge. But the cleanness of the picture obscures where the actual difficulty lives, which is at the seams. The gateway, the thing that translates from one protocol to another, is where every serious engineering team I have talked to has spent disproportionate time. It is the part nobody wants to write and everybody ends up writing.

There are off-the-shelf options. Envoy can do gRPC-to-JSON transcoding. Apollo Server and similar tools can sit on top of gRPC backends. AWS and Google offer managed gateway products. None of them has fully solved the problem for any team I have spoken to, because the problem is not really a translation problem.

It is a contract problem. Here is what tends to happen. A team adopts gRPC internally and defines its protobuf schemas. The protobuf schemas reflect the way the backend engineers think about their domain.

Then a gateway is set up to expose those services to the edge. The edge team, the people building the web and mobile clients, immediately discovers that the protobuf shape is wrong for their purposes. The backend has a User message with thirty fields. The mobile app needs eight of them, plus three computed fields that do not exist in the User message, plus a relationship to a Notification message that requires a separate call. To be useful, the gateway has to do real work: aggregating, transforming, denormalizing, computing. If the gateway is GraphQL, this work lives in the resolvers. Depending on the company, the resolvers are written and owned by the backend team, the edge team, or a third gateway team.
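Here is a minimal sketch of that resolver work. `BackendUser` is a hypothetical stand-in for the wide protobuf `User` message, trimmed here to a few fields. `fetch_unread_count` is a hypothetical stand-in for the separate call to the service that owns notifications. The field names, and the choice of which fields are computed, are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class BackendUser:
    """Stand-in for the backend's User message (the real one has ~30 fields)."""
    id: int
    given_name: str
    family_name: str
    birth_date: date
    avatar_url: str
    plan: str


def resolve_profile_card(
    user: BackendUser,
    today: date,
    fetch_unread_count: Callable[[int], int],
) -> dict:
    """Shape the backend message into what one mobile screen needs."""
    had_birthday = (today.month, today.day) >= (user.birth_date.month, user.birth_date.day)
    return {
        "id": user.id,
        "avatarUrl": user.avatar_url,
        "displayName": f"{user.given_name} {user.family_name}",                 # computed
        "age": today.year - user.birth_date.year - (0 if had_birthday else 1),  # computed
        "isPaid": user.plan != "free",                                            # computed
        "unreadNotifications": fetch_unread_count(user.id),                       # second service
    }


ada = BackendUser(7, "Ada", "Lovelace", date(1990, 6, 15), "https://example.com/ada.png", "pro")
card = resolve_profile_card(ada, today=date(2024, 6, 14), fetch_unread_count=lambda uid: 3)
print(card["displayName"], card["age"], card["isPaid"], card["unreadNotifications"])
# Ada Lovelace 33 True 3
```

Whoever owns a function like this owns the contract between the edge and the backend.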

The choice of who owns the resolvers turns out to be one of the most consequential organizational decisions a company makes when it adopts this pattern, and I have seen it go three different ways with three different results.

When the backend team owns the resolvers, the resolvers stay close to the data. But the edge team has to file tickets to get any new shape they need, and the GraphQL schema starts to look suspiciously like the protobuf schema with a different syntax. The benefit of GraphQL, which is supposed to be edge flexibility, gets eroded.

When the edge team owns the resolvers, the schema is shaped well for the clients. But the edge team has to learn the backend services in detail and ends up duplicating logic. It also sometimes hits the underlying services in inefficient ways that the backend team did not anticipate. The N+1 problem flourishes.

When a separate gateway team owns the resolvers, you get the best technical outcome and the worst organizational one. The gateway team becomes a bottleneck, every feature requires their cooperation, and they become resented in proportion to their importance. Some shops have made this work. They did it by treating the gateway team as a platform team with clear SLAs, and by investing in tooling that lets product teams contribute resolvers themselves, under review.

The Memphis logistics shop went through all three phases over about two years. They are currently in the third, with a gateway team of four engineers, and they say it is working. But the engineering director said something to me that I have been thinking about since. “The gateway is the most important part of our architecture and the most boring part of our recruiting pitch. We cannot get anyone excited about working on it. Everyone wants to work on the ML stuff or the new mobile app. The gateway is what makes any of that possible, and it is a thankless job.”

I do not have a clean response to that. It seems like a real problem, not just for that shop but for the industry, and I notice that nobody is writing the conference talks that would make gateway work feel important.

## Operational details that matter more than the protocol choice

I want to spend some time on the parts of this that do not make it into the architecture diagrams, because in my experience these are the parts that determine whether a hybrid actually works.

Observability gets harder. With REST, every request has a URL, a method, a status code, and a latency, and you can put all of these in a dashboard and basically know what is happening. With gRPC, you get similar primitives, but they come through different channels and your tooling has to support them. With GraphQL, the URL is always /graphql, the method is always POST, and the status code is almost always 200 even when something has gone wrong, because GraphQL errors are returned in the response body.
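Because the HTTP status says so little, error accounting has to read the response body. Here is a minimal sketch. It assumes the gateway has already pulled out the `operationName` the client sent with its request. The metric key shape and the `HomeScreen` operation are hypothetical.

```python
import json
from collections import Counter
from typing import Optional


def record(metrics: Counter, operation_name: Optional[str], http_status: int, body: str) -> None:
    """Count a GraphQL call as an error if the body carries errors, even on HTTP 200."""
    failed = http_status != 200
    if not failed:
        # A 200 response can still carry a non-empty top-level "errors" list.
        failed = bool(json.loads(body).get("errors"))
    metrics[(operation_name or "anonymous", "error" if failed else "ok")] += 1


metrics: Counter = Counter()
record(metrics, "HomeScreen", 200, '{"data": {"me": {"id": 1}}}')
record(metrics, "HomeScreen", 200, '{"data": {"me": null}, "errors": [{"message": "timeout"}]}')
print(dict(metrics))
# {('HomeScreen', 'ok'): 1, ('HomeScreen', 'error'): 1}
```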

If you do not invest in GraphQL-aware tracing from the start, you end up blind. The shops that get this right instrument their resolvers individually and tag every query with an operation name. They then treat each named operation as a separate logical endpoint for monitoring and alerting.

Versioning is different in each layer. REST versioning is usually URL-based or header-based, and breaking changes happen on version bumps. gRPC versioning is field-by-field, following the protobuf rules about reserved fields and additive changes. GraphQL versioning is supposed to be unnecessary, in the sense that you deprecate fields rather than version the schema. In practice, this requires real discipline about deprecation cycles and client visibility into which fields are still in use. The shops I have seen succeed at GraphQL have automated tooling that tells them which clients are still using which deprecated fields. They also have processes that follow up with those clients before fields are removed.

Authentication and authorization are easier in some ways and harder in others. REST auth is usually a bearer token in a header, and the server inspects the URL and method to decide what is allowed. gRPC auth is similar, but per-method authorization is more granular by default, because every method has a specific signature. GraphQL auth is the hardest, because a single query can touch many resources, and authorization has to happen at the field level, often deep in the resolver tree.

The pattern that has taken hold across most shops I visited is to push authorization down to the underlying services, which are usually gRPC, and let the GraphQL gateway propagate the user context. The gateway does not enforce auth itself, except in coarse ways. The actual decisions are made closer to the data.
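Here is a minimal sketch of that split, in which everything is hypothetical. `TOKENS` stands in for real authentication at the gateway, and `invoice_service_get` stands in for a gRPC method on the service that owns invoices. The `x-user-id` metadata key stands in for however the user context is actually propagated. In a real deployment, the forwarded context should be something the service can verify, such as a signed token, rather than a bare user ID it has to trust.

```python
class PermissionDenied(Exception):
    pass


# Hypothetical stand-ins for an identity provider and an invoice store.
TOKENS = {"tok-alice": "alice", "tok-bob": "bob"}
INVOICES = {101: {"owner": "alice", "amount_cents": 4200}}


def invoice_service_get(invoice_id: int, metadata: dict) -> dict:
    """Owning service: makes the actual authorization decision, next to the data."""
    caller = metadata.get("x-user-id")
    invoice = INVOICES[invoice_id]
    if invoice["owner"] != caller:
        raise PermissionDenied(f"{caller} may not read invoice {invoice_id}")
    return invoice


def gateway_resolve_invoice(token: str, invoice_id: int) -> dict:
    """Gateway: coarse check (is this a known caller?), then forward the context."""
    user = TOKENS.get(token)
    if user is None:
        raise PermissionDenied("unauthenticated")
    return invoice_service_get(invoice_id, metadata={"x-user-id": user})


print(gateway_resolve_invoice("tok-alice", 101)["amount_cents"])   # 4200
try:
    gateway_resolve_invoice("tok-bob", 101)
except PermissionDenied as exc:
    print("denied:", exc)   # denied: bob may not read invoice 101
```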

Caching is where the hybrid model shows its biggest weakness. REST plays well with HTTP caches at every layer: browsers, CDNs, reverse proxies. GraphQL does not, because every request is a POST to the same URL with a different body. There are workarounds (persisted queries, query-hash-based caching, automatic persisted queries from clients like Apollo), but none of them are as universally supported as HTTP caching is. Shops that need aggressive edge caching have three options. They can expose REST endpoints alongside their GraphQL gateway, invest heavily in persisted-query infrastructure, or accept a lower edge cache hit rate than they had in the REST era. Most of them, in my experience, accept it. For the kinds of queries product teams write, the latency benefits of GraphQL at the application level outweigh the cache-miss costs.

Errors are subtler than you would think. REST has status codes, which are blunt but universally understood. gRPC has its own status codes, which are similar in spirit but not identical, and which require some translation when you cross into other protocol territory. GraphQL returns errors in the response body, with the data field populated for any successful parts of the query.

This last point is important. A GraphQL response can be partially successful, with some fields filled in and some fields returning errors. Clients have to be written to handle this, and many client developers, especially those coming from REST, are not used to it. I have heard several stories of GraphQL adoptions where the client teams initially treated any errors field in the response as a total failure. Only later did they realize they could render the parts that succeeded.
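Here is a minimal sketch of a client that renders partial results instead of treating the whole response as failed. Per the GraphQL specification, a nullable field whose resolver errors comes back as `null` in `data`, and the matching entry in `errors` carries a `path` to it. The home-screen field names are hypothetical, and the sketch only looks at top-level fields.

```python
import json


def render_home(body: str, sections=("profile", "activity", "notifications")) -> dict:
    """Render every section that resolved; mark only the failed ones unavailable."""
    payload = json.loads(body)
    data = payload.get("data") or {}
    failed = {err["path"][0] for err in (payload.get("errors") or []) if err.get("path")}
    return {
        name: "unavailable" if name in failed or data.get(name) is None else data[name]
        for name in sections
    }


body = json.dumps({
    "data": {"profile": {"name": "Ada"}, "activity": None, "notifications": [{"id": 1}]},
    "errors": [{"message": "activity service timed out", "path": ["activity"]}],
})
print(render_home(body))
# {'profile': {'name': 'Ada'}, 'activity': 'unavailable', 'notifications': [{'id': 1}]}
```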

## The economics, which are the real reason

I have been writing as if this is a story about protocols, but it is really a story about money, and I do not want to leave that implicit. The Memphis shop dropped forty percent of their infrastructure bill on the migration paths I described. The Madison scheduling company dropped about thirty percent. The Quad Cities logistics company did not give me a percentage but said the savings paid back the migration in about nine months.

The Kansas City payments company is larger, and the numbers are different in absolute terms. But the engineering director told me the gRPC migration of their internal services was, dollar for dollar, the highest-ROI engineering project they had done in five years.

These savings do not exist because gRPC is magically efficient in some abstract sense. They exist because JSON over HTTP/1.1, deployed at scale across many services, wastes an enormous amount of CPU on serialization and an enormous amount of memory on parsing and an enormous amount of network on uncompressed text and an enormous amount of latency on connection management. The savings show up as smaller fleets, lower bills, and engineers who spend less time fighting GC pauses and more time shipping features.

There is also a hiring story, which I have heard from a few people but do not want to overstate. Teams that operate gRPC and GraphQL well are easier to staff for senior roles than teams still on standard REST stacks, because the more sophisticated protocols read as a signal of engineering maturity. I am not sure I believe this fully. The signaling effects in tech hiring are complicated, but I have heard it enough that I am noting it.

And there is the speed-of-development story, which I do believe. The shops that have adopted the hybrid pattern report that their product engineers move faster, because the GraphQL gateway insulates them from the underlying service topology. Their backend engineers move faster too, because the gRPC contracts give them clear interfaces and they are not constantly negotiating JSON shapes with consumers. The friction that REST introduced at the boundaries (the version negotiations, the breaking-change debates, the “should this field be in the user object or a separate call” arguments) mostly goes away. It is replaced by different friction, the gateway-team-is-a-bottleneck friction I described earlier, but the new friction seems to be more manageable than the old.

## What this means if you are not at a serious shop

I want to close by saying who this does not apply to, because I have been talking about midsized and larger engineering organizations, and most software is built by smaller teams. If you are a team of three building a SaaS product, you do not need any of this. A monolith with a REST API is fine. You will move faster, ship more, and have less to operate.

The hybrid pattern shows up under two conditions. The first is enough internal service traffic that the inefficiencies of JSON over HTTP start to cost real money. The second is enough edge-surface complexity that REST starts to require custom per-page endpoints. Below that threshold, the hybrid is overhead.

The threshold, as best I can tell from my reporting, is somewhere between fifteen and thirty engineers. It correlates with the point at which you have more than five internal services and more than two client surfaces. Well below that, you typically have one service and one or two clients, and REST is the right answer. At or above it, the math starts to favor the hybrid, and the shops that resist it tend to be the ones whose engineers call me on Sunday afternoons.

There is a middle ground that I am seeing more of, which is to skip the gRPC step and just adopt GraphQL at the edge while keeping REST internally. This works for a while, and for some teams it works indefinitely, because their internal service traffic is not the bottleneck. But some shops I know have gone this route and then continued to grow. Almost without exception, they eventually added gRPC internally too, because the internal JSON traffic became the bottleneck once the edge problem was solved. The pattern, then, is not a single jump. It is a sequence.

Start with REST. Add GraphQL at the edge when the edge gets complicated. Add gRPC internally when the internal traffic gets expensive. Keep REST as a maintained product surface for partners.

Be honest about what each layer is for. Do not pretend you are RESTful when you are not. And do not let anyone outside the engineering team tell you what the architecture should be based on a blog post they read.

What I keep coming back to, after two years of these conversations, is how quietly all of this has happened. There is no manifesto. There is no replacement standard with a catchy name and a logo. There is just a slow movement of serious teams toward an arrangement that works. It is made up of components that have existed for years, combined in a way that nobody markets and everybody, eventually, ends up at.

REST is not dead. It has been given a smaller, more specific job. The interesting work is happening behind it, where you cannot see it, and where the people doing it are too busy to write about it.

That, more than anything else, is why I wrote this down.