Ridgeline’s Product & Engineering team has spent a significant amount of time thinking about application programming interfaces (APIs) and how we can use them to build the best possible products. This article briefly introduces the concepts behind the design and implementation of Ridgeline’s APIs; I plan to write future articles that dive deeper into each of the topics below.
The Importance of APIs
If you search for what it takes to be a successful SaaS company, countless sources explain the necessity of building a differentiated, performant, and cohesive product offering. Dig deeper, though, and you find that a good product is no longer all it takes to succeed online. The most successful companies have also invested in first-class APIs and take an API-first approach when designing their products.
Time and again we’ve seen companies with first-class APIs respond to the needs of their users and create exceptional, cohesive products. The top SaaS companies invest heavily in their APIs and have shown that ease of interaction with a product is just as important as performance and reliability.
As discussed in “Building a Modern Architecture on the Public Cloud,” Ridgeline builds GraphQL APIs on top of AWS Lambda and Amazon API Gateway within a virtual private cloud (VPC). These components of the serverless stack provide flexibility and control, along with guardrails that let us iterate quickly and respond to the needs of our engineers and customers.
We adopted GraphQL early on and built our graph intentionally, piece by piece and product by product. The graph we’ve created lets us explore our data and fine-tune the right set of questions and answers for delivering the best product experience in the financial space.
GraphQL has given our engineers a rich set of design principles for shaping interactions with our data. Rather than following traditional designs that expose one endpoint per resource, we design APIs that return richer responses, so a client does not have to call a series of individual APIs and stitch the results together into a cohesive answer. Because all of our products are powered by a single graph, they share a consistent feel.
For our engineers, the skills and tools used to design an API in one service carry over to designing an API in any of our products and services. This lets teams swarm on new challenges as they arise and focus on the product rather than on the technology.
Building a cohesive API on top of GraphQL with these serverless technologies required extra measures to ensure the security, performance, and reliability of our graph, and it exposed a fair number of technical challenges. However, the ability to design, expand, and iterate on hundreds of APIs without having to consider the underlying hardware and infrastructure has more than made up for them.
Each Ridgeline product is made up of numerous serverless components that we group together and label as microservices. Each microservice exposes a graph with a standard set of APIs for creating and retrieving data. Because each microservice contributes a unique part of the graph, we can rely on GraphQL Federation to compose these graphs into one and answer questions that span multiple microservices with a single API.
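To make federation a little more concrete, here is a deliberately simplified Python sketch of how a gateway might answer one request using two subgraphs: one owns an entity, the other extends it with extra fields keyed by the same identifier. The subgraph names (`accounts`, `positions`), fields, and data are all hypothetical. Passing key-only "representations" to the extending subgraph loosely mirrors Apollo Federation’s `_entities` field, but real federation gateways derive a query plan from the composed schema rather than hard-coding the join as this sketch does.

```python
from typing import Any

# Hypothetical "accounts" subgraph data: this service owns the Account entity.
ACCOUNTS = {
    "A1": {"id": "A1", "name": "Growth Fund"},
    "A2": {"id": "A2", "name": "Income Fund"},
}

# Hypothetical "positions" subgraph data: this service extends Account, keyed by account id.
POSITIONS = {
    "A1": [{"symbol": "ACME", "quantity": 100}],
    "A2": [],
}


def accounts_subgraph(ids: list[str]) -> list[dict[str, Any]]:
    """Resolve the base Account fields in the subgraph that owns the entity."""
    return [dict(ACCOUNTS[i]) for i in ids if i in ACCOUNTS]


def positions_subgraph(representations: list[dict[str, str]]) -> list[dict[str, Any]]:
    """Resolve extension fields from key-only entity representations."""
    return [{"id": r["id"], "positions": POSITIONS.get(r["id"], [])} for r in representations]


def gateway_query(ids: list[str]) -> list[dict[str, Any]]:
    """Answer one client request by querying both subgraphs and merging on the entity key."""
    accounts = accounts_subgraph(ids)
    # Second hop: send only the key field ("id") to the extending subgraph.
    extensions = positions_subgraph([{"id": a["id"]} for a in accounts])
    by_id = {e["id"]: e for e in extensions}
    return [{**a, **by_id[a["id"]]} for a in accounts]


if __name__ == "__main__":
    print(gateway_query(["A1"]))
```

Calling `gateway_query(["A1"])` returns the account name from one subgraph and its positions from the other in a single merged record, which is the property that lets one API answer a question spanning several microservices.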
Most articles on API Gateway design are closely tied to REST and Swagger/OpenAPI implementations, while GraphQL implementations on AWS typically rely on Amplify and AppSync. After evaluating the proposed solutions and architectures, we decided to lean into a less conventional approach: serving GraphQL through API Gateway.
Accordingly, we handle API requests by exposing a single public API Gateway that routes to a proxy integration, which in turn meshes together multiple private API Gateways, all within a secure VPC. This not only minimizes the attack surface of our GraphQL API but also gives us significantly more flexibility as we learn which formats and shapes of traffic we need to support. As we learn more, we can adapt both the network and the graph to provide the best experience.
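As a rough illustration of the single-public-entry-point idea, here is a minimal Python sketch of how a proxy behind the public gateway might choose which private API Gateway receives a request. The routing table, path prefixes, and `example.internal` URLs are hypothetical; the `path` and `queryStringParameters` event keys follow the shape of API Gateway’s REST proxy integration payload. The sketch only resolves the target URL and leaves out forwarding and authentication.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical routing table: public path prefix -> private API Gateway base URL.
PRIVATE_GATEWAYS = {
    "accounts": "https://accounts.private.example.internal/prod",
    "trading": "https://trading.private.example.internal/prod",
}


class RouteNotFound(Exception):
    """Raised when a request path has no matching private gateway."""


def resolve_target(path: str, query: Optional[dict[str, str]] = None) -> str:
    """Map a public path such as '/accounts/graphql' to the matching private gateway URL."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in PRIVATE_GATEWAYS:
        raise RouteNotFound(path)
    url = PRIVATE_GATEWAYS[segments[0]]
    if len(segments) > 1:
        url += "/" + "/".join(segments[1:])
    if query:
        url += "?" + urlencode(query)
    return url


def handler(event: dict, context: object = None) -> dict:
    """Proxy-integration-style entry point.

    Returns the resolved target instead of forwarding the request, so the sketch
    stays offline; a real proxy would forward the request and relay the response.
    """
    try:
        target = resolve_target(event.get("path", "/"), event.get("queryStringParameters"))
    except RouteNotFound:
        return {"statusCode": 404, "body": "unknown route"}
    return {"statusCode": 200, "body": target}
```

Keeping the routing decision in one small table is what makes it cheap to add or reshape private gateways later without changing the public surface.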
Should We Expose GraphQL?
We had to make a tough and controversial decision early on in our API design philosophy: we chose not to expose GraphQL to the outside world. This was a hard sell for all of us, and I suspect it may come as a shock to some of you reading this. We know the power of GraphQL, we know how to build with it, and we know how to shape it and make it performant, yet we couldn’t justify the cost of teaching everyone who wants to integrate with Ridgeline how to use GraphQL. That would have been too radical a shift. Instead, we wanted to provide a simple way to interact with Ridgeline, one that preserves the benefits of GraphQL while offering the simplicity of HTTP and a few REST best practices.
We approached the problem of an exposed graph by learning from the successes and failures of various persisted query projects, including those from Facebook and Apollo, and from various texts on the matter. In the end, we agreed that large federated graphs can be incredibly complex, even for those who work with them every day. Documentation on how to properly consume the graph becomes unwieldy to write and maintain. Sending giant query strings that connect multiple microservices together is error-prone and does not give integrators the support they need to succeed. Finally, once federation and support for multiple operations are added to the mix, we would have had to prove the functionality of a seemingly infinite number of result shapes.
We solved this by decoupling our public-facing API from the inner workings of GraphQL. We started by identifying the query and mutation shapes that had proven stable and production-ready and that would provide the most value for our users (including our own developers). We then reduced each of these shapes from a large block of query text to a single named operation. A consequence, and an added benefit, is that callers can no longer customize the shape of each call, which is our first major divergence from raw GraphQL. With this technique, we can provide guarantees around each operation and guard against breakage within our API, even when an operation relies on complex GraphQL shapes.
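The general persisted-operation pattern can be sketched in a few lines of Python: the client sends only an operation name and variables, and the server looks up a vetted GraphQL document by that name. The request format (`{"operation": ..., "variables": ...}`), the operation `getAccountPositions`, its query text, and the `execute_graphql` callback are hypothetical stand-ins, not Ridgeline’s actual registry.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PersistedOperation:
    name: str
    document: str  # vetted GraphQL text; never supplied by the client
    required_variables: frozenset[str]


# Hypothetical registry of stable, production-ready operations.
REGISTRY = {
    op.name: op
    for op in [
        PersistedOperation(
            name="getAccountPositions",
            document=(
                "query getAccountPositions($accountId: ID!) "
                "{ account(id: $accountId) { id positions { symbol quantity } } }"
            ),
            required_variables=frozenset({"accountId"}),
        ),
    ]
}


def handle_request(body: str, execute_graphql: Callable[[str, dict], Any]) -> dict:
    """Resolve a public request {"operation": ..., "variables": {...}} to its persisted document."""
    payload = json.loads(body)
    op = REGISTRY.get(payload.get("operation"))
    if op is None:
        return {"status": 404, "error": "unknown operation"}
    variables = payload.get("variables") or {}
    missing = op.required_variables - set(variables)
    if missing:
        return {"status": 400, "error": f"missing variables: {sorted(missing)}"}
    # Only the registry's document is executed, so every call has one known result shape.
    return {"status": 200, "data": execute_graphql(op.document, variables)}
```

Because each name maps to exactly one document, each operation has exactly one result shape to document and test, which is what makes the guarantees described above tractable.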
Documentation is at the heart of every API, and having persisted operations allows for a richer, more robust documentation strategy. Similar to most API documentation, Ridgeline’s documentation provides a guide about which API to use, how to use it, and what to expect as a response. We take documentation one step further by coupling our persisted queries directly to the generation of our documentation. This means that each and every persisted operation generates its own documentation in the same format as all others. Further, a robust series of validations can be run at each step of the development process to ensure that any documented APIs work as intended.
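One way to tie documentation to persisted operations is to render every page from the same per-operation metadata and to fail a build when that metadata is incomplete. The Python sketch below assumes hypothetical metadata fields (`description`, `variables`, `example_response`) and a Markdown output format; it illustrates the idea rather than Ridgeline’s generator.

```python
import json
import textwrap
from dataclasses import dataclass, field
from typing import Any


@dataclass
class OperationDoc:
    """Hypothetical metadata stored alongside each persisted operation."""
    name: str
    description: str
    variables: dict[str, str]  # variable name -> GraphQL type
    example_response: dict[str, Any] = field(default_factory=dict)


def render_doc(op: OperationDoc) -> str:
    """Render one operation as a Markdown page; every operation uses the same template."""
    lines = [f"## {op.name}", "", op.description, "", "| Variable | Type |", "| --- | --- |"]
    lines += [f"| `{name}` | `{gql_type}` |" for name, gql_type in sorted(op.variables.items())]
    # Indent the JSON by four spaces so Markdown renders it as a code block.
    lines += ["", "Example response:", "", textwrap.indent(json.dumps(op.example_response, indent=2), "    ")]
    return "\n".join(lines)


def incomplete_docs(ops: list[OperationDoc]) -> list[str]:
    """Pre-merge check: names of operations whose pages would lack a description or example."""
    return [op.name for op in ops if not op.description.strip() or not op.example_response]
```

Generating pages from the registry, rather than writing them by hand, means a new operation cannot ship without a page in the shared format.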
With persisted queries and automatically generated documentation, testing our products and microservices at the API level becomes a solvable problem. We use Intuit’s Karate to provide API-level validation pre- and post-merge and to standardize product requirements and QA acceptance criteria. Because our system validates changes against the graph before they merge, it guarantees that a modification to one microservice’s graph will federate cleanly with the others, which maintains our velocity and minimizes rework.
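Ridgeline uses Karate for this; the Python sketch below is not Karate and only illustrates the underlying idea of an API-level contract check: record the expected response shape of each persisted operation, then fail a pre-merge run if a graph change alters that shape. The snapshot format and function names are hypothetical.

```python
from typing import Any


def shape_of(value: Any) -> Any:
    """Reduce a JSON-like response to its structure: keys and value types, not values."""
    if isinstance(value, dict):
        return {k: shape_of(v) for k, v in value.items()}
    if isinstance(value, list):
        # Describe a list by its first element; a fuller check would also
        # handle empty lists and nullable fields explicitly.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def contract_violations(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Compare recorded shapes per operation against responses from a fresh test run."""
    problems = []
    for name, expected_shape in expected.items():
        if name not in actual:
            problems.append(f"{name}: no response recorded")
        elif shape_of(actual[name]) != expected_shape:
            problems.append(f"{name}: response shape changed")
    return problems
```

Comparing shapes rather than values keeps the check stable across test data while still catching a renamed or removed field.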
Ridgeline strives to continuously push boundaries within the API space while making sure we build a usable, relevant, and best-in-class enterprise API for our users. As mentioned, we will dig deeper into each of these topics in the coming months. In the meantime, we welcome feedback and commentary on how your own work and decisions have intersected with our initial findings and decisions. If this work and these problems appeal to you, we look forward to hearing from you and to working together to solve these fantastic challenges.