Proper REST API design is hard

REST (Representational State Transfer) is arguably the most popular architectural style for designing HTTP-based web APIs these days. While the basic idea is easy to understand, there are a lot of corner cases that are not so straightforward. This has been on my mind lately while trying to revise the conventions for VocaDB’s REST API. It turns out that finding answers to some of these questions is difficult, and finding concrete examples of “proper” REST APIs that follow the conventions as precisely as possible is even more difficult. I decided to write something about the issues I’ve faced.

REST?

The main thing about REST is that URLs are used to locate resources (entities in your system), while HTTP verbs are used to determine what to do with them.

So instead of URLs with verbs like /api/create-user and /api/get-users you have /api/users, where a GET call to that URL returns the list of users, and a POST call creates a new user. Similarly, /api/users/123 indicates that the call concerns the user with ID “123”. GET should never modify the resource, which is why it’s considered “safe”. Additional HTTP verbs can also be used: DELETE /api/users/123 deletes that user, PUT /api/users/123 updates the user by replacing it, and PATCH /api/users/123 performs a partial update.
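To make the verb/URL split concrete, here is a minimal sketch of a dispatcher over an in-memory store. The handle function, the users dictionary and the status codes chosen are assumptions for illustration only, not how any particular framework or the VocaDB API does it.

```python
from typing import Any, Optional

# Hypothetical in-memory store standing in for a real database.
users: dict[int, dict[str, Any]] = {}
_next_id = 1


def handle(method: str, path: str, body: Optional[dict[str, Any]] = None) -> tuple[int, Any]:
    """Dispatch on (HTTP verb, URL) and return (status code, payload)."""
    global _next_id
    parts = path.strip("/").split("/")
    if parts[:2] != ["api", "users"] or len(parts) > 3:
        return 404, None

    if len(parts) == 2:  # the collection: /api/users
        if method == "GET":
            return 200, list(users.values())
        if method == "POST":  # the server assigns the ID
            user = {**(body or {}), "id": _next_id}
            users[_next_id] = user
            _next_id += 1
            return 201, user
        return 405, None

    # a single resource: /api/users/{id}
    if not parts[2].isdigit() or int(parts[2]) not in users:
        return 404, None
    user_id = int(parts[2])
    if method == "GET":
        return 200, users[user_id]
    if method == "PUT":  # full replacement
        users[user_id] = {**(body or {}), "id": user_id}
        return 200, users[user_id]
    if method == "PATCH":  # partial update; the ID itself cannot be changed
        changes = {k: v for k, v in (body or {}).items() if k != "id"}
        users[user_id].update(changes)
        return 200, users[user_id]
    if method == "DELETE":
        del users[user_id]
        return 204, None
    return 405, None
```

Note that a verb-style URL such as /api/create-user simply has no route in this sketch; the same /api/users URL serves both listing and creation, and only the verb differs.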

Another thing about REST is that it’s intended to be stateless: each request should carry everything the server needs to process it. This means the result of the call should not depend on who’s calling the API (as long as the client has the necessary permissions), or on what the user is currently doing on the website (the user’s session).

POST or PUT?

One of the main sources of confusion is the difference between POST and PUT verbs. I’ve seen some explanations that POST means create while PUT means update, which is true for most systems, but it’s an oversimplification. In systems that allow the client to assign the ID, PUT can also be used to create resources.

In HTTP terms, POST means that the server will determine the final URL of the resource. This is why in most systems a new resource is created by POSTing to the collection, allowing the server to generate an ID for the new resource. POST is not guaranteed to be idempotent, meaning multiple POST calls to the same URL may have different effects (such as multiple resources being created). Meanwhile, the URL for a PUT operation must uniquely identify the resource being updated or created, and the resource will always be completely replaced if it exists. This has the benefit that PUT calls can be repeated as many times as needed without unwanted side effects.

But it’s more complicated than that. Almost all systems have side effects for updates. If the system tracks a modification date, that date is updated with every PUT request. Logging and auditing information is created. Sometimes emails are sent. Making the system truly idempotent can be very difficult, and is often unnecessary. It depends on the system of course, but as long as the side effects are minor and acceptable, without duplicate resources being created, I would consider the interface idempotent.

So if you want to do it properly, POST /api/users will be used to create a new user and PUT /api/users/123 will either create a new user with ID 123, or replace that user if it exists. In this model, POST /api/users/123 does not make sense and is always an error. However, because the URLs are different, this raises the question whether PUT is even needed. You might as well use POST /api/users/123 to create/update the resource, which is actually what many are doing. There is nothing saying that POST cannot be idempotent, just that it’s not guaranteed to be idempotent, unlike PUT.
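The difference is easy to see in a small sketch. The functions post_user and put_user below are hypothetical stand-ins for POST /api/users and PUT /api/users/{id}, backed by an in-memory dictionary rather than a real database.

```python
import itertools
from typing import Any

users: dict[int, dict[str, Any]] = {}
_ids = itertools.count(1)  # assumed server-side ID generator


def post_user(data: dict[str, Any]) -> int:
    """POST /api/users: the server picks the ID, so every call adds a user."""
    user_id = next(_ids)
    while user_id in users:  # skip IDs already claimed by a client via PUT
        user_id = next(_ids)
    users[user_id] = dict(data)
    return user_id


def put_user(user_id: int, data: dict[str, Any]) -> bool:
    """PUT /api/users/{id}: create or fully replace. Returns True if created."""
    created = user_id not in users
    users[user_id] = dict(data)
    return created
```

Calling post_user twice with the same data leaves two users in the store, while calling put_user(123, ...) twice leaves exactly one user with ID 123; the second call only reports that nothing new was created.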

In my experience, the main difference between POST and PUT arises when modifying collections. Following the REST practices, POSTing to a collection always appends to the collection, creating one or more new resources. A PUT to a collection replaces the whole collection, meaning items not present in the new set are removed. If you need these operations, this is probably when you need to start using PUT, but until then you can be fine with just POST.
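As a rough illustration of the two collection semantics, assume the collection is just a list of tag names. The function names and the tag example are made up for this sketch.

```python
def post_items(collection: list[str], new_items: list[str]) -> list[str]:
    """POST to a collection: append the new items, keep everything else."""
    return collection + new_items


def put_items(collection: list[str], new_items: list[str]) -> list[str]:
    """PUT to a collection: replace it entirely; items missing from new_items are dropped."""
    return list(new_items)
```

Repeating the POST keeps growing the list, whereas repeating the PUT with the same body always leaves the collection in the same state.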

URL or body?

GET requests conventionally do not carry a body, so all parameters are passed as query parameters in the URL, for example /api/users?name=John.

For POST and PUT requests, data can be passed in either the URL or request body. According to common REST conventions, the data should always be passed in the body (which can be formatted as form data, JSON, XML or even binary). For complex, hierarchical objects it doesn’t make sense to do it otherwise, but for simple values I’ve sometimes violated this principle by passing data in query parameters, because creating an object just for passing one or two primitives seems like unnecessary complication.
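A client-side sketch of the two conventions, using only the Python standard library. The host api.example.com is a placeholder and the helper names are hypothetical; the requests are only constructed here, not sent.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://api.example.com"  # placeholder host, not a real API


def build_get_users(name: str) -> Request:
    """GET: parameters go into the query string; there is no body."""
    query = urlencode({"name": name})
    return Request(f"{BASE}/api/users?{query}", method="GET")


def build_create_user(user: dict[str, Any]) -> Request:
    """POST: the (possibly hierarchical) object goes into the body as JSON."""
    return Request(
        f"{BASE}/api/users",
        data=json.dumps(user).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The trade-off mentioned above shows up directly: a nested object such as a user with a list of roles serializes naturally into the JSON body, while a single primitive fits into a query parameter without any wrapper object.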

Stateless?

Another constraint that many REST APIs violate is being completely stateless. For example, the Twitter REST API is very much stateful because it includes methods that depend on the authenticated user. Doing it properly, the URL would always include the ID of the authenticated user, so that two users would use different URLs for retrieving their statuses.

Personally I’ve used 3 different styles for addressing the logged-in user: 1) /api/users/{id}/resource, 2) /api/users/current/resource and 3) /api/resources. Only the first style is stateless, and while it’s definitely the most proper way of doing it, for systems where impersonation is not needed, I’ve found the last two styles much more convenient to use. Passing the user ID for all requests is simply an unnecessary complication when the server has to authorize the user anyway. The second style is a compromise, but still a bit verbose.
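The three styles can be compared with a small resolver that works out which user a request targets. This is only a sketch: resolve_target, the regular expressions and the “messages” resource are assumptions for illustration, and real routing would also need authorization checks.

```python
import re


def resolve_target(path: str, authenticated_user_id: int) -> tuple[int, str]:
    """Return (target user ID, resource name) for the three URL styles."""
    match = re.fullmatch(r"/api/users/(\d+|current)/(\w+)", path)
    if match:
        ident, resource = match.groups()
        if ident == "current":
            # Style 2: the target depends on who is calling.
            return authenticated_user_id, resource
        # Style 1: the URL alone identifies the target.
        return int(ident), resource
    match = re.fullmatch(r"/api/(\w+)", path)
    if match:
        # Style 3: the target is implied entirely by the caller.
        return authenticated_user_id, match.group(1)
    raise ValueError(f"unrecognized path: {path}")
```

Only the first style returns the same target regardless of authenticated_user_id, which is exactly what makes it stateless in the sense used here.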

Do note however, that the main benefit of the first style is that GET requests can be cached (privately, in the client or the server, but not in the proxy). So for APIs that can be cached, or where the authorization is somehow separated from processing the request, I would prefer the first style.

I would make a clear distinction between user authentication and the so-called session state. Authentication is generally passed as a cookie with each request and it can’t really be avoided. Session state, where the server stores some information about the current user in memory, is generally a bad idea anyway because it messes up browsing with multiple tabs and it doesn’t scale to multiple server instances. Other than identifying the user, it’s best to make the API as stateless as possible.

About child resources

Child resources, particularly link objects, are the thing that I’ve been having the most trouble with.

Let’s say your domain model contains users and companies. Users can belong to many companies, so it’s a many-to-many relationship. Now you have multiple ways of addressing the relationship:

  • /api/users/{id}/companies
  • /api/companies/{id}/users
  • /api/company-users (for managing all the link objects)

Should you provide all 3 ways for accessing the links?

Going further, let’s say we can provide additional information about the relationship, for example the user’s role in the company. Most likely the link object has some unique ID in the database, but considering there can be only one link object per user/company pair, managing the link through that ID is unnecessarily complicated. Instead I’d prefer managing the link through the combination of the user’s ID and the company’s ID. Thus I’ve preferred the style /api/companies/{id}/users/{userId} for managing the link. This is what is suggested in a Stack Overflow post as well, but I realize this can easily be misread as users being child objects of the company, even though the URL is only about managing the relationship.
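One way to picture this is a store of link objects keyed by the (company ID, user ID) pair. The sketch below is hypothetical (function names, role values and the in-memory dictionary are all assumptions), with each function noting the URL it would roughly correspond to.

```python
from typing import Any

# Link objects keyed by (company_id, user_id); at most one link per pair.
links: dict[tuple[int, int], dict[str, Any]] = {}


def put_link(company_id: int, user_id: int, role: str) -> None:
    """PUT /api/companies/{id}/users/{userId}: create or replace the link."""
    links[(company_id, user_id)] = {"role": role}


def delete_link(company_id: int, user_id: int) -> None:
    """DELETE /api/companies/{id}/users/{userId}: removes the link, not the user."""
    links.pop((company_id, user_id), None)


def users_of_company(company_id: int) -> list[int]:
    """GET /api/companies/{id}/users"""
    return sorted(u for (c, u) in links if c == company_id)


def companies_of_user(user_id: int) -> list[int]:
    """GET /api/users/{id}/companies"""
    return sorted(c for (c, u) in links if u == user_id)
```

Both nested URLs are just different views over the same set of links, and deleting through /api/companies/{id}/users/{userId} only removes the relationship while the user itself stays untouched.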
