Users, Tokens and Event Logging

by Richard Taylor : 2020-02-18

The reason that I want to tackle logging now, while there is not much code, is that it can take an awful lot of refactoring to change something that sits in every single API method. So I put in this requesting_user parameter everywhere as a reminder... and now it's time to sort out what that means.

One thing that has become clear in the last few years is that the place for managing user data is not in every single service. If this concept is new to you then there is a good Introduction to OpenID Connect which is very readable.

So am I going to implement an integration to an OAuth 2.0 provider before I do anything else? No. I just need to be aware of the concepts and put in the appropriate placeholders.

Tokens not Users

The workflow for a client using our API with authentication will be:

  1. Client logs into a third-party service to get a token.
  2. Client passes the token to the API as part of every request.
  3. API extracts the user id and permissions from the token.
  4. API either grants or refuses access to the resource requested.

The way that the client passes the token to the API is via a HTTP header:

Authorization: Bearer wKICJleHAiOiAxMzExMjgxOT

The token is Base64 encoded (and usually a bit longer than I have shown). At this stage it doesn't matter where the token comes from. Our test code can just make one up and pass it to the API.

def invent_a_jwt():
    return {'Authorization': 'Bearer abcd'}

class TestBlackBoxItemAPI(unittest.TestCase):

    def setUp(self):
        self.url = 'http://localhost:2207/v1/items'

    def test_post_succeeds_for_empty_json(self):

        response = requests.post(self.url,
                                 data = '{}',
                                 headers = invent_a_jwt())

        self.assertEqual(200, response.status_code)

And our HTTP Server can extract the token from the header and pass it to the API instead of the made-up User that we had before. Which means we change the API methods from:

def get_item(self, id, requesting_user)

def create_item(self, dictionary, requesting_user)

def update_item(self, id, item, requesting_user)

to

def get_item(self, id, token)

def create_item(self, dictionary, token)

def update_item(self, id, item, token)

But what do we do with the token? Given that our test code just made one up! Well, we delegate to an Authority object:

from thelca.auth import Authority

authority = Authority()

This object doesn't have to do any authorisation yet, it just has to have the same interface as the object that will later when we implement it properly.

Let's look at the updated get_item method:

def get_item(self, id, token):
    item = storage.find_item(id)
    user = authority.check_read_item(token, item)

    logging.item_read(item, user)
    return item

We find the item first, then check whether we have the authority to access it. That's because the check might depend on some property of the item. If the check fails then the authority object raises a NotAuthorisedError. However if it succeeds then the authority object returns some kind of identifier for the user which we can use to log the event before returning the item.

So we do not need any representation of a User in the API at all. The token represents the user and the authority object extracts the user identifier from the token for us.

For the time being all we need the authority object to do is:

And this does all of that for what we need in get_item:

def check_read_item(self, token, item):
    if token is None or token == 'BAAD':
        raise NotAuthorisedError()
    else:
        return "a-test-only-id-" + token

Now we have all the pieces we need to write the rest of the API without having to implement the actual integration to an OAuth 2.0 provider. We can do that integration later when we need it; without having to change any of the API code.

Event Logging

Since we have nailed down how to know who is accessing the API, we can think about how to log what they do.

You can see at this version of the project that I have added a logging.py module. Under the hood this uses the standard python 'logging' module to write lines to a file. That is not important right now; what matters is that our API passes all the relevant information to the logging.py module so that we can send it where it needs to go.

Take a look at the api.py module. You can see that every method logs the data it is about to return; and who it is returning it to:

def get_item(self, id, token):
    ...
    logging.item_read(item, user)
    return item

def create_item(self, dictionary, token):
    ...
    logging.item_created(item, user)
    return item

def update_item(self, id, item, token):
    ...
    logging.item_updated(item, user)
    return item

In each case the 'user' is the identifier extracted from the token by our authority object as described above. And the 'item' is the full Item object.

At this version I am writing out the full object for item_created and item_updated, since these operations change the Item object; and I am writing out the object identifier for item_read since it does not change the Item object.

This stream has enough information to reconstruct the data we have stored in memory for any point in time. It is probably too verbose, but I want to get it working first before worrying about efficiency.

Here is an example of what the thelca-events.log file contains after a number of requests have been processed. It is one JSON object per line but I have formatted it to make it more readable:

{"time": "2020-02-18T16:37:09.654949+00:00",
 "type": "Item created",
 "user": "a-test-only-id-abcd",
 "value": {"created_at": "2020-02-18T16:37:09.654917+00:00",
           "created_by": "a-test-only-id-abcd",
           "id": "5f73f540-c125-435d-8a5e-6fea05672b7a",
           "properties": null}}

{"time": "2020-02-18T16:37:09.658768+00:00",
 "type": "Item updated",
 "user": "a-test-only-id-abcd",
 "value": {"created_at": "2020-02-18T16:37:09.654917+00:00",
           "created_by": "a-test-only-id-abcd",
           "id": "5f73f540-c125-435d-8a5e-6fea05672b7a",
           "properties": {"type": "STORY"}}}

{"time": "2020-02-18T16:37:09.662897+00:00",
 "type": "Item read",
 "user": "a-test-only-id-abcd",
 "value": "5f73f540-c125-435d-8a5e-6fea05672b7a"}

From here it is not much work to optionally read the event log when the server starts up and restore the Items that were previously constructed.

Next

The API is now very solid for Item objects. We have POST to create, GET to read and PUT to update. We want to keep objects for audit purposes, so we will not be implementing DELETE (more on that later).

There are placeholders for authentication and authorisation; and a mechanism to log all user actions in a way that also enables all the stored Items to be restored by replaying the events recorded in the log.

So next we can go and implement Link objects.