Podcast

33 min read

Easy Agile Podcast Ep.12 Observations on Observability

Mon Oct 11 2021
Jared Kells
Written by Jared Kells, Senior Developer

On this episode of The Easy Agile Podcast, tune in to hear developers Angad, Jared, Jess and Jordan, as they share their thoughts on observability.

Wollongong has a thriving and supportive tech community and in this episode we have brought together some of our locally based Developers from Siligong Valley for a round table chat on all things observability.

šŸ’„ What is observability?
šŸ’„ How can you improve observability?
šŸ’„ What's the end goal?

Angad Sethi

"This was a great episode to be a part of! Jess and Jordan shared some really interesting points on the newest tech buzzword - observability""

Be sure to subscribe, enjoy the episode šŸŽ§

Listen now on:

Transcript

Jared Kells:

Welcome everybody to the Easy Agile podcast. My name's Jared Kells, and I'm a developer here at Easy Agile. Before we begin, Easy Agile would like to acknowledge the traditional custodians of the land from which we broadcast today, the Wodiwodi people of the Dharawal nation, and pay our respects to elders past, present and emerging, and extend that same respect to any aboriginal people listening with us today.

Jared Kells:

So today's podcast is a bit of a technical one. It says on my run sheet here that we're here to talk about some hot topics for engineers in the IT sector. How exciting that we've got a couple of primarily front end engineers and Angad and I are going to share some front end technical stuff and Jess and Jordan are going to be talking a bit about observability. So we'll start by introductions. So I'll pass it over to Jess.

Jess Belliveau:

Cool. Thanks Jared. Thanks for having me one as well. So yeah, my name's Jess Belliveau. I work for Apptio as an infrastructure engineer. Yeah, Jordan?

Jordan Simonovski:

I'm Jordan Simonovski. I work as a systems engineer in the observability team at Atlassian. I'm a bit of a jack of all trades, tech wise. But yeah, working on building out some pretty beefy systems to handle all of our data at Atlassian at the moment. So, that's fun.

Angad Sethi:

Hello everyone. I'm Angad. I'm working for Easy Agile as a software dev. Nothing fancy like you guys.

Jared Kells:

Nothing fancy!

Jess Belliveau:

Don't sell yourself short.

Jared Kells:

Yeah, I'll say. Yeah, so my name's Jared, and yeah, senior developer at Easy Agile, working on our apps. So mainly, I work on programs and road maps. And yeah, they're front end JavaScript heavy apps. So that's where our experience is. I've heard about this thing called observability, which I think is just logs and stuff, right?

Jess Belliveau:

Yeah, yeah. That's it, we'll wrap up!

Jared Kells:

Podcast over! Tell us about observability.

Jess Belliveau:

Yeah okay, I'll, yeah. Well, I thought first I'd do a little thing of why observability, why we talk about this and sort of for people listening, how we got here. We had a little chat before we started recording to try and feel out something that might interest a broader audience that maybe people don't know a lot about. And there's a lot of movements in the broad IT scope, I guess, that you could talk about. There's so many different things now that are just blowing up. Observability is something that's been a hot topic for a couple of years now. And it's something that's a core part of my job and Jordan's job as well. So it's something easy for us to talk about and it's something that you can give an introduction to without getting too technical. So we don't want to get down. This is something that you can go really deep into the weeds, so we picked it as something that hopefully we can explain to you both at a level that might interest the people at home listening as well.

Jess Belliveau:

Jordan and I figured out these four bullet points that we wanted to cover, and maybe I can do the little overview of that, and then I can make Jordan cover the first bullet point, just throw him straight under the bus.

Jordan Simonovski:

Okay!

Jess Belliveau:

So we thought we'd try and describe to you, first of all, what is observability. Because that's a pretty, the term doesn't give you much of what it is. It gives you a little hint, but it'll be good to base line set what are we talking about when we say what is observability. And then why would a development team want observability? Why would a company want observability? Sort of high level, what sort of benefits you get out of it and who may need it, which is a big thing. You can get caught up in these industry hot buzz words and commit to stuff that you might not need, or that sort of stuff.

Jared Kells:

Yep.

Jordan Simonovski:

Yep.

Jess Belliveau:

We thought we'd talk about some easy wins that you get with observability. So some of the real basic stuff you can try and get, and what advantages you get from it. And then we just thought because we're no going to try and get too deep, we could just give a few pointers to some websites and some YouTube talks for further reading that people want to do, and go from there. So yeah, Jordan you want to-

Jared Kells:

Sounds good.

Jess Belliveau:

Yeah. I hopefully, hopefully. We'll see how this goes! And I guess if you guys have questions as well, that's something we should, if there's stuff that you think we don't cover or that you want to know more, ask away.

Jordan Simonovski:

I guess to start with observability, it's a topic I get really excited about, because as someone that's been involved in the dev ops and SRE space for so long, observability's come along and promises to close the loop or close a feedback loop on software delivery. And it feels like it's something we don't really have at the moment. And I get that observability maybe sounds new and shiny, but I think the term itself exists to maybe differentiate itself from what's currently out there. A lot of us working in tech know about monitoring and the loading and things like that. And I think they serve their own purpose and they're not in any way obsolete either. Things like traditional monitoring tools. But observability's come along as a way to understand, I think, the overwhelmingly complex systems that we're building at the moment. A lot of companies are probably moving towards some kind of complicated distributed systems architecture, microservices, other buzz words.

Jordan Simonovski:

But even for things like a traditional kind of monolith. Observability really serves to help us ask new questions from our systems. So the way it tends to get explained is monitoring exits for our known unknowns. With seniority comes the ability to predict, almost, in what way your systems will fail. So you'll know. The longer you're in the industry, you know this, like a Java server fails in x, y, z amount of ways, so we should probably monitor our JVM heap, or whatever it is.

Jared Kells:

I was going to say that!

Jordan Simonovski:

I'll try not to get too much into-

Jared Kells:

Runs out of memory!

Jordan Simonovski:

Yeah. So that's something that you're expecting to fail at some point. And that's something that you can consider a known unknown. But then, the promise of observability is that we should be shipping enough data to be able to ask new questions. So the way it tends to get talked about, you see, it's an unknown unknown of our system, that we want to find out about and ask new questions from. And that's where I think observability gets introduced, to answer these questions. Is that a good enough answer? You want me to go any further into detail about this stuff? I can talk all day about this.

Jared Kells:

Is it like a [crosstalk 00:08:05]. So just to repeat it back to you, see if I've understood. Is it kind of like if I've got a, traditionally with a Java app, I might log memories. It's because I know JVM's run out of memory and that's a thing that I monitor, but observability is more broad, like going almost over the top with what you monitor and log so that you can-

Jordan Simonovski:

Yeah. And I wouldn't necessarily say it's going over the top. I think it's maybe adding a bit more context to your data. So if any of you have worked with traces before, observability is very similar to the way traces work and just builds on top of the premise of traces, I guess. So you're creating these events, and these events are different transactions that could be happening in your applications, usually submitting some kind of request. And with that request, you can add a whole bunch of context to it. You can add which server this might be running on, which time zone. All of these additional and all the exciters. You can throw in user agency into there if you want to. The idea of observability is that you're not necessarily constrained by high cardinality data. High cardinality data being data sets that can change quite largely, in terms of the kinds of data they represent, or the combinations of data sets that you could have.

Jordan Simonovski:

So if you want shipping metrics on something, on a per user basis and you want to look at how different users are affected by things, that would be considered a high cardinality metric. And a lot of the time it's not something that traditional monitoring companies or metric providers can really give you as a service. That's where you'll start paying insanely huge bills on things like Datadog or whatever it is, because they're now being considered new metrics. Whereas observability, we try and store our data and query it in a way that we can store pretty vast data sets and say, "Cool. We have errors coming from these kinds of users." And you can start to build up correlations on certain things there. You can find out that users from a particular time zone or a particular device would only be experiencing that error. And from there, you can start building up, I think, better ways of understanding how a particular change might have broken things. Or some particular edge cases that you otherwise couldn't pick up on with something like CPU or memory monitoring.

Angad Sethi:

Would it be fair to say-

Jared Kells:

Yeah. It's [crosstalk 00:11:02].

Angad Sethi:

Oh, sorry Jared.

Jared Kells:

No you can-

Angad Sethi:

Would it be fair to say that, so, observability is basically a set of principles or a way to find the unknown unknowns?

Jordan Simonovski:

Yeah.

Angad Sethi:

Oh.

Jess Belliveau:

And better equip you to find, one of the things I find is a lot of people think, you get caught up in thinking observability is a thing that you can deploy and have and tick a box, but I like your choice of word of it being a set of principles or best practices. It's sort of giving you some guidance around these, having good logging coming out of your application. So structured logs. So you're always getting the same log format that you can look at. Tracing, which Jordan talked a little bit about. So giving you that ability to follow how a user is interacting with all the different microservices and possibly seeing where things are going wrong, and metrics as well. So the good thing with metrics is we're turning things a bit around and trying to make an application, instead of doing, and I don't want to get too technical, black box monitoring, where we're on the outside, trying to peer in with probes and checks like that. But the idea with metrics is the application is actually emitting these metrics to inform us what state it is in, thereby making it more observable.

Jess Belliveau:

Yeah, I like your choice of words there, Angad, that it's like these practices, this sort of guide of where to go, which probably leads into this next point of why would a team want to implement it. If you want to start again, Jordan?

Jordan Simonovski:

Yeah, I can start. And I'll give you a bit more time to speak as well, Jess in this one. I won't rant as much.

Jess Belliveau:

Oh, I didn't sign up for that!

Jordan Simonovski:

I think why teams would want it is because, it really depends on your organization and, I guess, the size of the teams you're working in. Most of the time, I would probably say you don't want to build observability yourself in house. It is something that you can, observability capabilities themselves, you won't achieve it just by buying a thing, like you can't buy dev ops, you can't buy Agile, you can't buy observability either.

Jared Kells:

Hang on, hang on. It says on my run sheet to promote Easy Agile, so that sounds like a good segue-

Jess Belliveau:

Unless you want to buy it. If you do want to buy Agile, the [crosstalk 00:13:55] in the marketplace.

Jared Kells:

Yeah, sorry, sorry, yeah! Go on.

Jordan Simonovski:

You can buy tools that make your life a lot easier, and there are a lot of things out there already which do stuff for people and do surface really interesting data that people might want to look at. I think there are a couple of start ups like LightStep and Honeycomb, which give you a really intuitive way of understanding your data in production. But why you would need this kind of stuff is that you want to know the state of your systems at any given point in time, and to build, I guess, good operational hygiene and good production excellence, I guess as Liz Fong-Jones would put it, is you need to be able to close that feedback loop. We have a whole bunch of tools already. So we have CICD systems in place. We have feature flags now, which help us, I guess, decouple deployments from releases. You can deploy code without actually releasing code, and you can actually give that power to your PM's now if you want to, with feature flags, which is great.

Jordan Simonovski:

But what you can also do now is completely close this loop, and as you're deploying an application, you can say, "I want to canary this deployment. I want to deploy this to 10% of my users, maybe users who are opted in for Beta releases or something of our application, and you can actually look at how that's performing before you release it to a wider audience. So it does make deployments a lot safer. It does give you a better understanding of how you're affecting users as well. And there are a whole bunch of tools that you can use to determine this stuff as well. So if you're looking at how a lot of companies are doing SRE at the moment, or understanding what reliable looks like for their applications, you have things like SLO's in place as well. And SLO's-

Jared Kells:

What's an SLO?

Jordan Simonovski:

They're all tied to user experiences. So you're saying, "Can my user perform this particular interaction?" And if you can effectively measure that and know how users are being affected by the changes you're making, you can easily make decisions around whether or not you continue shipping features or if you drop everything and work on reliability to make sure your users aren't affected. So it's this very user centric approach to doing things. I think in terms of closing the loop, observability gives us that data to say, "Yes, this is how users are being affected. This is how, I guess the 99th percentile of our users are fine, but we have 1% who are having adverse issues with our application." And you can really pinpoint stuff from there and say, "Cool. Users with this particular browser or this particular, or where we've deployed this app to," let's say if you have a global deployment of some kind, you've deployed to an island first, because you don't really care what happens to them. You can say, "Oh, we've actually broken stuff for them." And you can roll it back before you impact 100% of your users.

Jared Kells:

Yeah. I liked what you said about the test. I forgot the acronym, but actually testing the end user behavior. That's kind of exciting to me, because we have all these metrics that are a bit useless. They're cool, "Oh, it's using 1% CPU like it always is, now I don't really care," but can a user open up the app and drag an issue around? It's like-

Jess Belliveau:

Yeah, that's a really great example, right?

Jared Kells:

That's what I really care about.

Jess Belliveau:

The 1% CPU thing, you could look at a CPU usage graph and see a deployment, and the CPU usage doesn't change. Is everything healthy or not? You don't know, whereas if you're getting that deeper level info of the user interactions, you could be using 1% CPU to serve HTTP500 errors to the 80% of the customer base, sort of thing.

Angad Sethi:

How do you do that? The SLO's bit, how do you know a user can log in and drag an issue?

Jordan Simonovski:

Yeah. I think that would come with good instrumenting-

Angad Sethi:

Good question?

Jordan Simonovski:

Yeah, it comes down to actually keeping observability in mind when you are developing new features, the same way you would think about logging a particular thing in your code as you're writing, or writing test for your code, as you're writing code as well. You want to think about how you can instrument something and how you can understand how this particular feature is working in production. Because I think as a lot of Agile and dev ops principles are telling us now is that we do want our applications in production. And as developers, our responsibilities don't end when we deploy something. Our responsibility as a developer ends when we've provided value to the business. And you need a way of understanding that you're actually doing that. And that's where, I guess, you do nee do think about observability with a lot of this stuff, and actually measuring your success metrics. So if you do know that your application is successful if your user can log in and drag stuff around, then that's exactly what you want to measure.

Jared Kells:

I think that we have to build-

Jordan Simonovski:

Yeah?

Jared Kells:

Oh, sorry Jordan.

Jordan Simonovski:

No, you go.

Jared Kells:

I was just going to say we have to build our apps with integration testing in mind already. So doing browser based tests around new features. So it would be about building features with that and the same thing in mind but for testing and production.

Jess Belliveau:

Yeah and the actual how, the actual writing code part, there's this really great project, the open telemetry project, which provides all these sort of API's and SDK's that developers can consume, and it's vendor agnostic. So when you talk about the how, like, "How do I do this? How do I instrument things?" Or, "How do I emit metrics?" They provide all these helpful libraries and includes that you can have, because the last thing you want to do is have to roll this custom solution, because you're then just adding to your technical debt. You're trying to make things easier, but you're then relying on, "Well I need to keep Jared Kells employed, because he wrote our log in engine and no one else knows how it works.

Jess Belliveau:

And then the other thing that comes to mind with something like open telemetry as well, and we talked a bit about Datadog. So Datadog is a SaaS vendor that specializes in observability. And you would push your metrics and your logs and your traces to them and they give you a UI to display. If you choose something that's vendor agnostic, let's just use the example of Easy Agile. Let's say they start Datadog and then in six months time, we don't want to use Datadog anymore, we want to use SignalFx or whatever the Splunk one is now.

Jordan Simonovski:

I think NorthX.

Jess Belliveau:

Yeah. You can change your end point, push your same metrics and all that sort of stuff, maybe with a few little tweaks, but the idea is you don't want to tie in to a single thing.

Jordan Simonovski:

Your data structures remain the same.

Jess Belliveau:

Yeah. So that you could almost do it seamlessly without the developers knowing. There's even companies in the past that I think have pushed to multiple vendors. So you could be consuming vendor A and then you want to do a proof of concept with vendor B to see what the experience is like and you just push your data there as well.

Jared Kells:

Yeah. I think our coupling to Datadog will be I all the dashboards and stuff that we've made. It's not so much the data.

Jess Belliveau:

Yeah. That's sort of the big up sell, right. It's how you interact. That's where they want to get their hooks in, is making it easier for you to interpret that data and manipulate it to meet your needs and that sort of stuff.

Jordan Simonovski:

Observability suggests dashboards, right?

Jess Belliveau:

Yeah, perhaps. You used this term as well, Jordan, "production excellence." And when we talk about who needs observability, I was thinking a bit about that while you were talking. And for me, production excellence, or in Apptio we call it production readiness, operational readiness and that sort of stuff is like we want to deploy something to production like what sort of best practices do we want to have in place before we do that? And I think observability is a real great idea, because it's helping you in the future. You don't know what problems you're going to have down the line, but you're equipping your teams to be able to respond to those problems easily. Whereas, we've all probably been there, we've deployed code of production and we have no observability, we have a huge outage. What went wrong? Well, no one knows, but we know this is the fix, and it's hard to learn from that, or you have to learn from that I guess, and protect the user against future stuff, yeah.

Jess Belliveau:

When I think easy wins for observability, the first thing that really comes to mind is this whole idea of structured logging, which is really this idea that your application is you're logging, first of all. Quite important as a baseline starting point, but then you have a structured log format which lets you programmatically pass the logs as well. If you go back in time, maybe logging just looked like plain text with a line, with a timestamp, an error message. Whatever the developer decided to write to the standard out, or to the error file or something like that. Now I think there's a general move to having JSON, an actual formatted blob with that known structure so you can look into it. Tracing's probably not an easy win. That's a little bit harder. You can implement it with open telemetry and libraries and stuff. Requires a bit more understanding of your code base, I guess, and where you want tracing to fire, and that sort of stuff, parsing context through, things like that.

Jordan Simonovski:

I think Atlassian, when you probably just want to know that everything is okay. At a fairly superficial level. Maybe you just want to do some kind of up time on a trend. And then as, I guess, your code might get more complex or your product gets a bit more complex, you can start adding things in there. But I think actually knowing or surfacing the things you know might break. Those would probably be your quickest wins.

Jess Belliveau:

Well, let's mention some things for further reading. If you want to go get the whole picture of the whole, real observability started to get a lot of movement out of the Google SRE book from a few years ago. The Google SRE stuff covers the whole gamut of their soak reliability engineering practice, and observability is a portion of that, there's some great chapters on that. O'Reilly has an observability book, I think, just dedicated to observability now.

Jordan Simonovski:

I think that's still in early release, if people want to google chapters.

Jess Belliveau:

The open telemetry stuff, we'll drop a link to that I think that's really handy to know.

Angad Sethi:

From [inaudible 00:26:12], which is my perspective, as a developer, say I wanted to introduce cornflake use Datadog at Easy Agile. Not very familiar, I'm not very comfortable with it. I know how to navigate, but what's a quick way for me to get started on introducing observability? Sorry to lock my direct job or at my workplace.

Jordan Simonovski:

I would lean, I could be biased here. Jess correct me or give your opinion on this, I would lean heavily towards SLO's for this. And you can have a quick read in the SRE-

Jess Belliveau:

What does SLO stand for, Jordan?

Jordan Simonovski:

Okay, sorry. Buzz words! SLO is a service level objective, not to be confused with service level agreement. An agreement itself is contractual and you can pay people money if you do breach those. An SLO is something you set in your team and you have a target of reliability, because we are getting to the point where we understand that all systems at any point in time are in some kind of degraded state. And yeah, reliability isn't necessarily binary, it's not unreliable or reliable. Most of the time, it's mostly reliable and this gives us a better shared language, I guess. And you can have a read in the SRE handbook by Google, which is free online, which gives you a pretty good understanding of Datadog.

Jordan Simonovski:

I think the last time I used it had a SLO offering. But I think like I was mentioning earlier, you set an SLO on particular functionalities or features of your application. You're saying, "My user can do this 99% of the time," or whatever other reliability target you might want to set. I wouldn't recommend five nines of reliability. You'll probably burn yourself out trying to get there. And you have this target set for yourself. And you know exactly what you're measuring, you're measuring particular types of functionality. And you know when you do breach these, users are being affected. And that's where you can actually start thinking about observability. You can think about, "What other features are we implementing that we can start to measure?" Or, "What user facing things are we implementing that we can start to measure?"

Jordan Simonovski:

Other things you could probably look at are, I think they're all covered in the book anyway, data freshness in a way. You want to make sure the data users are being displayed is relatively fresh. You don't want them looking at stale data, so you can look at measuring things like that as well. But you can pretty much break it down into most functionalities of a website. It's no longer like a ping check, that you're just saying, "Yes, HTTP, okay. My application is fine." You're saying, "My users are actually being affected by things not working." And you can start measuring things from there. And that should give you a better understanding, or a better idea, at least, of where to start with what you want to measure and ow you want to measure it. That would be my opinion on where to get started with this if you do want to introduce it.

Jared Kells:

We're going to talk a little bit about state and how with some of these, like our very front end heavy applications that we're building, so the applications we build just basically run inside the browser and the traditional state as you would think about it, is just pulling a very simple API that writes some things into the database with some authentication, and that sort of stuff. So in terms of reliability of the services, it's really reliable. Those tiny API's just never have problems, because it's just so simple. And well, they've got plenty of monitoring around it. But all our state is actually, when you say, "Observe the state of the system," for the most part, that's state in a browser. And how do we get observability into that?

Jess Belliveau:

A big thing is really, there's not one thing fits all as well. When we talk about the SLO stuff as well, it's understanding what is important to not so much maybe your company but your team as well. If you're delivering this product, what's important to you specifically? So one SLO that might work for me at Apptio probably isn't going to work for Easy Agile. This is really pushing my knowledge, as well, of front end stuff, but when we say we want to observe the state as well, we don't necessarily mean specifically just the state. You could want to understand with each one of those API's when it's firing, what the request response time is for that API firing. So that might be an important metric. So you can start to see if one of those APIs is introducing latency, and so your user experience is degraded. Like, "Hey when we were on release three, when users were interacting with our service here, it would respond in this percentile latency. We've done a release and since then, now we're seeing it's now in this percentile. Have we degraded performance performance?" Users might not be complaining, but that could be something that the team then can look into, add to a sprint. Hey, I'm using Agile terms now. Watch out!

Jared Kells:

That's a really good example, Jess. Performance issues for us are typically not an API that's performing poorly. It's something in this very complicated front end application is not running in the same order as it used to, or there's some complex interaction we didn't think of, so it's requesting more data than expected. The APIs are returning. They're never slow, for the most part, but we have performance regressions that we may not know about without seeing them or investigating them. The observability is really at the individual user's browser level. That makes sense? I want to know how long did it take for this particular interaction to happen.

Jess Belliveau:

Yeah. I've never done that sort of side of things. As well, the other thing I guess, you could potentially be impacted in as well as then, you're dealing with end user manifestations as well. You could perceive-

Jared Kells:

Yeah sure.

Jess Belliveau:

... Greater performance on their laptop or something, or their ISP or that sort of stuff. It'd be really hard to make sure you're not getting noise from that sort of thing as well.

Jordan Simonovski:

Yeah. There are tools like Sentry, I guess, which do exist to give you a bit more of an understanding what's happening on your front end. The way Sentry tends to work with JavaScript, is you'll upload a minified map of your JS to Sentry, deploy your code and then if something does break or work in a fairly unexpected way, that tends to get surfaced with Sentry will tell you exactly which line this kind of stuff is happening on, and it's a really cool tool for that company stuff. I don't know if it'd give you the right type of insights, I think, in terms of performance or-

Jared Kells:

Yeah, we use a similar tool and it does work for crashes and that sort of thing. And on the observability front, we log actions like state mutations in side the front end, not the actual state change, but just labels that represent that you updated an issue summary or you clicked this button, that sort of thing, and we send those with our crash reports. And it's super helpful having that sort of observability. So I think I know what you guys are talking about. But I'm just [crosstalk 00:35:25], yeah.

Jess Belliveau:

Yeah, that's almost like, I guess, a form of tracing. For me and Jordan, when we talk about tracing, we might be thinking about 12 different microservices sitting in AWS that are all interacting, whereas you're more shifting that. That's sort of all stuff in the browser interacting and just having that history of this is what the user did and how they've ended up-

Jared Kells:

In that state.

Jess Belliveau:

In that state, yeah.

Jordan Simonovski:

I guess even if you don't have a lot of microservices, if you're talking about particular, like you're saying for the most part your API requests are fine but sometimes you have particularly large payloads-

Jared Kells:

We actually have to monitor, I don't know, maybe you can help with this, we actually should be monitoring maybe who we're integrating with. It's actually much more likely that we'll have a performance issue on a Xero API rather than... We don't see it, the browser sees it as well, which is-

Jordan Simonovski:

Yeah, and tracing does solve all of those regressions for you. Most tracing libraries, like if you're running Node apps or whatever on your backend. I can just tell you about Node, because I probably have the most experience writing Node stuff. You pretty much just drop in Didi trace, which is a Datadog library for tracing into your backend and your hook itself into all of, I think, the common libraries that you'll tend to work with, I think. Like if you're working for express or for a lot of just HADP libraries, as well as a few AWS services, it will kind of hook itself into that. And you can actually pinpoint. It will kind of show you on this pretty cool service map exactly which services you're interacting with and where you might be experiencing a regression. And I think traces do serve to surface that information, which is cool. So that could be something worth investigating.

Jess Belliveau:

It's funny. This is a little bit unrelated to observability, but you've just made me think a bit more about how you're saying you're reliant on third party providers as well. And something I think that's really important that sometimes gets missed is so many of us today are relying on third party providers, like AWS is a huge thing. A lot of people writing apps that require AWS services. And I think a lot of the time, people just assume AWS or Jira or whatever, is 100% up time, always available. And they don't write their code in such a way that deals with failures. And I think it's super important. So many times now I've seen people using the AWS API and they don't implement exponential back off. And so they're basically trying to hit the AWS API, it fails or they might get throttled, for example, and then they just go into a fail state and throw an error to the user. But you could potentially improve that user experience, have a retry mechanism automatically built in and that sort of stuff. It doesn't really tie into the observability thing, but it's something.

Jared Kells:

And the users don't care, right? No one cares if it's an AWS problem. It's your problem, right, your app is too slow.

Jess Belliveau:

Well, they're using your app. Exactly right. It reflects on you sort of thing, so it's in your interest to guard against an upstream failure, or at least inform the user when it's that case. Yeah.

Jared Kells:

Well, I think we're going to have to call it, this podcast, because it was an hour ago. We had instructed max 45 minutes.

Jess Belliveau:

We could just keep going. We might need a part two! Maybe we can request [cross talk 00:39:21].

Jared Kells:

Maybe! Yeah.

Jess Belliveau:

Or we'll just start our own podcast! Yeah.

Angad Sethi:

So what were your biggest learnings today, given it's been Angad and I are just learning about observability, Angad what was your biggest learning today about observability? My biggest learning was that observability does not equal Datadog. No, sorry! It was just very fascinating to learn about quantifying the known unknowns. I don't know if that's a good takeaway, but...

Jess Belliveau:

Any takeaway is a good takeaway! What about you, Jared?

Jared Kells:

I think, because I we were going to talk about state management, and part of it was how we have this ability, at the moment to, the way our front ends are architected, we can capture the state of the app and get a customer to send us their state, basically. And we can load it into our app and just see exactly how it was, just the way our state's designed. But what might be even cooler is to build maybe some observability into that front end for support. I'm thinking instead of just having, we have this button to send us out your support information that sends us a bunch of the state, but instead of console logging to the browser log, we could be console logging, logging in our front end somewhere that when they click, "send support information," our customers should be sending us the actions that they performed.

Jared Kells:

Like, "Hey there's a bug, send us your support information." It doesn't have to be a third party service collecting this observability stuff. We could just build into our... So that's what I'm thinking about.

Jess Belliveau:

Yeah, for sure. It'll probably be a lot less intrusive, as well, as some of the third party stuff that I've seen around.

Jared Kells:

Yeah. It's pretty hard with some of these integrations, especially if you're developing apps that get run behind a firewall.

Jess Belliveau:

Yeah

Jared Kells:

You can't just talk to some of these third parties. So yeah, it's cool though. It's really interesting.

Jess Belliveau:

Well, I hope someone out there listening has learned something, and Jordan and I will send some links through, and we can add them, hopefully, to the show notes or something so people can do some more reading and...

Jared Kells:

All thanks!

Jess Belliveau:

Thanks for having us, yeah.

Jared Kells:

Thanks all for your time, and thanks everybody for listening.

Jordan Simonovski:

Thanks everyone.

Angad Sethi:

That was [inaudible 00:41:55].

Jess Belliveau:

Tune in next week!

    Subscribe to our blog

    Keep up with the latest tips and updates.

    Subscribe