In this episode, Alex Thissen talks about monitoring, logging, and tracing .NET applications in Azure. He explains the concepts of monitoring, logging, and tracing, and how they help developers troubleshoot and identify issues in their applications. He also discusses the benefits of using tools like Application Insights in Azure for monitoring and instrumentation. Alex emphasizes the importance of including monitoring and instrumentation as part of the development process to ensure the application's performance and usability.
[00:00:02] Unknown:
This week on Developer Weekly.
[00:00:05] Unknown:
Would you log that? Well, maybe those calculations happen thousands of times per hour, or even per second. So it doesn't really make sense to store those in a log. You would never look at them again.
[00:00:19] Unknown:
Yeah. Hey, guys. I've been using Windows 10 for years now, and I recently took the time to learn how to be more productive with it. There are lots of shortcuts and tools in Windows 10 that help me throughout the day. Do you also want to be more productive with Windows 10? Then check out my new Udemy course called Windows 10 Productivity Booster. You can check it out at azureberry.com/windows. That is azureberry.com/windows. Welcome to another episode of Developer Weekly. This week, I'm talking with Alex Thissen about monitoring, logging, and tracing .NET applications and stuff in Azure.
Alex has been an application development enthusiast since the late nineties and works as an architect, lead developer, and mentor at large enterprises and small companies. He also does all sorts of other stuff and speaks a lot at conferences. Welcome, Alex.
[00:01:16] Unknown:
Hi, Barry.
[00:01:17] Unknown:
Hi. Hey, thanks. Good to be here. Yeah. Good to have you on the show. I had you on the list for a long time, actually. I thought, this is the guy, because back when I helped out with the .NET user group, we had you on as a speaker talking about containers. You remember that? It's quite a while back. Yeah. It was a very good talk, that was. So I enjoyed that. Yeah. I always enjoy your stuff. You can explain technology concepts very well, and that's what we need in this world.
[00:01:50] Unknown:
Okay. Well, that might be thanks to a couple of years of training that I did, where I focused on teaching developers and architects all the new stuff on the Microsoft platform.
[00:02:03] Unknown:
Yeah. Yeah. Because there's always new stuff to learn. And even if it's not new, there's so much stuff nowadays that, you know, you can't learn everything. So today I wanted to talk about something very important: monitoring, logging, and tracing. And that basically comes down to this: as a developer or software person, whatever you do, even if you're a DevOps person, you spend a lot of time looking at stuff that goes wrong in your applications, right? As a developer, usually I try to make stuff. Like, let's say it's a website or something.
And I don't know, maybe 70% of the time I deal with, oh, why is it not working as I want it to work? And that's even at development time. And that becomes worse when things don't work as expected when they are actually running in production. Because there, you know, I'm not debugging. I'm not on that machine, basically. So it's more difficult. Yeah. As software people, we spend a lot of time in that world of troubleshooting and dealing with things that go wrong. And I guess, to make that a bit easier and a bit better, and to make sure that we don't have no-repro bugs, so things that we cannot reproduce,
we need to make sure that we have monitoring, logging, and tracing. So these concepts, maybe we can start by defining what they actually are. We have three terms here: monitoring, logging, and tracing. What are those, and what is the difference between them? Can you explain that?
[00:03:48] Unknown:
Oh, yeah. Sure. Well, I do feel your pain. You're like, oh, it worked on my machine, and now it's running somewhere else and it doesn't work anymore, but I don't have the luxury of debugging it in a development environment, and you have no clue. And then sometimes you resort to the poor man's debugging, sprinkling all these log messages here and there just to get a bit of an understanding of what is going on, because you're in the blind on the other side. Yeah. Well, I think that, overall, I would call that instrumentation. And I like to think of a plane, where you want to have all these instruments: air pressure, velocity, altitude, those kinds of things. You want to know that.
Because otherwise, if you don't, you are in the blind and you don't know where you're going and where you are, and that's not a good space to be in. Well, so what can you do in your applications to get a bit more of an understanding of what is happening inside? You mentioned the logging, then there's the tracing. And I also would like to think about metrics as well. Ah. And with those, you can start monitoring your system, because it's instrumented with those three aspects. And I think they're kind of different.
To start with the logging, because we're most familiar with that one: that's sort of like record keeping. It's a log with a list of things that have happened and that were noteworthy, because you want to look back at them later on. There was something significant and you have it logged, so you could say, okay, at this time, this happened. And that gives you an understanding of important things that happen in your application. Important enough to keep those logs around for a long time, sometimes even a couple of years. The tracing is, well, it's almost similar, but tracing allows you to see certain points in your code. It sort of resembles logging, but logging is sent out to a log provider or a log source where it's stored, and there can be multiple of those.
With tracing, what you normally do is you have these snippets of information that you send out just to give a bit of a sign of life with some contextual information. And by stitching those together, you can trace, or follow, the flow of execution in your program. And it serves a different purpose, because tracing is there to help you understand why things are going wrong. For example, in the case of tracing, you could say you have a calculate mortgage function. And then inside, you could trace the input variables just so you can see what you actually received there.
Would you log that? Well, maybe those calculations happen thousands of times per hour, or even per second. So it doesn't really make sense to store those in a log, because you would never look at them again. But during a debug session or, you know, troubleshooting, you would want to know. And that's a significant thing as well: you would turn on tracing, and then you can start listening to the trace information. And it's high volume. There's a lot of trace information. You would get overwhelmed with it. But you only do that in certain scenarios. So you turn it on, you start listening, and you see all those things flashing by. And then you're like, this is what is actually going on. And then you turn it off, and you have your repro without actually having to do any debugging.
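A minimal sketch of that distinction, written in Python for brevity rather than .NET. The logger names, the `set_tracing` helper, and the annuity formula inside `calculate_mortgage` are assumptions made for illustration; the episode only mentions a calculate mortgage function whose inputs you might trace.

```python
import logging

# Two separate channels: durable record keeping versus high-volume tracing.
log = logging.getLogger("mortgage.log")
trace = logging.getLogger("mortgage.trace")

logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")
trace.setLevel(logging.WARNING)  # tracing is switched off by default


def calculate_mortgage(principal: float, annual_rate: float, years: int) -> float:
    """Return the monthly payment of a fixed-rate annuity loan (illustrative)."""
    # Trace the inputs: nearly free while disabled, useful when troubleshooting.
    trace.debug("calculate_mortgage(principal=%s, annual_rate=%s, years=%s)",
                principal, annual_rate, years)
    monthly_rate = annual_rate / 12
    months = years * 12
    if monthly_rate == 0:
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


def set_tracing(enabled: bool) -> None:
    """Turn trace output on or off at runtime, without redeploying."""
    trace.setLevel(logging.DEBUG if enabled else logging.WARNING)


log.info("mortgage service started")    # noteworthy event: goes to the log
calculate_mortgage(200_000, 0.03, 30)   # emits nothing, tracing is off
set_tracing(True)
calculate_mortgage(200_000, 0.03, 30)   # now the inputs show up
set_tracing(False)
```

The point of the split is that the log stays small enough to keep for years, while the trace channel can be opened briefly during an investigation and closed again.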
That's the place for tracing. It allows you to follow the flow of your application. And if you do distributed tracing, you can even see that across network hops from one service to the other, because there's a W3C format for tracing. If each participant in the distributed call stack adopts such a standard as W3C tracing, then you can correlate them all together, and you can even see, okay, we're going from the website and these kinds of things happened, and then we jump to a web API and other things happened, and then we jump back to another one. And even if you fan out across multiple services, you could see everything, as a form of telemetry, stitched together. That's distributed tracing.
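To make the correlation idea concrete, here is a small Python sketch of the `traceparent` header from the W3C Trace Context format: a version, a 16-byte trace id, an 8-byte parent (span) id, and a flags byte, all in lowercase hex. The helper names are hypothetical, and real applications would normally let their telemetry library produce and forward this header rather than build it by hand.

```python
import re
import secrets

# traceparent layout: version "00", 32 hex trace id, 16 hex parent id, 2 hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace at the edge, for example in the website."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: same trace id, fresh span id for this hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()  # malformed header: start a fresh trace
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace id that ties all hops of one request together."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError("not a valid traceparent header")
    return match.group(1)


website_header = new_traceparent()               # website receives a request
api_header = child_traceparent(website_header)   # website calls the web API
print(trace_id_of(website_header) == trace_id_of(api_header))
```

Because every hop keeps the trace id and only replaces the span id, a back end can stitch the separate pieces of telemetry into one end-to-end picture.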
And then you have the metrics. Metrics are numerical information that you send out. Usually, it's about the performance of your application in the broadest sense. It could be, like, the Windows performance counters, those are metrics as well, or Linux metrics on memory consumption, disk reads, the network traffic that is happening. Those are machine metrics, but your application could also send out metrics. For example, the number of conversions from a shopping basket on your web shop, when somebody actually bought something or added something to it. You can have these counters that say, okay, this is the number of items being added to shopping baskets at this point in time. And if it's Black Friday, you would probably see that spike.
And you see, oh, something really good is happening. But also, if you all of a sudden see it dropping to zero, yeah, that's probably not a good sign on Black Friday. So you can also see when things are going wrong, and that's the instrumentation where the plane would say, okay, you're on a collision course with a mountain, maybe you should steer left or right or go up. This is where you could see, how is my application doing? And that's maybe the fourth part. Some people like to combine the deep knowledge of the system and formulate it in terms of health endpoints.
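The shopping-basket counter can be sketched as a per-minute metric with a naive "dropped to zero" check. This is a Python illustration only; the class name, the one-minute bucket size, and the five-minute lookback are assumptions, and a production system would send such counts to a metrics back end instead of holding them in memory.

```python
from collections import Counter


class MinuteCounter:
    """Counts events per minute so they can be graphed or alerted on."""

    def __init__(self) -> None:
        self._buckets = Counter()

    def increment(self, timestamp: float, amount: int = 1) -> None:
        # Bucket key is the minute number since the epoch.
        self._buckets[int(timestamp // 60)] += amount

    def value_at(self, timestamp: float) -> int:
        return self._buckets[int(timestamp // 60)]


def dropped_to_zero(counter: MinuteCounter, now: float, lookback_minutes: int = 5) -> bool:
    """True when the current minute is empty although recent minutes were not."""
    current = counter.value_at(now)
    recent = sum(counter.value_at(now - 60 * i) for i in range(1, lookback_minutes + 1))
    return current == 0 and recent > 0


items_added = MinuteCounter()
for second in (0, 10, 70, 130):     # four "item added to basket" events
    items_added.increment(second)

print(items_added.value_at(30))             # events in the first minute
print(dropped_to_zero(items_added, 200))    # minute 3 is empty after activity
```

A check like this fires on any quiet minute, including one that has only just started, so a real alert would need a longer window or a comparison against the expected traffic.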
And then what you do is, instead of monitoring on the outside and looking at the metrics and saying, okay, if this happens, it's not a good sign, you can have the application itself indicating, I'm not doing well, so I'm unhealthy or I'm degraded. An example: if an application is reading from a certain queue and the queue size is building up, that could be an indication that the service is being overwhelmed by all the incoming messages on the queue. So it could say, I'm unhealthy because I can't keep up. And that allows you to know, before things are really going wrong, that they are about to go wrong.
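A rough Python sketch of the queue example: the service inspects its own backlog and reports healthy, degraded, or unhealthy. The thresholds, the function names, and the choice to answer an unhealthy state with HTTP 503 are assumptions for illustration, not something prescribed in the episode.

```python
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"


def queue_health(queue_length: int, degraded_at: int = 100, unhealthy_at: int = 1000) -> HealthStatus:
    """Map the backlog of unprocessed messages to a health status."""
    if queue_length >= unhealthy_at:
        return HealthStatus.UNHEALTHY   # cannot keep up any more
    if queue_length >= degraded_at:
        return HealthStatus.DEGRADED    # falling behind, still working
    return HealthStatus.HEALTHY


def health_response(queue_length: int):
    """Build what a health endpoint could return: (HTTP status code, body)."""
    status = queue_health(queue_length)
    http_code = 503 if status is HealthStatus.UNHEALTHY else 200
    return http_code, {"status": status.value, "queueLength": queue_length}


print(health_response(10))
print(health_response(250))
print(health_response(5000))
```

The degraded state is the useful early warning: the service still answers, but whoever watches the endpoint can react before the backlog becomes an outage.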
Yeah. Yeah. And with all of these combined, you have very rich instrumentation: record keeping for long-term periods, following the flow of your application, and seeing the numerical things that are happening, so you can see averages and trend lines, and that's good for graphs. The fourth is about your system itself indicating if it's a thumbs up, halfway, or thumbs down. And that allows you to do things like the monitoring that you mentioned. Then we can start monitoring, because we can build a dashboard with particular indicators that say, this is good, this is not so good, these are the graphs.
Then you can look into your system, so to speak, even though you're watching from the outside. It's like a doctor that would put all these wires on you to see your heart pulses, and that measures the oxygen in your blood or your blood pressure. It's kind of like that. We're on the outside, but we can tell a lot about your health.
[00:11:51] Unknown:
Yeah. Yeah. Ah, okay. So that's the difference between those things. I remember that there was an application once. It wasn't that great of an application. It was a, what do you call it, a brownfield application, so an existing application I needed to work on. And they had quite some performance problems. We really didn't know where they came from, and they only happened in production as well. So what we did was we built in these log messages. Like, hey, I'm now at this point, and this is the timestamp. And, hey, now I'm at this method and a second has passed. And now I'm here and this took this long. And then we tried to log those things out in production, which was very, very painful.
So for a problem like that, one that happens in production, is there a better way? What would you use for that?
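The hand-placed "now I'm here and this took this long" messages can at least be gathered into one helper. The following Python sketch is an illustration of that manual approach, not a recommendation over the tooling discussed next; the `timed` helper, the logger name, and the one-second threshold are assumptions.

```python
import logging
import time
from contextlib import contextmanager

timing_log = logging.getLogger("timing")


@contextmanager
def timed(step: str, slow_after_seconds: float = 1.0):
    """Measure a block of code and report it, louder when it is slow."""
    start = time.perf_counter()
    result = {"step": step, "seconds": None}
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        # Slow steps are warnings; fast ones only show up at debug level.
        level = logging.WARNING if result["seconds"] >= slow_after_seconds else logging.DEBUG
        timing_log.log(level, "%s took %.3f s", step, result["seconds"])


with timed("load_orders") as measurement:
    time.sleep(0.01)  # stand-in for the real work

print(measurement["step"], "measured:", measurement["seconds"] is not None)
```

Keeping the measurement in one place makes it easy to remove later or to redirect the numbers to a metrics or tracing system.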
[00:12:48] Unknown:
Okay. Well, what you did was using the log messages. Like I said, the poor man's debugging. Right? It's like trying to figure out what the movie is by having these screenshots. Every now and again the light flashes on, and then you can see what is there. And then the light goes off and you're in the blind again. You have to do mental exercises to work out what happened inside based on the code, and it's a lot of guessing. The tracing might help you there, because it gives you much richer information, even if it's not being logged, and it's also better for performance.
Well, there are a couple of things that you can do in that case. If things go bad in production only, so it's not reproducible in a similar environment, then I would say you could do a couple of things. First of all, you can do snapshot debugging, that is, if you're using the .NET Core stack or the .NET stack in general. You can instruct your application to give a snapshot that you can load inside your Visual Studio debugger, and you can debug at that point in time. So it's like taking a small part of the movie and being able to go back and forth there and inspect variables and see what is happening.
Yeah. A more crude version of that is doing a sort of memory dump, where you build a dump image that you can also load with tools. That is called SOS, and WinDbg is the tool surrounding it. And SOS, that stands for Son of Strike. It used to be an internal tool, Strike, by Microsoft, and they built a public version, Son of Strike. So you will find a lot of these SOS things that allow you to open up a memory dump and inspect everything inside, from managed to unmanaged code. And there are a lot of tutorials. There's one, I have to think of the name of the person that did it.
Let me think. I'll get back to that later, but it's a woman from Norway. She's a support engineer at Microsoft, and she has a very nice blog with tutorials on how to use the tooling and how you can inspect the stack, the heap, and all the threads to look at, for example, deadlocks, or at places where you have memory leaks, objects that are locked and can't be disposed. A lot of things that you can get out of that tool. And there are commercial tools as well that go even beyond that. But those allow you to get a lot of information. For example, if you have unexpected crashes, then you can do a memory dump on the crash. You can instruct the machine to do so.
Load the memory dump, use the tooling, and start fiddling around. And with a little bit of practice, you will be able to do a sort of post-mortem analysis, which can help. But the snapshot debugger is there as an alternative. Yeah. And that might help you in a situation where you say, I have a no-repro bug, other than in production.
[00:16:32] Unknown:
Yeah. The snapshot debugger, really good tip there. I did it in a different way once, where I actually attached the debugger to a production process. That's not a very good idea, because when you hit a breakpoint, the whole website stops.
[00:16:49] Unknown:
Yeah. Yeah. The process stops processing requests and just runs the single one. Now, that's one side effect. And probably, if you want to do some useful debugging, you might have had to put a debug build on production, so not a release build. Well, if push comes to shove, that might be the thing you have to do. But did you find the bug eventually? Yeah. Yeah. Did find the bug, yeah, and solved it. Alright. Okay. Would it have been something that you could have fixed in another way, in retrospect?
[00:17:29] Unknown:
Did you find that, if I only had done this, I would have found the bug earlier and not on production only? No. It was really something that only happened with production data. So what could have helped is if we would have had environments that actually contained parts of the production data. The problem there was, and this happens a lot, I see at least in my consulting, that the environments, so development, test, acceptance, production, or whatever environments you have, usually are very, very different in infrastructure. The development environment often just has a little bit of data that the developers themselves put in there, in a database, let's say, so that you can test stuff out. And then usually things seem to work fine, but then comes production, where you have all sorts of weird data. Users just put in lots of weird stuff, or maybe a lot of data. Things sometimes work differently than expected.
So I think it's good practice to also take the data. So not only the things where your services run, and make those the same in those environments, or at least as similar as you can, but also the data itself. Try to make snapshots of the production database and then use those in acceptance and test as well. And potentially, you should mask some production data there. Let's say if you have credit card information or other personal information that should only be in production and that developers and other people shouldn't be able to read. So, okay.
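Masking a production snapshot before it reaches test or acceptance could look roughly like this Python sketch. The column names (`name`, `email`, `card_number`), the keyed-hash pseudonym, and the `example.invalid` domain are hypothetical choices for illustration; a real project would have to decide per column what to redact and might prefer a dedicated masking tool.

```python
import hashlib
import hmac
import re


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = re.sub(r"\D", "", card_number)
    if len(digits) <= 4:
        return "*" * len(digits)   # too short to reveal anything safely
    return "*" * (len(digits) - 4) + digits[-4:]


def pseudonymize_email(email: str, secret: bytes) -> str:
    """Replace an address with a stable pseudonym derived from a keyed hash."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}@example.invalid"


def mask_customer(row: dict, secret: bytes) -> dict:
    """Return a masked copy of one customer row; the input is left untouched."""
    masked = dict(row)
    masked["name"] = "REDACTED"
    masked["email"] = pseudonymize_email(row["email"], secret)
    masked["card_number"] = mask_card_number(row["card_number"])
    return masked


production_row = {"id": 42, "name": "Jane Doe", "email": "Jane@Example.com",
                  "card_number": "4111 1111 1111 1111"}
print(mask_customer(production_row, secret=b"rotate-me"))
```

The pseudonym is deterministic for a given secret, so rows that referred to the same customer still line up after masking, which keeps the odd data shapes that made the production bug appear in the first place.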
So these processes that you've just talked about, instrumenting your application, that sounds a bit painful, actually. Let's say we're running our application in Azure. Is there another, very easy way maybe to instrument and monitor our application when we do that?
[00:19:22] Unknown:
That's a good one. Let me think. Well, in Azure, you get a lot of internal information from the resources themselves. It's more at the control plane where things happen that allow you to inspect your application at another level. That's one thing, I would say. The other thing is that in Azure there are resources that will help you out with this. For example, if you have Application Insights, there's only a little bit you have to do to add instrumentation from Application Insights to your application parts.
And then you will find that there's a very nice sort of collaboration between your code and the Azure resources. As an example, if you host things on an Azure web app, calling into a web API that is also in Azure, or maybe inside of an AKS cluster, and then make a call to Azure SQL or some other data storage resource, you will find that if you use Application Insights, with all the correlation information that's being sent across, you can get really rich application maps that allow you to inspect the way your application is hooked together and how calls are going from one place to the other. It's sort of, well, getting a visual imprint of all the calls going back and forth with measurements of time, but also being able to pinpoint the places where the failures are happening, because Application Insights, if you go to the portal, will surface those errors, and you can drill down into them. And you can see all the traces, so things calling things.
And then all of a sudden here it breaks, and usually it escalates so that all the other stuff is broken as well, but you can pinpoint the root cause and then inspect it. And usually there's even the exception information that's also being logged. And that works because Azure is built that way. It uses that internally as well. It sends out all this information to Azure Log Analytics. That's the back-end storage, and Azure Monitor can look at the Log Analytics data.
And it's also the .NET runtime, the .NET Core runtime, using all these metrics and sending out information about its health, the number of requests, the requests being throttled. That's also already there for you, and you don't have to do anything for it. So is it painful? Well, you get a lot for free if you have a homogeneous Microsoft stack. Then it will usually work very well together. If you make trips to other things outside, you might have to do a little bit more work there. But I still think it's good practice if you're doing development for, let's say, cloud applications, and you want to do more of the DevOps style of working: you build it, and you need to run it as well. And I think that as a team, you want to have all these things in place.
So I consider it a best practice to put that in. Otherwise, it's not ready to be run in production, because how will you be able to get a feedback loop? And it's not just how is it doing, but sometimes it's even how is it being used by your customers. Because we built this new feature, and as it turns out, two months down the line, nobody actually used it. We can see from the usage metrics that nobody was using the advanced search that the business said the customers wanted to have. Or you can see that there's a lot of use for it. And then you can say, ah, good, we are going to reprioritize our backlog because this is a good thing. We need to invest more in this area.
[00:23:42] Unknown:
Yeah. I couldn't agree more with that. So, yeah, Application Insights gives you a lot for free. What I like there a lot is that, you know, you can enable that for your application in the Azure portal, where you say, right, I want Application Insights. And then it will add an extension, let's say, to an Azure App Service, so that it will monitor everything that's going on in the infrastructure. And then in your code, if you have Visual Studio, for instance, and you are in the .NET ecosystem, you can also simply say, well, add Application Insights to my application, to my .NET or .NET Core application. And then it adds those packages and a couple of lines of code, and then it basically just works automatically, which is great.
Yeah. And to your point there, yeah, I totally agree. You know, a feature really isn't done until it is used as expected. So, you know, it's not only that there are no bugs. Even if there are no bugs, it could be that users aren't using it, like you say. It could be that users don't seem to be able to use it as they want to. Maybe, you know, you designed it weird, and they just can't find the button that they need to click on. And you notice in your monitoring and tracing that they keep clicking on something else, or on the back button or something, and that something is weird around there. So it should be in the definition of done that it works as expected.
And if it doesn't, then, like you say, you have that feedback loop, and you can take that back to product management or whoever then puts it back on the backlog or prioritizes it. Yeah. That's awesome.
[00:25:23] Unknown:
It's the same, I think, if you're doing DevOps. People like to say we're doing BizDevOps, but I think you should always do DevOps together with the business, because you're doing it for the business. It should be co-creation. Some people say we do BizDevSecOps because we need security in there. I'm like, how can you build something that doesn't do security up front? And I think the same goes for the monitoring capabilities. How can you do ops if you don't have monitoring? And so it all boils down to your definition of done, like you said: it should be there.
It should be secure, compliant, instrumented.
[00:26:07] Unknown:
Yeah. Exactly. So we've just talked about Application Insights, which is a great way to monitor stuff that runs in, let's say, App Services and other services within Azure. Now, you know, I like Azure. You also like to run stuff in containers, right? Like, let's say, Kubernetes or something. How does that work? How can you instrument and monitor stuff there in a similarly easy way as with Application Insights? Does Application Insights work for containers, or do you use something else?
[00:26:41] Unknown:
That would work. And containers, I think, once you discover the power of containers, but also the ease of use, then I think you'll see that it's hard to think of a new solution not using containers. But it's not a one-size-fits-all or solve-everything piece of technology. I can illustrate a couple of things there. First of all, what makes containers great is that it's sort of application virtualization. It puts everything you need to run an application inside of a well-defined set of files. That's the container image. If you have a machine that is able to host a container instance and it has the right kernel underneath, then you're good to go: a Linux container image needs to run on a Linux kernel and a Windows container image needs to run on Windows. If the kernel is compatible with the assumptions made when the container image was built, it will run, and it will run because everything is there. And to illustrate, when it clicked for me was when I had a very old container image from two years back, and I wondered, oh, I have a completely new machine.
It might not run anymore. And then I realized, no, it will run, because the container is self-contained. It has everything it needs. It might be an old .NET version, but it has all the NuGet packages and everything that it needs all packaged together, so it's ready to go. Just fire it up and it starts running, and it did. So back to, what do you need to do there? Well, sometimes it's just a matter of taking the application that you built, you put Application Insights in there, and then you package it as a container image. There are some requirements to be able to run inside of a container, and there are good practices, because a container image is fixed.
Hard-coded things that you put inside of the container image will not change when you move it from dev to test to acceptance to production, or maybe from preproduction to production. It doesn't even matter how many different environments you have. The environments change, but the image doesn't. And if the image contains environmental information, it will break. Just imagine if the connection string is in your appsettings.json. Yeah. That file will be in your container image, and that will probably work in development, but if you move to another environment, it has the wrong connection string. Yeah. Yeah. The same goes for Application Insights. All it needs is a unique identifier, the instrumentation key, and that's something that you don't want to hard-code inside of your image. You make that environmental, like reading it from an environment variable. And how that is done in .NET is using the configuration system, because you can just say on the configuration builder, add environment variables, and it does so by default.
Then it will read information from the environment, and your container environment or your hosting environment will only have to say, this is the instrumentation key for this environment, and your application code will read it from the environment instead of having it hard-coded in files at compile time. And it will work in different environments. So you can tweak it from the outside with the environment variables in several ways, and your code is fixed inside of the container image. And that's what we call immutable images, because they won't change. So you have a guarantee that whatever you're running in one environment is also running in production. So in your case, you know that it's not the code, because the code couldn't change from the image that you started with up to the point where it's in production. If it's the same container image, it has the exact same code. So it can only be environment changes, the environment variables, or the data changes, like you said, because usually that's external.
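The same principle in a few lines of Python, as a language-neutral sketch rather than the .NET configuration system described above: the image contains only the code, and the key arrives from outside. The variable name `APPINSIGHTS_INSTRUMENTATIONKEY` is assumed here for illustration, and `load_instrumentation_key` is a hypothetical helper.

```python
import os

# Assumed name of the environment variable that carries the key.
ENV_VAR = "APPINSIGHTS_INSTRUMENTATIONKEY"


def load_instrumentation_key(environ=None) -> str:
    """Read the key from the environment and fail fast when it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; refusing to start without telemetry")
    return key


# The same code (and therefore the same image) behaves per environment:
test_env = {ENV_VAR: "test-key"}
prod_env = {ENV_VAR: "prod-key"}
print(load_instrumentation_key(test_env))
print(load_instrumentation_key(prod_env))
```

With containers, the hosting side supplies the value, for example through `docker run -e` or the environment section of an orchestrator's deployment description, so nothing environment-specific is baked into the image.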
Yeah. But it doesn't take additional effort if you package it as a container image. And if you're doing .NET Core or .NET 5, and you have Visual Studio 2019 or Visual Studio Code, then you can use the tooling to add the Dockerfile, which is the description of how the image is built, and the tooling will take care of the rest and put everything inside of an image.
[00:31:33] Unknown:
Okay. Yeah. Isn't that wonderful nowadays? Just click in Visual Studio and everything happens automatically, and you don't have to worry about it. Well. Yeah. Ish.
[00:31:43] Unknown:
I think it's good to have tooling accelerate the things you're doing, but not if it's doing magic and you are losing control of what's really happening there. And it might be confusing at times. For example, like we're discussing now, in Visual Studio, if you're running Docker Compose, which is a composition of several Docker images, it might be a composition of several .NET projects in your Visual Studio solution. Then you think that if you look at all your code, everything is there, but somewhere, in a special place,
it generates additional files that will be used for debugging purposes or for running in release mode. And you need to know about those, because otherwise you would think, how can it be that inside of my image, when I'm debugging, there are no files with my DLLs, the .NET assemblies? They're missing. How can that be? So that's the automatically. Yeah, it works. But how does it work? Because if I try to run the image without Visual Studio, it says, I can't run this. And then it turns out that what Microsoft does for debugging container images is they build a hollow container. And instead of putting the files in there, they put in, let's say, a volume mapping is what it's called. They look from inside the container onto your file system, to your bin/Debug folder.
And that's where everything is. So it's a very fast way to keep the container image the same. If you rebuild, your bin/Debug files will change, and the container will look on your file system at all the new files. And once you understand it, then you're like, oh, that's why I need to build a release image, because those assemblies need to be inside of the container if I want to ship it to production. Yeah. So automatically is good as long as you understand it. Otherwise, there might be surprises. Exactly. Oh, okay. I didn't know that one. That's good to know. If you want to know, at your solution level in the file explorer, so you go to your solution root.
And then inside of the obj/Docker folder, that's where you find all those files. And there you can see that there are .g.yml files, generated files that do all the things we just discussed: mappings of volumes, additional settings for file watching, well,
[00:34:17] Unknown:
yeah. The magic happens there. Okay. Pro tip here, everybody that's listening. Write it down. That reminds me almost of the ASPX days, back in the day. Remember that ASP.NET Web Forms also did lots of magic behind the scenes. It was terrible stuff. Yeah. It worked, but, you know, how did it work? Okay. So I think we're nearing the end of the episode. Where can people find more about you, so that they can follow your work and maybe see where you are speaking next? Because lots of these things are virtual now as well. They might be able to catch one of your talks.
[00:35:02] Unknown:
Yeah. Well, there used to be a time where I did a lot of blogging, and that's ages ago. Nowadays, I usually send my information using Twitter, and my Twitter handle is at Alex. Just, you know, first name, last name, in a row. And that's where I will mention things that I think are good, important things to take note of. And also, if I'm speaking somewhere, I would put it up there.
[00:35:36] Unknown:
Okay. Excellent. I will link to your Twitter profile in the show notes so that people can find you. Well, thank you very much for taking the time to teach me about monitoring and instrumentation.
[00:35:50] Unknown:
Yeah. You're welcome. And thanks again for having me on the podcast. I really enjoyed it, Barry. Good stuff. It's absolutely my pleasure.
[00:35:59] Unknown:
And thank you all for listening, and we'll talk to you next week. Could you please go to ratethispodcast.com/developerweekly and rate this podcast and leave a review? This helps me to spread the word about the podcast and helps other people to find it. That is ratethispodcast.com/developerweekly. Thank you so much.