Hackviking He killed Chuck Norris, he ruled dancing so he took up a new hobby…


Google Compute Engine: Monitor disk usage with Stackdriver

Setting up monitoring of your cloud servers is not only useful for getting alerts when a server goes down or runs out of disk. It's also very good to have historical performance metrics when troubleshooting issues. In this post I cover the basic setup and the issues you can run into when using Stackdriver to monitor your Google Compute Engine servers. Several Stackdriver features are already plugged into Google Cloud; we will focus on monitoring, that is, keeping track of our virtual infrastructure and how it is used. Out of the box, just by enabling monitoring in the Google Cloud console, it will collect five basic metrics for all your instances:

  • CPU Usage
  • Disk Read I/O
  • Disk Write I/O
  • Network Inbound Traffic
  • Network Outbound Traffic

To get to a basic monitoring level we need at least memory and disk usage as well. Then we can start to look into more metrics regarding applications and their performance.

Monitoring agent

To be able to collect these metrics we need to install the Stackdriver agent on the servers. Google has install instructions for the agent in its documentation, where you can get a basic understanding of the whole setup process. It's actually very straightforward as long as you didn't change any of the "Cloud API access scopes" when you created the VM. If you did, make sure that your VM has at least "Write only" for the "Cloud Monitor API". Then you can just download the agent from the link in the install instructions, install it, and you will see additional metrics come in. The additional metrics are:

  • Memory usage
  • Open TCP connections
  • Page File Usage
  • Volume Usage (disk usage)
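
If you are unsure which access scopes a VM was created with, you can ask the instance's metadata server from inside the VM. The following is a minimal sketch, not part of the official install instructions. It assumes that the "Write only" setting corresponds to the monitoring.write OAuth scope and that the broader monitoring and cloud-platform scopes are also sufficient; the function names are my own.

```python
import urllib.request

# Metadata server endpoint listing the scopes of the VM's default service account.
METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Assumption: any of these scopes lets the agent write metrics.
SUFFICIENT_SCOPES = {
    "https://www.googleapis.com/auth/monitoring.write",
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/cloud-platform",
}


def fetch_scopes(timeout=5.0):
    """Return the access scopes of the VM this runs on (only works inside GCE)."""
    request = urllib.request.Request(
        METADATA_SCOPES_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The endpoint answers with one scope URL per line.
        return response.read().decode("utf-8").split()


def can_write_metrics(scopes):
    """Return True if at least one scope allows writing monitoring data."""
    return any(scope in SUFFICIENT_SCOPES for scope in scopes)
```

On the VM itself, can_write_metrics(fetch_scopes()) should return True before you bother installing the agent. If it returns False, the agent will install but its metrics will most likely never show up.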

Issues with Volume Usage (disk usage)

So far I have installed the agent on 20 Windows machines, and all of them report all additional metrics except for the disk usage. In high-throughput and data-intensive solutions this is one of the most important metrics. After hours of troubleshooting and browsing documentation and forums I realized that I'm not the only one having the problem, but no one had a good solution for it. I then noticed that the download link for Windows in the install instructions was named stackdriverInstaller-GCM-17.exe, while the latest I could find directly from Stackdriver was stackdriverInstaller-39.exe. This leads me to believe that the version linked from the install instructions is outdated.

The GCM-branded one is an automatic install, no input needed. The one downloaded from Stackdriver needs the API key to install. I couldn't find a good download page on the Stackdriver homepage, but after Googling I found a Support Center entry linking to their repo. Since this entry is from April 21st 2015, it seems to be outdated as well. I did however try different version numbers on the download link, and 39 seems to be the latest one. Anyhow, it's much newer than 17 at least, but as stated in the Google install instructions there is no way to check the version of the Stackdriver agent currently installed on Windows.
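
I tried the version numbers by hand, but the same probing can be scripted. The sketch below is only an illustration of that approach: the URL template is something you have to supply yourself (the one in the test call is a made-up placeholder, not the real repo address), and the helper names are hypothetical. It sends a HEAD request per candidate version and keeps the highest one that answers with HTTP 200.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10.0):
    """Return True if a HEAD request to the URL answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # 404s, DNS failures and timeouts all count as "not there".
        return False


def latest_version(url_template, candidates, exists=url_exists):
    """Return the highest candidate version whose download URL exists, or None.

    url_template must contain a "{version}" placeholder, for example
    "<repo base URL>/stackdriverInstaller-{version}.exe".
    """
    found = None
    for version in candidates:
        if exists(url_template.format(version=version)):
            if found is None or version > found:
                found = version
    return found
```

Calling latest_version(template, range(17, 60)) with the real download link as the template would report the newest installer in that range. Gaps in the numbering are fine, since every candidate is checked.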

Enough about that! This installer requires you to input your Stackdriver API key. If you open the Stackdriver web UI via the link in the Google Cloud Console and go to "Account settings" and then "Agent", you will find it there. Account settings are found under the project dropdown next to the Stackdriver logo in the top left corner of the UI. Just copy the "Key" and paste it into the install wizard.

Dashboards and filtering

Now you can create dashboards with different metric charts to get a good overview of your system. The charts can be filtered on resource name via regex. So far I have not been able to filter out specific drive letters. In the view, as well as in the underlying JSON, the data is in the format {instance name}(C:), for example. So a regex like ^.*\(C:\) should match all C: drives, but it doesn't work. It's not a big issue, but there are a few improvements that I hope will come shortly. We have to remember that at this point the Stackdriver functionality comes to Google Cloud as a beta and does not have any SLA at all.
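
To rule out a mistake in the pattern itself, it can be checked against a standard regex engine. This is a small sketch using Python's re module. The instance names are invented for the example, and Python's regex flavor may differ from whatever the Stackdriver filter uses.

```python
import re

# The same pattern as in the dashboard filter: anything followed by a literal "(C:)".
DRIVE_C = re.compile(r"^.*\(C:\)")


def c_drive_resources(names):
    """Return only the resource names that refer to a C: volume."""
    return [name for name in names if DRIVE_C.search(name)]


# Hypothetical resource names in the {instance name}(C:) format.
names = ["web-01(C:)", "web-01(D:)", "db-01(C:)"]
print(c_drive_resources(names))  # prints ['web-01(C:)', 'db-01(C:)']
```

The pattern behaves as expected here, which suggests the problem is in how the chart filter applies the regex rather than in the expression.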