Ashish Sheth's Blog: 2021

Monitoring and Restarting Apache webserver Automatically using AWS CloudWatch

When your webserver goes down, you need to find out why it went down and fix the issue.

But even before you troubleshoot and fix the issue, you need to make sure your webserver is available to your users so they can continue using your website.

There are many sophisticated solutions, such as failing over to a redundant web server or creating a new instance using AWS Auto Scaling. However, these solutions come with their own cost and complexity, and if you are a small shop like us you may not be able to afford them. If you don't have a failover server or Auto Scaling configured for your server, sometimes the simplest solution is to restart the webserver.

If your infrastructure is on AWS, you can use a couple of AWS services to restart the webserver automatically, without any manual intervention.

You need to use AWS CloudWatch Logs, CloudWatch Metrics, CloudWatch Alarms and AWS Systems Manager Run Command. Note that if you are using metrics published directly by AWS services (such as EC2 CPU utilization), then CloudWatch Logs is not necessary. You can skip steps 1 and 2 and go directly to step 3 below.

Here are the steps:

1. Make sure that your web application logs are published to CloudWatch Logs. For this you need to install the CloudWatch Logs agent on the EC2 instance and configure it as described here.

2. Configure a metric based on selected entries of the logs published in step 1, as described here.

3. Configure a CloudWatch Alarm based on the metric filter created above, as described here. When asked to select a metric, select the metric created in step 2.

4. During the creation of the alarm, select AWS Run Command as the action. Provide the following shell command as the command to execute.

service apache2 restart

Or for that matter, any command you want to execute.
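If you prefer to script steps 2 to 4 rather than click through the console, here is a rough sketch using boto3. Everything named here is an assumption made for illustration: the filter name, metric name, namespace, threshold, log group and filter pattern are all hypothetical, and AWS-RunShellScript is the stock Systems Manager document for running shell commands. The sketch only builds and sends the individual requests. It does not show how the alarm invokes the command, which is whatever you configured in step 4.

```python
# Hypothetical names used only for this sketch; pick your own.
METRIC_NAME = "WebServerErrors"
METRIC_NAMESPACE = "Custom/WebServer"


def metric_filter_request(log_group, pattern):
    """Parameters for CloudWatch Logs put_metric_filter (step 2)."""
    return {
        "logGroupName": log_group,
        "filterName": "webserver-errors",
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": METRIC_NAME,
                "metricNamespace": METRIC_NAMESPACE,
                "metricValue": "1",  # count one per matching log entry
            }
        ],
    }


def alarm_request(threshold=1.0):
    """Parameters for CloudWatch put_metric_alarm (step 3)."""
    return {
        "AlarmName": "webserver-down",
        "Namespace": METRIC_NAMESPACE,
        "MetricName": METRIC_NAME,
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


def restart_request(instance_id, command="service apache2 restart"):
    """Parameters for Systems Manager send_command (the command from step 4)."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": [command]},
    }


def create_monitoring(log_group, pattern):
    # boto3 is imported lazily so the request builders work without it.
    import boto3

    boto3.client("logs").put_metric_filter(**metric_filter_request(log_group, pattern))
    boto3.client("cloudwatch").put_metric_alarm(**alarm_request())


def restart_webserver(instance_id):
    import boto3

    boto3.client("ssm").send_command(**restart_request(instance_id))
```

Calling create_monitoring and restart_webserver needs AWS credentials and an instance managed by Systems Manager. The three request builders can be inspected without either.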

Summary of Paper: RSync Algorithm

Rsync Algorithm: TR-CS-96-05.dvi (cmu.edu)

Rsync is a utility used to transfer files or folders from one computer to another on Unix-based systems. Its key advantage is that once the full data has been transferred to the destination, only the changed bytes are transferred from then on, saving time and network bandwidth.

The paper mentioned above describes the algorithm used to find the changed bytes and how the file data is transferred from the source and recreated at the destination.

Here is the summary.

Instead of transferring the complete file every time, the file is divided into chunks of bytes and hashes are calculated for each chunk. The first time, all the bytes are transferred to the destination. After that, the destination calculates a weak rolling checksum (inspired by the Adler-32 checksum) and a strong checksum for each chunk of its copy and shares them with the source.
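To make the "rolling" part concrete, here is a small sketch of the weak checksum as I understand it from the paper: two 16-bit sums that can be updated in constant time when the window slides forward by one byte. The function names and the block size are my own, and hashlib.md5 is only a stand-in for the strong checksum (the paper uses MD4).

```python
import hashlib

M = 1 << 16  # modulus used for both halves of the weak checksum


def weak_parts(block):
    """Compute the two sums (a, b) of the weak checksum from scratch."""
    a = sum(block) % M
    # Byte i of an n-byte block is weighted by (n - i).
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, outgoing, incoming, block_len):
    """Slide the window one byte forward without rescanning the block."""
    a = (a - outgoing + incoming) % M
    b = (b - block_len * outgoing + a) % M
    return a, b


def combine(a, b):
    """Pack both sums into a single 32-bit value."""
    return a + (b << 16)


def block_signatures(data, block_size):
    """(weak, strong) checksum pair for every chunk of the destination's copy."""
    signatures = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        weak = combine(*weak_parts(block))
        strong = hashlib.md5(block).hexdigest()  # stand-in for MD4
        signatures.append((weak, strong))
    return signatures
```

The destination would send the output of block_signatures to the source. The roll function is what lets the source check a checksum at every byte offset cheaply.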

At the source, the combination of the rolling checksum and the strong checksum is used to find the chunks of bytes in the source file that are the same as some block of bytes at the destination, together with the indexes of those blocks. In this way the bytes that differ in the source can also be found. The source then sends the destination the bytes that differ, along with the index of the previous block that matched at both source and destination and the index of the current chunk before which the new data has to be inserted. At the destination, the file is recreated using the received bytes.
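Here is a simplified sketch of that matching and rebuilding step, not rsync's actual implementation. The function names, the delta format (a list of "block" and "data" items) and the tiny block size in the usage example are all made up for illustration. For brevity the weak checksum is recomputed at every offset instead of being rolled, and hashlib.md5 again stands in for the strong checksum.

```python
import hashlib

M = 1 << 16


def weak(block):
    """Weak checksum of a block, packed into one integer."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a + (b << 16)


def strong(block):
    return hashlib.md5(block).digest()  # stand-in for the strong checksum


def signatures(old, block_size):
    """Destination side: map weak checksum -> [(strong checksum, block index)]."""
    table = {}
    for start in range(0, len(old), block_size):
        block = old[start:start + block_size]
        entry = (strong(block), start // block_size)
        table.setdefault(weak(block), []).append(entry)
    return table


def make_delta(new, table, block_size):
    """Source side: emit matching block indexes and the literal bytes between them."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        match = None
        # The cheap weak checksum filters candidates, the strong one confirms.
        for strong_sum, index in table.get(weak(window), []):
            if strong_sum == strong(window):
                match = index
                break
        if match is None:
            literal.append(new[pos])
            pos += 1
            continue
        if literal:
            delta.append(("data", bytes(literal)))
            literal.clear()
        delta.append(("block", match))
        pos += len(window)
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block_size):
    """Destination side: rebuild the new file from old blocks and received bytes."""
    out = bytearray()
    for kind, value in delta:
        if kind == "block":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

For example, with old = b"the quick brown fox" on the destination and new = b"the quick red fox" on the source, apply_delta(old, make_delta(new, signatures(old, 4), 4), 4) gives back new, and only the bytes not covered by a matching block travel as "data" items.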

Only the chunks which are changed are transferred in this scheme.
