RationalSpace

Archive for the ‘OpenSource Tech’ Category

Scaling web applications

leave a comment »

As a web application starts becoming a hit with increase in traffic, increase in user base etc, often the biggest challenge that it faces is scaling.  How to ensure that all the features of your website work as good as ever even when there many more requests per second coming to your server.  Different scaling techniques are applied at different stages of a website growth – from something like 100K sessions per month to billions of sessions per month.

Before we delve deeper into how to scale, one thing that we need to understand is that performance != scalability. Performance and scalability are two very different things. Performance is more about how fast a request can be executed or how optimum is the use of resources. Whereas, scalability is the ability of the architecture or the system to handle a large number of requests in an efficient way.

So a website can be analysed by 2 kinds of variables – the ones that we want to be high like performance, scalability, responsiveness, availability and the ones that we want to be low – downtime, cost, maintenance, SPOF ( single point of failure). We have to keep in mind these variables while we are designing architecture of scalable web applications.

There are several methods or architectural designs by which we can scale web applications.

The first one and a very common one is Vertical Scaling. This type of scaling is also called “Scaling up”.Vertical scaling basically means that you add more hardware without adding more number of nodes. So if your current server has 4GB RAM and dual core, you extend it to 8GB RAM and quad-core. The advantage of vertical scaling is that it is easy to do. You don’t really need software skills to do this. On the other hand, the disadvantage is that the cost increases exponentially. Also , in case of vertical scaling we cannot handle the issue of SPOF. If the server goes down, the application dies too – which can lead to situations where users have to face significant downtime.

The second way to scale is Vertical Partitioning. This basically involves partitioning your application is such a way that different components or software layers are put on different servers and each of that server is optimised to handle that particular component or layer. For example, web servers like Apache or Tomcat typically require more CPU as they need to handle more TCP/IP connections. Also, database servers like MySQL require more RAM as it loads a lot of tables and queries in memory. So it makes sense to put them on different nodes.The advantage of Vertical Partitioning is that we can optimise the server according to the application requirement. Also we need not change anything in the application as such. However, the disadvantage is that in some situations it might lead to sub-optimal use of resources. For example, on the node that has the database hosted, CPU may remain idle most of the time. Also, in this kind of architecture all nodes are heterogenous. So the maintenance is a bit complicated. Nevertheless, in most situations vertical partitioning is a good way to scale websites and it works pretty well.

The third and another way to scale is Horizontal Scaling. In this approach, one would simply add more nodes to the system with the same copy of the application server. This type of scaling is also called “scaling out“. So you basically add a load balancer in front of your nodes and route the traffic. Load balancers could be hardware or software based. A very popular open source load balancer software is HAProxy. So as your traffic increases, you increase the number of nodes behind the load balancer. Since all the nodes are homogeneous in this case, it is simpler to scale.  One problem that needs to be addressed when we are designing horizontal scaling systems is Sessions. Now once we have a load balancer in front of our nodes, it can so happen that one request of a person goes to one node and the subsequent one goes to the other. If this happens, the user will suddenly feel lost as the application will show him a login page again or in case of an e-commerce application, all the cart data may suddenly vanish.There are several ways to handle handle this – the most common being “Sticky Sessions” – Sticky sessions imply that the first request and all further requests of the same user will go to the same server. This works in most cases though it has a slight disadvantage of asymmetric load balancing – which means it can happen that all requests of a particular user or user group going to a node may load that node heavily. There are ways to handle that by implementing Central session storage or cluster session management.

So as you might have observed, with the progress of each technique discussed here, the level of complexity is also increasing.

Another way to scale is “Horizontal Partitioning” of database. Since database is often observed as the bottleneck of web app, it makes sense to divide the db into multiple servers. Basically in this technique we divide the tables horizontally – The rows are divided across nodes based on algorithms like FCF (first come first), Round Robin or hashing etc. The flip side of implementing horizontal partitioning is that it needs code change and is complex to maintain – you need to aggregate data across clusters. Also, if we have a change in global setting, it needs to be replicated across nodes.

So here was my attempt to discuss the various techniques of scaling web application. Hope it helps:)

Advertisements

Written by rationalspace

February 27, 2015 at 2:32 pm

Using JSONP to make requests across domains

leave a comment »

Recently, I came across a requirement wherein we wanted to partner with other websites by giving them widgets of our stock charts. A widget is basically a tool or a piece of information that is useful and mostly just plug and play. Plug and play implies that one could take a small piece of code and embed in his blog/website and the information/widget will start showing up. In our case we wanted to make our charts available to anyone who would like to take it and embed in his/her blog.

This is pretty simple to do if it is some static piece of information like an image or a media file that needs to be shown on the other website. All you need to do is make an http request to your domain from the blogger’s domain. But we wanted our charts to be dynamic. Data should get updated each time a request is made. Now, JSON has been pretty much there for making any type of calls between client and server.

JSON

JSON (Javascript Object Notation) is a convenient way to transport data between applications, especially when the destination is a Javascript application.

JQuery has functions that make Ajax/HTTPD calls from a script to a server very easy and $.getJSON() is a great shorthand function for fetching a server response in JSON. But this simple approach fails if the page making the ajax call is in a different domain from the server. The Same Origin Policy prohibits these cross-domain calls in some browsers as a security measure.

But what about data transfer across domains?

A standard workaround is to make use of Cross Origin Resource Sharing (CORS) which is now implemented by most modern browsers. Yet many developers find this a heavyweight and somewhat pedantic approach. Also, you cannot possibly ask every blogger to first inform you about his domain, then you make an entry in the CORS file and only then would the widget start working for him. Pretty cumbersome, isn’t it?

So what is the way out ? 

JSONP

JSONP (first documented by Bob Ippolito in 2005) is a simple and effective alternative that makes use of the ability of script tags to fetch content from any server.

This is how it works: A script tag has a src attribute which can be set to any resource path, such as a URL, and need not return a JavaScript file. Using this I can fetch data from another server and make the Javascript draw a widget using it.

Here is an example:

Client Side:
$.ajax({
type : 'GET',
url : siteurl+'json/getSomeData.php',
data : {tick:(element.data('id')).toUpperCase(),callback:'callback_fun'},
dataType : 'jsonp',
cache : true,
crossDomain : true,
success : function(data){
}
});


function callback_fun(){
//do something; update your widget etc
}

Server:  PHP script example:

header("Content-Type: application/javascript");
$some_data = db_call(); //some db call
echo (isset($_GET['callback']) ? $_GET['callback'] : '').'('.json_encode($some_data).')';

Note the brackets enclosing in php script echo. Its pretty important.  Also, the callback is important. Won’t work without that.

See a demo here.

Written by rationalspace

February 9, 2015 at 4:20 pm

Performance techniques for responsive web design

leave a comment »

There is no doubt in the fact that mobile usage is sky-rocketing in the last few years. Also mobile is the only way by which a lot of people are accessing internet – especially in Asia and Africa.  With such compelling evidence, it is essential that a website should focus on becoming mobile friendly. In fact, it is not just enough to be mobile friendly but also equally important that a website loads fast in a mobile device.

So why does performance need special attention in mobile?

Before a mobile device can transmit or receive data, it has to establish a radio channel with the network. This can take several seconds. The best part is, if there is no data transmitted or received, after a timeout the channel will go idle, which requires a new channel to be established. This can obviously cause huge issues for your web page load times.

On a typical United States desktop using WiFi, a request’s average round trip takes 50 milliseconds. On a mobile network, it’s over 300 milliseconds. This is as slow as old dial-up connections. Additionally, even WiFi is slower on handsets, thanks to antenna length and output power. This means you really need to prioritize performance as you optimize your site’s design for mobile devices.

Techniques to improve performance of a responsive website

Over the past few months, conversations about Responsive Web design have shifted from issues of layout to performance. That is, how can responsive sites load quickly -even on constrained mobile networks. So what can be done? Here comes the talk a new set of techniques called RESS –  Responsive Web Design + Server Side Components.

So here is a list of things that can help :

Send smaller images to devices

The average weight of a webpage today is 1.5MB and 77% of that is just images! So if we optimise images, it will help significantly to improve performance. Now how can we send smaller images to the mobile devices? This could be done by an older approach wherein you need to maintain different image sizes on the server and then depending on the screen size send the appropriate one.

Detect client window size and set a cookie

<script type='text/javascript'>
function saveCookie(cookiename,cookieval){
//write cookie
}
saveCookie("RESS",window.innerWidth);
</script>

Server Side Code to read size and deliver images

<?php
$screenWidth=$_COOKIE["RESS"];
if($screenWidth=="320"){
$imgSize = "300";
}else if($screenWidth=="500"){
$imgSize = "480";
} ///and so on
echo "<img src="&lt;path of file&gt;_$imgSize.png" alt="" />

?>

So what’s the new way ? With tools like “Adaptive Images“, this is made much easier!  Adaptive Images detects your visitor’s screen size and automatically creates, caches, and delivers device appropriate re-scaled versions of your web page’s embeded HTML images. No mark-up changes needed.

Conditional Loading

Another technique that will help improve performance is conditional loading – You detect on server side the kind of device the user is on—screen size, touch capabilities, etc.—and load only the content that is necessary for that user to see. From social widgets (like google, fb, twitter etc sharing)  to maps to lightboxes, conditional loading can be used to ensure that small screen users don’t download a whole bunch of stuff they can’t use.

I found a good script on github that helps in server side detection – https://github.com/serbanghita/Mobile-Detect

Feature detection

Don’t load the features that won’t make sense on mobile. A simple example could be inserting a video link. Detect the browser and insert video link only where it works. Else show simple text.

A great tool for finding your user’s browser capabilities is Modernizr. However, you can only access its API on the browser itself, which means you can’t easily benefit from knowing about browser capabilities in your server logic. Now this can help to tweak thinks on client side and change appearance etc, but sometimes its better to send the correct markup from the server side itself. The modernizr-server library is a way to bring Modernizr browser data to your server scripting environment. For example you can detect if the browser has things like canvas, canvastext, geolocation etc.
<?php
include('modernizr-server.php');
...

if ($modernizr->svg) {
...
} elseif ($modernizr->canvas) {
...
}
?>

Putting all these techniques together you can dramatically improve the performance of your responsive site. There’s really no excuse for serving the same large sized assets across all browser widths. Make your responsive website respond not only to changing design patterns but to the browser environment it’s being served into. Go mobile first and performance first when designing and coding your next responsive website.

Written by rationalspace

June 20, 2014 at 1:07 pm

Setting up SFTP on amazon ec2

leave a comment »

I wanted to set up SFTP on my ec2 instance for a particular user in such a way that he has access only to his folder and can’t really see anything else. Setting up SFTP is not a challenge if you already have an ec2 instance running. SFTP runs over SSH protocol – so most likely this will just be running on your machine. Now to set up a new user , you will have to do the following:

  1. Add a user and set some desired password:  adduser joe This usually creates the home folder for that user also.
  2. We need to make that home folder of user owned by root sudo chown root:root /home/joe
  3. Give proper permissions to this folder sudo chmod 755 /home/joe
  4. Create one writable folder inside this where the user can put his files. mkdir /home/joe/files chown joe /home/joe/files
  5. Allow SFTP to allow users with passwords also to login. By default, it expects a ppk/pem and does not allow tunneled clear text passwords. So open /etc/ssh/sshd_config and change PasswordAuthentication yes
  6. Also Change the subsystem location in this file #Subsystem sftp /usr/lib/openssh/sftp-server Subsystem sftp internal-sftp
  7. Create a user section in this file to limit his access
    Match User joe
    ChrootDirectory /home/john
    ForceCommand internal-sftp
    AllowTCPForwarding no
    X11Forwarding no
  8. Save file.
  9. Restart ssh service ssh restart

Now you should be able to open a SFTP connection through any of the clients like FileZilla, CyberDuck etc. Just type in your IP/hostname, username and password and it should open up “/” directory where you will find the folder “files”. You can put your files there. Also, you will not be able to access anything apart from your folder 🙂
Things to be careful with :

  1. Make sure that you have taken a backup of the file /etc/ssh/sshd_config
  2. Make sure that you have a simultaneous shells open. So that if you do a mistake and restart your ssh, you should still be able to fix it and not get locked out.

Written by rationalspace

May 20, 2014 at 4:39 pm

Posted in OpenSource Tech, Security

Tagged with , ,

Hiding Apache and PHP version information in response headers

leave a comment »

This is an important security check that all web-masters should make – hide the web server information from the response headers.

One can easily check the response headers in firebug. It comes like this :

Server: Apache 2.4

To hide apache version  from the response headers , you can do the following in your apache config:

ServerTokens ProductOnly
ServerSignature Off

And restart apache.

You can also do a similar thing to hide php information. Find your php.ini and make sure this piece of code is there.
expose_php = Off

Again, restart apache and you are done.

Written by rationalspace

April 9, 2014 at 4:22 pm

Improving website performance – high speed delivered!

leave a comment »

Performance is a big thing. The faster the website, the better it is. In the world of new-age internet, speed is not only an important factor to retain users, but also important from SEO perspective. More and more weightage is being given to speed by search engines to rank websites. There are a number of tools like pingdom and google page speed insights available that help you to analyse your site with respect to performance

Since I have been working on improving performance of our website for quite some time, I thought of jotting down all the pointers required to optimize websites in one place.

  1. Use Sprite
  2. Zip your content – Use Content Encoding Header
  3. Add expire headers
  4. Remove blocking javascripts – Place assets optimally – CSS on top, JS on bottom
  5. Reduce Cookie Size
  6. Serve static content from another domain/CDN
  7. Optimise CSS
  8. Minify JS and CSS
  9. Cache resources – Use headers ” cache-control” and “Expiry”
  10. Render Google Ads asynchronously
  11. Compress images
  12. Don’t call ads in mobile responsively
  13. Minimise use of plugins in a CMS
  14. Optimise Queries
  15. Use APC Caching
  16. Add Character Set Header
  17. Add dimensions to images
  18. Load scripts asynchronously whenever possible
  19. Use Google PageSpeed Module

Written by rationalspace

February 26, 2014 at 3:49 pm

Maintaining order in MySQL “IN” query

leave a comment »

This is quite a common issue that you may have come across. In general, the order of any SQL query is arbitrary unless you specify an order with an ORDER BY clause. Sometimes, what we need is the ability to return results in the same order in which you passed arguments in the IN clause.

SELECT * FROM foo f where f.id IN (2, 3, 1);
I then get the following result

+—-+——–+
| id | name |
+—-+——–+
| 1 | first |
| 2 | second |
| 3 | third |
+—-+——–+
3 rows in set (0.00 sec)
As one can see, the result is ordered by id. What we want is to get the results ordered in the sequence we are providing in the query. Given this example it should return

+—-+——–+
| id | name |
+—-+——–+
| 2 | second |
| 3 | third |
| 1 | first |
+—-+——–+
3 rows in set (0.00 sec)

We can use either FIELD or FIELD_IN_SET for this:

SELECT * FROM foo f where f.id IN (2, 3, 1)
ORDER BY FIELD(f.id, 2, 3, 1);

MySQL FIELD() returns the index position of the searching string from a list of strings. If search string is not found, it returns a 0(zero). If search string is NULL, the return value is 0 because NULL fails equality comparison with any value.

Written by rationalspace

February 26, 2014 at 3:30 pm

%d bloggers like this: