Monday, June 21, 2010

automaticly balancing data when growing a cluster

I've been thinking about load balancing lately and how to dynamically grow a cluster and its data set. I'd like to define "balance factor" as follows:

If the number of nodes in the cluster grows by a factor of x, at some future point when the data set has also grown by a factor of x the balance factor is the ratio of the smallest node to the largest node.

In my use case it is impractical for any node to know the size of all the nodes. Load balancing decisions must be made probabilistically on limited data. Today I happened across this blog post, which is very applicable to my scenario. It presents a very good solution for load balancing between a fixed number of bins, but when adding nodes without taking the system offline it is useful to look at more than 2 random points to maximize balance factor.

I did some experiments on different values for x and n (where n is the number of random points examined), and experimentally determined an approximate equation for balance factor b:

b = 1-(1/x)^(n-1)

(by approximate I mean that it is close enough for the range of values I care about, which is x between 1.1 and 10 and n between 2 and 25. It might be exactly right but I don't have time to do a proof before band practice.)

If you actually want to use this though, you need to determine n given x and the desired b. With some manipulation we get:

n = 1 + log(1-b)/log(1/x)

Let me know if you found this useful.


  1. Hello Admin!

    Thanks for the post. It was very interesting and meaningful. I really appreciate it! Keep updating stuffs like this. If you are looking for the Advertising Agency in Chennai | Printing in Chennai , Visit Inoventic Creative Agency Today..

  2. Hey!

    DigiPeek is the best SEO & Link Building Service Provider In The World. I have 7+ Years Experience To Build SEO, Backlinks & Improve Website Ranking.

    If you need Profile Backlinks, Forum Backlinks, Dofollow Backlinks, Manual Backlinks, Trusted SEO Backlinks, Increase Domain Rating Then You Will Contact Me.

    I am glad to help You!

    Let's TRY!

  3. I was unable to oppose remarking. Totally composed!
    tech news

  4. The Crack at this point in the software turns a page that shows user potential way all time. It provides very powerful results and is one of the most popular software of it’s kind in the world. Cash Register Pro Crack

  5. Adobe XD CC Flow has a same clean handler border, which supports employers to flinch wily in the strategy area. Adobe Experience Design CC Crack

  6. Quotes to assist You Cope once Losing a husband those we tend to care regarding ne’er absolutely abandon U.S.A.. Husband Death Quotes