Just How Much Data Do We Crunch?

When you’re a hard-working monkey on the Business Intelligence team here at SurveyMonkey, you concentrate on three things: data, data, and more data! Thanks to modern technology, the cost of storing data has fallen while the speed of accessing it has risen. Now that computing problems can be distributed cost-effectively across many machines, it’s almost irresponsible for companies and organizations not to store as much data as possible. In this day and age, too much information isn’t necessarily a bad thing.

But exactly how much data counts as a lot of data? And what kinds of data does our team here at SurveyMonkey analyze, and in what quantities? Let’s start with what we call log files.

Log files are sort of like toned-down Word documents: they’re basically raw text files. For example, while our application is running, it records when errors occur. We can then use those records to fix issues and squash any bugs that pop up on the site.
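To make "a record of when errors occur" concrete, here is a minimal sketch using Python’s standard logging module. It assumes a hypothetical application function (divide) and a hypothetical log file in a temporary directory; it is not how SurveyMonkey’s own logging is set up, just an illustration that each error becomes one timestamped line of plain text.

```python
import logging
import os
import tempfile

# Hypothetical log file location, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Send log records to a plain text file, one line per record:
# timestamp, severity level, message.
logger = logging.getLogger("example_app")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)


def divide(a, b):
    """Hypothetical application code that logs an error instead of crashing."""
    try:
        return a / b
    except ZeroDivisionError:
        # Record when the error happened and what caused it.
        logger.error("divide failed: a=%s b=%s", a, b)
        return None


divide(1, 0)
handler.close()  # flush and close the file

# The log is just raw text we can read back and search later.
with open(log_path, encoding="utf-8") as f:
    log_text = f.read()
print(log_text)
```

Every line like this is a few dozen bytes; multiply that by all the events a busy site records and the daily total adds up quickly.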

So just how much text do you think we log? Drumroll, please…!

Over 70 gigabytes each and every day.

70 gigabytes: what would that look like in the real world?

Let’s take the raw text version of Mark Twain’s Adventures of Huckleberry Finn. The book is 596 kilobytes, and 1 kB is one-millionth of a GB. That means every day SurveyMonkey writes the equivalent of more than 117,000 copies of Huck Finn to its log files. That’s a whole bunch of text.
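Here is the Huck Finn comparison worked out as a quick sketch. It uses only the two sizes given above and decimal units (1 GB = 1,000,000 kB), the same convention as the text.

```python
# Sizes from the post, in decimal units (1 GB = 1,000,000 kB).
DAILY_LOG_GB = 70
HUCK_FINN_KB = 596

daily_log_kb = DAILY_LOG_GB * 1_000_000      # 70,000,000 kB per day
copies_per_day = daily_log_kb / HUCK_FINN_KB  # how many books that equals

print(f"{copies_per_day:,.0f} copies of Huck Finn per day")
```

With binary units (1 GiB = 1,048,576 KiB) the figure shifts by a few percent, but the order of magnitude stays the same.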

If you include the data we store for the application to actually function, you’re looking at a truly gigantic amount of data. A quick calculation shows that over the past year we’ve created about 25 terabytes of data (1 terabyte = 1,000 gigabytes). Assuming the average page has about 500 words and each word is 5 characters plus a space, every person on Earth could write a full-page paper, send it to us, and our working dataset would still outmatch it by 25%.
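The "one page from everyone on Earth" comparison can be checked with a short sketch. The 25 TB, 500 words per page and 6 characters per word come from the text; the one-byte-per-character encoding and the round figure of 7 billion people are assumptions made here, since the post doesn’t state a population. The margin depends on that population figure: with 7 billion people the dataset comes out about 19% larger, and the 25% margin corresponds to roughly 6.7 billion people.

```python
BYTES_PER_TB = 10**12

DATASET_TB = 25                   # from the post: data created over the past year
WORDS_PER_PAGE = 500              # from the post
BYTES_PER_WORD = 5 + 1            # 5 characters plus a space; assumes 1 byte per character
WORLD_POPULATION = 7_000_000_000  # hypothetical round figure, not from the post

bytes_per_page = WORDS_PER_PAGE * BYTES_PER_WORD  # 3,000 bytes per page
everyone_one_page_tb = WORLD_POPULATION * bytes_per_page / BYTES_PER_TB
margin = DATASET_TB / everyone_one_page_tb - 1    # how much bigger our dataset is

print(f"One page from everyone: {everyone_one_page_tb:.1f} TB")
print(f"Our dataset is larger by {margin:.0%}")
```

Changing WORLD_POPULATION is the easiest way to see how sensitive the percentage is to the assumed head count.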

That’s a whole lotta data and we monkeys are big fans of data!

Got data on the brain and want to pick ours? Let us know in the Comments section below!

