Update: This analysis was featured on Backblaze’s blog, and I got to meet the executive staff at Backblaze to discuss my findings. The team and culture there are amazing!

It is now common practice for end customers to share telemetry (“call home”) data with their vendors. My analysis below shares some insights about your business that vendors might gain from the seemingly innocent data you send them every day.

Every day, Backblaze (a cloud backup storage provider) logs the drive health data (aka SMART data) for each of its more than 100,000 hard drives. With 100K+ records a day, each year can produce over 30 million records. Backblaze shares this raw data on its website, but most people probably don’t dig into it much. I decided to see what this data could tell me, and what I found was fascinating.

Rather than looking at nearly 100 million records, I decided to look at just over one million: the snapshots from the last day of every quarter from Q1’16 to Q1’19. This gave me enough granularity to see what is happening inside Backblaze’s cloud backup storage business. For those interested, I used MySQL to import and transform the data into something easy to work with (click here to see more details on my SQL query). I then imported the data into Excel, where I could easily pivot the data and look for insights.

I grabbed the publicly posted “Petabytes stored” figure, the amount of data Backblaze claims to store (“User Petabytes”), and compared it against the total capacity from the SMART data they log (“Physical Petabytes”) to see how much overhead or unused capacity they have.

[Figure: User Data vs Physical Capacity]

The Theoretical Max (green line) is based on the ECC protection schemes (13+2 and/or 17+3) that Backblaze uses to protect user data. If the “% User Petabytes” is below that max, then Backblaze either has unused capacity or hasn’t updated its website with the actual amount of data stored.

Now let’s look at some performance insights. A quick note: only Seagate hard drives track the information needed in their SMART data to get insights about performance. Fortunately, roughly 80% of Backblaze’s drive population (by both capacity and units) is Seagate, so it’s a large enough population to represent the overall drive population. Going forward, it does look like the new 12TB WD HGST drive is starting to track bytes read/written.

[Figure: Data read/written versus capacity growth]

Looking at the last two years by quarter, you can see a healthy amount of year-over-year growth in their write workload: roughly 80% over the last four quarters! This is good, since writes likely correlate with new user data, which means broader adoption of their offering. For some reason, their read workload spiked in Q2’17 and has stayed higher since then (as indicated by the YoY spikes from Q2’17 to Q1’18, before settling back to less than 50% YoY). My guess is that this was driven by a change to their internal workload rather than a migration, because I didn’t see subsequent negative YoY reads.

[Figure: Pod (Storage Enclosure) Performance]

Looking at the power-on hours of each drive, I was able to calculate the vintage of each drive and the number of drives in each “pod” (Backblaze’s term for its storage enclosures). Their original pods stored 45 drives, and this increased to 60 drives in ~Q2’16 (according to past Backblaze blog posts). This let me calculate the number of pods that Backblaze has in its datacenters.
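For readers who want to reproduce the quarterly sampling step, here is a minimal Python sketch that lists the last calendar day of each quarter from Q1’16 to Q1’19. It assumes that the raw data comes as one CSV per day named by ISO date (for example `2019-03-31.csv`). That naming is an assumption, not something this post states, and `quarter_end_days` and `quarter_end_files` are hypothetical helpers, not part of any Backblaze tooling.

```python
from datetime import date, timedelta


def quarter_end_days(start_year: int, start_q: int,
                     end_year: int, end_q: int) -> list[date]:
    """Last calendar day of every quarter from (start_year, start_q)
    through (end_year, end_q), inclusive."""
    days = []
    year, q = start_year, start_q
    while (year, q) <= (end_year, end_q):
        # The last day of quarter q is the day before the first day of quarter q + 1.
        first_month_of_next = 3 * q + 1
        if first_month_of_next > 12:
            first_of_next = date(year + 1, 1, 1)
        else:
            first_of_next = date(year, first_month_of_next, 1)
        days.append(first_of_next - timedelta(days=1))
        # Advance to the next quarter, rolling over the year after Q4.
        q += 1
        if q > 4:
            year, q = year + 1, 1
    return days


def quarter_end_files(days: list[date]) -> list[str]:
    """Hypothetical daily-file names (assumes one CSV per day named YYYY-MM-DD.csv)."""
    return [f"{d.isoformat()}.csv" for d in days]


if __name__ == "__main__":
    snapshots = quarter_end_days(2016, 1, 2019, 1)
    print(len(snapshots), "snapshots")        # 13 snapshots
    print(quarter_end_files(snapshots)[:2])   # ['2016-03-31.csv', '2016-06-30.csv']
```

Thirteen daily snapshots of roughly 100,000 drives each work out to about 1.3 million rows, in line with the “just over one million” records used here. A full year of daily files would be roughly 36.5 million rows (365 × 100,000).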
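Assuming the Theoretical Max line is simply the data-shard fraction of each protection scheme, it follows from basic arithmetic. With k data shards and m parity shards, at most k/(k+m) of raw capacity can hold user data. So 13+2 gives 13/15 ≈ 86.7% and 17+3 gives 17/20 = 85%. The sketch below computes that ceiling alongside the “% User Petabytes” ratio.

It makes three further assumptions:

- Physical capacity is the sum of each drive’s reported capacity in bytes (the `capacity_bytes` column in Backblaze’s daily files).
- Non-positive capacity readings are missing data.
- Petabytes are decimal (10^15 bytes).

The function names are hypothetical.

```python
def max_user_fraction(data_shards: int, parity_shards: int) -> float:
    """Upper bound on the share of raw capacity that can hold user data
    when every stripe is data_shards + parity_shards."""
    return data_shards / (data_shards + parity_shards)


def physical_petabytes(capacity_bytes: list[int]) -> float:
    """Total raw capacity in decimal petabytes (1 PB = 10**15 bytes).
    Non-positive readings are treated as missing (an assumption)."""
    return sum(c for c in capacity_bytes if c > 0) / 10**15


def capacity_check(user_pb: float, physical_pb: float,
                   data_shards: int, parity_shards: int) -> tuple[float, float]:
    """Return (% User Petabytes, theoretical max), both as fractions of physical capacity."""
    return user_pb / physical_pb, max_user_fraction(data_shards, parity_shards)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not Backblaze's actual figures.
    drives = [12 * 10**12] * 1000          # 1,000 drives of 12 TB each
    phys = physical_petabytes(drives)      # 12.0 PB raw
    used, ceiling = capacity_check(9.0, phys, 17, 3)
    print(f"user share {used:.1%} vs theoretical max {ceiling:.1%}")
    if used < ceiling:
        print("below the max: unused capacity, or a stale 'Petabytes stored' figure")
```

Any gap between the two ratios is only a signal. As noted above, it can mean unused capacity or simply a website figure that lags the real amount stored.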
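The year-over-year read/write growth can be derived from cumulative SMART counters. Here is a minimal sketch that assumes SMART attributes 241 and 242 (Total LBAs Written and Total LBAs Read on Seagate drives, exposed as `smart_241_raw` and `smart_242_raw` in Backblaze’s files) count 512-byte LBAs. The sketch works in three steps:

1. Take the difference in each drive’s counter between consecutive quarter-end snapshots.
2. Sum those differences across drives.
3. Compare each quarter with the same quarter a year earlier.

The helper names are hypothetical, and this is not necessarily how the SQL behind this analysis computed it.

```python
from typing import Optional


def quarter_bytes(prev: dict[str, int], curr: dict[str, int],
                  lba_size: int = 512) -> int:
    """Bytes transferred fleet-wide between two quarter-end snapshots.

    prev and curr map a drive's serial number to a cumulative LBA counter
    (e.g. SMART 241 raw for writes). lba_size=512 is an assumption.
    Drives missing from either snapshot (new or retired) are skipped.
    """
    common = prev.keys() & curr.keys()
    return sum((curr[s] - prev[s]) * lba_size for s in common)


def yoy_growth(quarterly: list[float]) -> list[Optional[float]]:
    """Growth of each quarter versus the same quarter one year (4 quarters) earlier.
    None where there is no prior-year quarter or it was zero."""
    out: list[Optional[float]] = []
    for i, value in enumerate(quarterly):
        base = quarterly[i - 4] if i >= 4 else 0
        out.append(value / base - 1 if base else None)
    return out


if __name__ == "__main__":
    # Hypothetical per-drive counters for illustration only.
    q1 = {"SER-A": 1_000, "SER-B": 5_000}
    q2 = {"SER-A": 3_000, "SER-C": 200}   # SER-B retired, SER-C newly added
    print(quarter_bytes(q1, q2))          # 1024000: only SER-A appears in both
    print(yoy_growth([10.0, 11.0, 12.0, 13.0, 18.0]))
```

Counting only drives present in both snapshots keeps retired drives from showing up as negative writes. The trade-off is that writes to drives added mid-quarter are undercounted.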
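One plausible way to turn power-on hours into a pod count can be sketched under several assumptions:

- A drive’s vintage is approximated as the snapshot date minus SMART 9 (Power-On Hours, `smart_9_raw` in Backblaze’s files).
- Drives first powered on before a cutover date sit in 45-drive pods, and later drives sit in 60-drive pods.
- The cutover is 2016-04-01 (the start of Q2’16), a hypothetical stand-in for the “~Q2’16” above.

The sketch ignores replacement drives installed into older pods, so treat its result as a rough estimate and not as the method used in this analysis.

```python
from datetime import date, timedelta

# Hypothetical cutover: pods grew from 45 to 60 drives around Q2'16.
CUTOVER = date(2016, 4, 1)
OLD_POD_DRIVES = 45
NEW_POD_DRIVES = 60


def vintage(snapshot: date, power_on_hours: int) -> date:
    """Approximate first power-on date: snapshot date minus whole days powered on."""
    return snapshot - timedelta(days=power_on_hours // 24)


def estimate_pods(snapshot: date, power_on_hours: list[int]) -> float:
    """Rough pod count: older drives assumed to sit in 45-drive pods,
    newer drives in 60-drive pods (ignores replacement drives)."""
    old = sum(1 for h in power_on_hours if vintage(snapshot, h) < CUTOVER)
    new = len(power_on_hours) - old
    return old / OLD_POD_DRIVES + new / NEW_POD_DRIVES


if __name__ == "__main__":
    snap = date(2019, 3, 31)
    # Hypothetical fleet: 90 drives powered on ~3.4 years, 120 drives ~1 year.
    hours = [30_000] * 90 + [9_000] * 120
    print(estimate_pods(snap, hours))     # 4.0
```

The result is left as a float because a fractional pod count is itself a hint that the vintage-based split is off, for example because of replacement drives.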