Why does a RAID 10 device need to be initialized?

Michael Graff asked:

When creating a Linux software RAID device as a RAID 10 device, I am confused about why it must be initialized. The same question applies to RAID 1 or RAID 0, really.

Ultimately, most people would put a filesystem of some sort on top of it, and that filesystem should not assume any state of the disk's data. In a RAID 10 or RAID 1 setup, each write goes to all N mirrors, so every block that gets written ends up consistent. There should be no reason whatsoever for a RAID 10 to be initialized up front, as it will happen over time.

I can understand why for a RAID 5/6 setup, where there is a parity requirement, but even then it seems like this could be done lazily.

Is it just so people feel better about it?

My answer:

Remember that RAID 1 is a mirror, and that RAID 10 is a stripe of mirrors.

The question is, on which disk in each mirror is the data valid? In a freshly created array, this cannot be known, as the disks may have different data.

Remember also that RAID operates at a very low level; it knows nothing of filesystems or whatever data might be stored on the disk. There might not even be a filesystem in use.

Thus, initialization in these arrays consists of the data from one disk in each mirror being copied as-is to the other disk.
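As a rough illustration of that copy, here is a toy model in Python. The disks are just lists of blocks, and the names `mismatched_blocks` and `initialize_mirror` are made up for this sketch; it is not how mdraid or any controller is actually implemented.

```python
def mismatched_blocks(source, target):
    """Return the indices of blocks that differ between two mirror members."""
    return [i for i, (s, t) in enumerate(zip(source, target)) if s != t]


def initialize_mirror(source, target):
    """Copy every block from source to target as-is.

    The copy never inspects the contents: like the RAID layer, it has no
    idea whether a block belongs to a filesystem or is leftover garbage.
    """
    for i, block in enumerate(source):
        target[i] = block


# Two hypothetical 4-block disks holding unrelated leftover data.
disk_a = [b"old1", b"old2", b"old3", b"old4"]
disk_b = [b"junk", b"junk", b"old3", b"junk"]

before = mismatched_blocks(disk_a, disk_b)  # blocks where the members disagree
initialize_mirror(disk_a, disk_b)
after = mismatched_blocks(disk_a, disk_b)   # empty once the copy is done

print("mismatched before:", before)
print("mismatched after:", after)
```

Before the copy, a read of block 0 could return different data depending on which member served it; afterwards both members agree on every block.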

This also means that the array is safe to use from the moment of creation, and can be initialized in the background; most RAID controllers (and Linux mdraid) have an option for this, or do it automatically.
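To see why the array can be used while it is still initializing, consider this simplified sketch. It assumes that writes always go to both members and that reads are served from the source member until the copy finishes; the class `BackgroundMirror` and its methods are hypothetical, and real implementations track resync progress and schedule reads in more sophisticated ways.

```python
class BackgroundMirror:
    """Toy two-disk mirror that is initialized one block at a time."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.synced_upto = 0  # blocks below this index have been copied

    def write(self, index, data):
        # Every write goes to both members, so a written block is
        # consistent no matter how far the background copy has progressed.
        self.source[index] = data
        self.target[index] = data

    def read(self, index):
        # Assumption for this sketch: always read from the source member.
        return self.source[index]

    def resync_step(self):
        """Copy one more block; return False once the whole disk is done."""
        if self.synced_upto >= len(self.source):
            return False
        i = self.synced_upto
        self.target[i] = self.source[i]
        self.synced_upto += 1
        return True


mirror = BackgroundMirror([b"a0", b"a1", b"a2"], [b"b0", b"b1", b"b2"])
mirror.write(2, b"new")   # written before the copy has reached block 2
mirror.resync_step()      # background copy handles block 0
reads_during_resync = [mirror.read(i) for i in range(3)]

while mirror.resync_step():  # let the background copy run to completion
    pass
```

Reads stay stable during the copy, and the block written early survives it, because the copy simply re-copies data that is already identical on both members.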

View the full question and answer on Server Fault.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.