Michael Graff asked:

When creating a Linux software RAID device as a raid10 device, I am confused about why it must be initialized. The same question applies to raid1 or raid0, really.

Ultimately, most people would put a file system of some sort on top of it, and that filesystem should not assume any state of the disk’s data. In a raid10 or raid1 setup, each write goes to all N mirrors. There should be no reason whatsoever for a raid10 to be initialized up front, as that will happen over time.

I can understand why for a raid5/6 setup where there is a parity requirement, but even then it seems like this could be done lazily.

Is it just so people feel better about it?

My answer:

Remember that RAID 1 is a mirror, and that RAID 10 is a stripe of mirrors.

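The question is: which disk in each mirror holds the valid data? In a freshly created array, this cannot be known, as the disks may hold different data. Until the mirrors agree, a read of a block that has never been written could return different results depending on which disk serves it.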

Remember also that RAID operates at a very low level; it knows nothing of filesystems or whatever data might be stored on the disk. There might not even be a filesystem in use.

Thus, initialization in these arrays consists of the data from one disk in each mirror being copied as-is to the other disk.
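To make the copy described above concrete, here is a minimal toy sketch in Python. It is not how mdraid is implemented; the "disks" are plain lists of block values, and the names `mismatched_blocks` and `initial_sync` are made up for illustration. It only shows that two fresh disks can disagree, and that a blind block-by-block copy makes them agree without knowing what the data means.

```python
def mismatched_blocks(disk_a, disk_b):
    """Return the block numbers where the two mirror halves disagree."""
    return [i for i, (a, b) in enumerate(zip(disk_a, disk_b)) if a != b]


def initial_sync(source, target):
    """Copy every block from source to target as-is, without interpreting it."""
    for block_no, value in enumerate(source):
        target[block_no] = value


# Hypothetical leftover contents of two disks that were just paired as a mirror.
disk_a = [0x11, 0x22, 0x33, 0x44]
disk_b = [0x11, 0xEE, 0x33, 0x00]

before = mismatched_blocks(disk_a, disk_b)   # blocks 1 and 3 differ
initial_sync(disk_a, disk_b)
after = mismatched_blocks(disk_a, disk_b)    # nothing differs any more

print(len(before), "mismatched blocks before sync,", len(after), "after")
```

Note that the sync never asks whether block 1 holds filesystem data, swap, or garbage. It only guarantees that both halves of the mirror return the same thing.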

This also means that the array is safe to use from the moment of creation, and can be initialized in the background; most RAID controllers (and Linux mdraid) have an option for this, or do it automatically.
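The following toy sketch illustrates why using the array during a background sync is safe, under the simplifying assumption that every write goes to both disks and the sync copies one block at a time from the first disk to the second. The `ToyMirror` class and its methods are hypothetical; a real implementation also has to coordinate writes with the sync in progress, which is ignored here.

```python
class ToyMirror:
    """A two-disk mirror whose initial sync runs one block at a time."""

    def __init__(self, disk_a, disk_b):
        self.disks = [disk_a, disk_b]
        self.sync_pos = 0  # blocks below this index have already been copied

    def write(self, block_no, value):
        # Every write lands on both halves of the mirror, synced or not.
        for disk in self.disks:
            disk[block_no] = value

    def sync_step(self):
        """Copy one more block from disk 0 to disk 1; return False when done."""
        if self.sync_pos >= len(self.disks[0]):
            return False
        self.disks[1][self.sync_pos] = self.disks[0][self.sync_pos]
        self.sync_pos += 1
        return True


mirror = ToyMirror([1, 2, 3, 4], [9, 9, 9, 9])
mirror.sync_step()        # block 0 is now in sync
mirror.write(3, 42)       # a write arrives ahead of the sync position

# The written block already matches on both disks, before the sync reaches it.
written_block_matches = mirror.disks[0][3] == mirror.disks[1][3] == 42

while mirror.sync_step():  # let the background sync run to completion
    pass

print(mirror.disks[0] == mirror.disks[1])
```

Anything written through the array is mirrored immediately, and the sync later copies that same value over itself. Only blocks that were never written depend on the sync to become consistent.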

