Starting with this post, I am going to explain how flash storage really works, which in turn should change how you evaluate a flash product.
Program Erase (P/E) Cycles
As explained in my previous blog, writing data to flash is called programming: that is when you trap electrons in the floating gate of a cell. Write operations happen at the page level (typically 8-16KB in size). Read operations also happen at the page level.
An interesting part of how a flash chip operates is what happens when you have to update data that is already written. Unlike disk storage, you cannot simply overwrite the data in place. In a flash chip, if you want to change data already written, you have to erase the old data first and then write the data again, and erase operations happen at the block level.
You cannot erase a single page. To erase or update the data in one page in place, you have to erase the whole block, even if you don’t want to touch the other pages in it, and then rewrite the pages you want to keep. An erase operation takes longer than a read operation because it works on the whole block. This is why the life of a flash chip is measured in Program Erase cycles (also referred to as P/E cycles): each cycle is one program followed by one erase, and every cycle damages the oxide layer a little (please refer to my blog understanding flash at its core). Each flash chip can take only a limited number of program and erase cycles.
An alternative to this erase-before-write would be to mark the old page as invalid and write the new data to a new page. This way you avoid the expensive erase cycle and increase the life of the chip. But once the data lives in another location, reads aimed at the page now marked invalid have to be redirected to the new location. This is where the flash translation layer kicks in.
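To make the cost of an in-place update concrete, here is a minimal sketch in Python. It is only a toy model, not how any real controller is written: the `Block` class, the `update_in_place` helper and the 128 pages per block are all my own assumptions for illustration.

```python
PAGES_PER_BLOCK = 128  # assumed value for illustration; real chips vary


class Block:
    """Toy model of one flash block (hypothetical, not a real driver API)."""

    def __init__(self, pages_per_block=PAGES_PER_BLOCK):
        self.pages = [None] * pages_per_block  # None means the page is erased
        self.erase_count = 0

    def program(self, page_index, data):
        # A page can only be programmed while it is in the erased state.
        if self.pages[page_index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def erase(self):
        # An erase always clears every page in the block.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def update_in_place(block, page_index, new_data):
    """Naive update: save the block, erase it, then reprogram every page.

    Returns how many pages had to be programmed to change a single page.
    """
    saved = list(block.pages)
    saved[page_index] = new_data
    block.erase()
    programmed = 0
    for i, data in enumerate(saved):
        if data is not None:
            block.program(i, data)
            programmed += 1
    return programmed
```

Changing one page this way costs one erase plus a rewrite of every page that held data, which is exactly the wear the rest of this post tries to avoid.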
So to make flash a friendly medium for storing our data, we have an abstraction layer, the flash translation layer (FTL), which will:
- Write updated information to a new empty page and then divert all subsequent read requests to its new address
- Ensure that newly-programmed pages are evenly distributed across all of the available flash so that it wears evenly
- Keep a list of all the old invalid pages so that at some point, later on, they can all be recycled for reuse
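The first and third jobs in the list above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions (the class name `SimpleFTL` and its fields are hypothetical); a real flash translation layer also handles wear leveling, power loss and much more.

```python
class SimpleFTL:
    """Toy flash translation layer: out-of-place writes plus an invalid list."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # erased physical pages
        self.mapping = {}     # logical page number -> physical page number
        self.invalid = set()  # physical pages holding stale data
        self.flash = {}       # physical page number -> stored data

    def write(self, logical_page, data):
        if not self.free_pages:
            raise RuntimeError("no free pages; garbage collection is needed")
        # Always program a fresh page instead of erasing the old one.
        new_physical = self.free_pages.pop(0)
        self.flash[new_physical] = data
        # Remember the old copy as invalid so it can be recycled later.
        old_physical = self.mapping.get(logical_page)
        if old_physical is not None:
            self.invalid.add(old_physical)
        # Redirect all future reads to the new location.
        self.mapping[logical_page] = new_physical

    def read(self, logical_page):
        return self.flash[self.mapping[logical_page]]
```

For example, writing logical page 0 twice leaves the newest data readable through `read(0)` while the first physical page ends up in `invalid`, waiting to be recycled.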
Wear leveling sounds pretty simple and easy when you first hear of it. You have flash with a fixed set of blocks, each with a limited number of P/E cycles (program/write happens at the page level and erase happens at the block level). Because constant program and erase cycles wear out the flash blocks, instead of erasing a block every time you have to update a page within it, you mark that page as invalid and write to a new page, spreading the new writes across all the blocks. The invalid pages have to be erased at some point so that the space can be reused. This helps the flash blocks wear out evenly, rather than a few blocks wearing out early and reducing the capacity promised to the customer.
There is also another part of wear leveling that often gets overlooked. Within the flash storage there will be some blocks whose data is frequently read but rarely or never updated. These are cold blocks, while the blocks that are being updated all the time are hot blocks. Left alone, the cold blocks would never wear out, which again leads to uneven wearing of the flash blocks. To avoid this, the system deliberately relocates the cold data so that those blocks take their share of the wear… and that means we are actually adding write workload to the system, which ultimately means increasing the wear.
In other words, the more aggressive we are at wear leveling, the earlier we wear out the system. But if we skip wear leveling because of that cost, we end up with hot and cold spots and uneven wearing of the system. Hence, it is a question of the right balance.
We have so far talked about marking pages invalid and writing the new data to a fresh page. These invalid pages have to be recycled, i.e. they have to be erased, and this clean-up job is called garbage collection. Of course, an erase is a big operation: in flash you can only erase a complete block, not a single page, and the same block usually holds many pages that are still in use.
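The balance described above can be sketched as two small decisions, shown here in Python. The function names and the `threshold` parameter are hypothetical, chosen only for this illustration; real controllers use their own policies.

```python
def pick_block_for_write(free_blocks, erase_counts):
    """Send new writes to the free block that has been erased the fewest times."""
    return min(free_blocks, key=lambda block: erase_counts[block])


def should_relocate_cold_data(erase_counts, threshold):
    """Move cold data only when the wear gap between blocks gets too large.

    `threshold` is an assumed tuning knob: a small value levels wear more
    aggressively but adds extra writes, a large value does the opposite.
    """
    spread = max(erase_counts.values()) - min(erase_counts.values())
    return spread > threshold
```

With erase counts of `{0: 10, 1: 3, 2: 7, 3: 0}`, a new write among free blocks 0, 1 and 2 would go to block 1, and cold data would be relocated with a threshold of 5 but not with a threshold of 20.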
Let me explain the tricky part
In the above picture, you will see that 30% of the blocks are written and the rest are empty.
Now if the data has to be updated, then instead of erasing the whole block and rewriting it, the flash translation layer marks those pages as invalid or stale and writes the data to another page in the same block or in another block.
In the above diagram, there is 50% free space in each block, which the garbage collection algorithm can use: it copies the valid data out of a second block, erases that block completely, and reclaims the space.
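The copy-then-erase step can be sketched as follows in Python. This is a toy model under my own assumptions: blocks are plain lists of page states, the victim is simply the block with the most invalid pages, and the function name `garbage_collect` is hypothetical.

```python
VALID, INVALID, FREE = "valid", "invalid", "free"


def garbage_collect(blocks):
    """Reclaim the block with the most invalid pages.

    `blocks` is a list of lists of page states. Returns the index of the
    erased block, or None if nothing could be reclaimed.
    """
    victim = max(range(len(blocks)), key=lambda i: blocks[i].count(INVALID))
    if blocks[victim].count(INVALID) == 0:
        return None  # nothing stale to reclaim

    to_move = blocks[victim].count(VALID)
    free_elsewhere = sum(
        block.count(FREE) for i, block in enumerate(blocks) if i != victim
    )
    if free_elsewhere < to_move:
        return None  # not enough free pages to hold the valid data

    # Copy the victim's valid pages into free pages of the other blocks.
    for i, block in enumerate(blocks):
        if i == victim:
            continue
        for page, state in enumerate(block):
            if to_move and state == FREE:
                block[page] = VALID
                to_move -= 1

    # Only now is it safe to erase the whole victim block.
    blocks[victim] = [FREE] * len(blocks[victim])
    return victim
```

Note that the function gives up and returns `None` when the other blocks do not have enough free pages to take the victim's valid data.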
What if the blocks are 50%-70% full, as in the diagram below?
How will the garbage collection algorithm erase the invalid pages when it cannot copy the block's valid data into other blocks?
This situation is a disaster because at this point it can never free up the stale pages, which means I’ve effectively turned my flash system into a read-only device, and if you look at the capacity graph I have used only 70% of the capacity. Does this mean I can never use my flash system to 100%?
This is the reason why all flash vendors over-provision the storage (as shown below), for free, to help you utilize 100% of the flash capacity you paid for.
Yes, there’s more flash in your device than you can actually see. This extra area is not pinned, by the way: it’s not a dedicated set of blocks or pages. It’s just mandatory headroom that stops situations like the one we described above from ever occurring.
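As a rough illustration of how much hidden headroom a device carries, over-provisioning is commonly expressed as the extra raw capacity relative to the capacity you can see. The Python sketch below uses that common definition; the function name and the capacities in the usage note are made-up numbers, not figures from any vendor.

```python
def over_provisioning_ratio(raw_capacity, usable_capacity):
    """Extra flash as a fraction of the usable capacity (commonly used definition)."""
    if usable_capacity <= 0 or raw_capacity < usable_capacity:
        raise ValueError("expected 0 < usable_capacity <= raw_capacity")
    return (raw_capacity - usable_capacity) / usable_capacity
```

For a hypothetical device with 512 units of raw flash that exposes 480 units to the user, `over_provisioning_ratio(512, 480)` works out to 32/480, a little under 7% of hidden headroom.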
Having explained three important concepts of flash, I will take a break now.
I will explain a few more interesting features of how NAND flash works in the coming blogs. To be continued.