Saturday, September 09, 2006

FlashDisk Bad Block Management

Introduction

Bad blocks are blocks that contain one or more invalid bits whose reliability is not guaranteed. Bad blocks may be present when the device is shipped, or may develop during the lifetime of the device.
Devices with bad blocks have the same quality level and the same AC and DC characteristics as devices where all the blocks are valid. A bad block does not affect the performance of valid blocks because it is isolated from the bit line and common source line by a select transistor.
As the failure of a page program operation does not affect the data in other pages in the same block, the block can be replaced by re-programming the current data and copying the rest of the replaced block to an available valid block.

FlashDisk BBM

FlashDisk implements the Bad Block Management in the Flash Translation Layer (FTL), not in the FMD layer. In FlashDisk, some good blocks are reserved for garbage collection and for replacing the runtime bad blocks.
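
As a rough illustration of the idea, the sketch below keeps a virtual-to-physical block map and a small pool of reserved blocks, and swaps in a reserved block when a program operation fails. It is not FlashDisk's actual code: the type bbm_t, the functions bbm_init and bbm_replace, and the block counts are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VIRTUAL   8        /* hypothetical: blocks visible to the file system */
#define NUM_RESERVED  2        /* hypothetical: good blocks held back as spares */
#define NO_BLOCK      0xFFFFu  /* returned when the spare pool is exhausted */

typedef struct {
    uint16_t map[NUM_VIRTUAL];     /* virtual block -> physical block */
    uint16_t spare[NUM_RESERVED];  /* reserved physical blocks */
    int      spare_count;
} bbm_t;

/* Identity-map the virtual blocks and place the spares after them. */
static void bbm_init(bbm_t *b)
{
    int i;
    for (i = 0; i < NUM_VIRTUAL; i++)
        b->map[i] = (uint16_t)i;
    for (i = 0; i < NUM_RESERVED; i++)
        b->spare[i] = (uint16_t)(NUM_VIRTUAL + i);
    b->spare_count = NUM_RESERVED;
}

/* Called when programming a page of vblock fails at run time.
 * Remaps vblock to a spare and returns the new physical block, or
 * NO_BLOCK if no spare is left (the map is then left unchanged).
 * The caller would re-program the current data into the new block and
 * copy the remaining pages of the old block across. */
static uint16_t bbm_replace(bbm_t *b, int vblock)
{
    assert(vblock >= 0 && vblock < NUM_VIRTUAL);
    if (b->spare_count == 0)
        return NO_BLOCK;
    b->map[vblock] = b->spare[--b->spare_count];
    return b->map[vblock];
}
```

With two spares at physical blocks 8 and 9, the first call to bbm_replace hands out block 9, the second hands out block 8, and a third returns NO_BLOCK because the pool is empty.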

Friday, September 08, 2006

FlashDisk Garbage Collection

Introduction

In FlashDisk, when the file system writes a sector through the FlashDisk interface, the data is written to a new free sector position and the old sector with the same logical number is marked invalid. After a certain number of sector writes, free sectors run low, so garbage collection is used to reclaim the invalid flash memory space and allow further write operations.

Garbage Collection

Garbage collection is performed when the number of free sectors in the whole Flash is lower than a specified threshold value. It copies the valid sectors into new (free) sectors and erases the original invalid block.
The basic operations involved in Garbage Collection are the following:
1) A block with invalid sectors is selected for garbage collection. The selection algorithm takes both performance and wear leveling into account.
2) The valid sectors in that block are copied into free sectors in another block.
3) The block, which now holds no valid sectors, is erased and put on the free block list.
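
The three steps above can be sketched in C as follows. The source does not say which selection algorithm FlashDisk uses, so the policy here (most invalid sectors first, lowest erase count as tie-breaker) is only an assumption, as are all names and the four-sectors-per-block geometry. Only sector states are tracked; a real driver would also move the data.

```c
#include <assert.h>
#include <stddef.h>

#define SECTORS_PER_BLOCK 4   /* hypothetical geometry */

typedef enum { SEC_FREE, SEC_VALID, SEC_INVALID } sec_state_t;

typedef struct {
    sec_state_t sec[SECTORS_PER_BLOCK];
    unsigned    erase_count;
} block_t;

/* Count the sectors of a block that are in state s. */
static int count_state(const block_t *b, sec_state_t s)
{
    int i, n = 0;
    for (i = 0; i < SECTORS_PER_BLOCK; i++)
        if (b->sec[i] == s)
            n++;
    return n;
}

/* Step 1: pick the block with the most invalid sectors; on a tie,
 * prefer the block erased fewer times. Returns -1 if nothing to collect. */
static int gc_select_victim(const block_t *blk, int nblocks)
{
    int i, best = -1, best_inv = 0;
    for (i = 0; i < nblocks; i++) {
        int inv = count_state(&blk[i], SEC_INVALID);
        if (inv == 0)
            continue;
        if (inv > best_inv ||
            (inv == best_inv && blk[i].erase_count < blk[best].erase_count)) {
            best = i;
            best_inv = inv;
        }
    }
    return best;
}

/* Steps 2 and 3: move the victim's valid sectors into free sectors of
 * dest, then erase the victim. Returns the number of sectors copied,
 * or -1 if dest does not have enough free sectors. */
static int gc_collect(block_t *victim, block_t *dest)
{
    int i, j = 0, copied = 0;
    assert(victim != dest);
    if (count_state(dest, SEC_FREE) < count_state(victim, SEC_VALID))
        return -1;
    for (i = 0; i < SECTORS_PER_BLOCK; i++) {
        if (victim->sec[i] != SEC_VALID)
            continue;
        while (dest->sec[j] != SEC_FREE)   /* bounded by the check above */
            j++;
        dest->sec[j] = SEC_VALID;
        copied++;
    }
    for (i = 0; i < SECTORS_PER_BLOCK; i++)
        victim->sec[i] = SEC_FREE;         /* erase frees every sector */
    victim->erase_count++;
    return copied;
}
```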

Background Thread

There is a background thread in the FlashDisk which can be activated to optimize the overall performance. With this thread, the long erase times of Flash devices are completely transparent as the Garbage Collection is activated during the system's idle time. In this way, the FlashDisk erases sectors to free flash memory space automatically, and not only when the number of data sectors to be written exceeds the number of free sectors.

FlashDisk Wear Leveling Algorithm

Introduction

In SLC (Single Level Cell) NAND Flash memories, each physical block can be reliably programmed or erased on the order of 100,000 to 1,000,000 times with current technology. To increase the lifetime of the NAND Flash, a Wear Leveling Algorithm is implemented in FlashDisk to monitor the number of write cycles per block and spread them evenly.

First-Level Wear Leveling

In first-level wear leveling, new data is programmed to the free blocks that have had the fewest write cycles. In FlashDisk, all free physical blocks are gathered into a free block chain. Each time a virtual block needs to be allocated, a physical block is selected (by the Wear Leveling algorithm) from the chain.
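
A minimal sketch of this selection, assuming the free block chain is a singly linked list that stores a write-cycle count per block. The structure free_block_t and the function wl_alloc_block are hypothetical names, not FlashDisk's API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct free_block {
    unsigned           phys;          /* physical block number */
    unsigned           write_cycles;  /* write cycles seen so far */
    struct free_block *next;
} free_block_t;

/* Unlink and return the least-worn block of the chain,
 * or NULL if the chain is empty. */
static free_block_t *wl_alloc_block(free_block_t **head)
{
    free_block_t **pp;
    free_block_t **best = NULL;   /* link that points at the best block */
    free_block_t  *blk;

    assert(head != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (best == NULL || (*pp)->write_cycles < (*best)->write_cycles)
            best = pp;
    }
    if (best == NULL)
        return NULL;

    blk = *best;
    *best = blk->next;            /* remove it from the chain */
    blk->next = NULL;
    return blk;
}
```

Scanning the whole chain is linear in the number of free blocks; keeping the chain sorted by wear would make allocation constant time at the cost of a sorted insert when a block is freed.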

Second-Level Wear Leveling

In the 2nd level wear leveling, long-lived data is copied to another block so that the original block can be used for more frequently-changed data.
The 2nd level wear leveling is triggered when the difference between the maximum and the minimum number of write cycles per block reaches a specific threshold. With this technique, the mean age of the physical NAND blocks is kept constant.
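
The trigger condition can be sketched like this, assuming the driver keeps a per-block write-cycle counter. The function name wl2_should_run and the threshold values are illustrative only; the source does not give FlashDisk's real threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the spread between the most-worn and the least-worn
 * block has reached the threshold, 0 otherwise. When it returns 1 and
 * coldest is not NULL, *coldest receives the index of the least-worn
 * block, which is the one holding the long-lived data to move. */
static int wl2_should_run(const unsigned *cycles, size_t n,
                          unsigned threshold, size_t *coldest)
{
    size_t i, min_i = 0, max_i = 0;

    assert(cycles != NULL && n > 0);
    for (i = 1; i < n; i++) {
        if (cycles[i] < cycles[min_i])
            min_i = i;
        if (cycles[i] > cycles[max_i])
            max_i = i;
    }
    if (cycles[max_i] - cycles[min_i] < threshold)
        return 0;
    if (coldest != NULL)
        *coldest = min_i;
    return 1;
}
```

For write-cycle counts of 120, 30, 95 and 310 the spread is 280, so a threshold of 250 triggers a move of the block at index 1, while a threshold of 300 does not.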

Wednesday, September 06, 2006

Installable ISR

This article grew out of my work on the Microsoft Windows Automotive for ARM platform project, which required an installable ISR for the LCD interrupt. After finishing it, I summarized the process in this article.

Introduction

An installable interrupt service routine (ISR) is one that can be installed and allowed to hook an interrupt after the kernel is built. Usually, if an interrupt had to be hooked to service some event, the code had to be built into the kernel when it was originally built. If a new device was inserted into the board, the code to handle the IRQ would need to already be in the kernel; otherwise, the IRQ could not be hooked.

Interrupt Handling
The main technique for handling an interrupt is to associate an event with the interrupt identifier (SYSINTR) that the ISR returns. Windows CE schedules your interrupt service thread (IST) when the event is signaled.

1) Typical normal interrupt handling sequence:
a) Hardware raises an interrupt.
b) Kernel looks up IRQ of the interrupt, calls the registered ISR, and disables all lower-priority interrupts.
c) ISR performs necessary handling, and then returns an interrupt identifier (SYSINTR_xxx).
d) If the interrupt identifier returned by the ISR is SYSINTR_NOP, the kernel completes processing without setting an event and re-enables all interrupts. Otherwise, the kernel sets the event, and when the ISR returns it re-enables all interrupts except the one that is being processed.
e) The kernel schedules the IST indicated by the SYSINTR to run.
f) The IST calls InterruptDone, which calls OEMInterruptDone to perform any hardware actions necessary to enable the next interrupt of the target device.

2) Function call sequence in the interrupt handling
InterruptInitialize --> OEMInterruptEnable --> OALIntrEnableIrqs
InterruptDisable --> OEMInterruptDisable --> OALIntrDisableIrqs
InterruptDone --> OEMInterruptDone --> OALIntrDoneIrqs
Hardware IRQ --> OEMInterruptHandler --> static ISRs or installable ISRs --> IST

With an installable ISR, the driver's IST is the same as with a static ISR: in both cases it calls the InterruptInitialize, WaitForSingleObject, InterruptDone, and InterruptDisable functions. The installable ISR is called from OEMInterruptHandler; if OEMInterruptHandler does not call NKCallIntChain, the installable ISR is never called. An installable ISR does the same job that a static ISR does in the functions called by OEMInterruptHandler, so enabling and disabling the interrupt still has to be implemented in the OAL. We can therefore only use an installable ISR for IRQs that are handled in the OALIntrEnableIrqs, OALIntrDisableIrqs, and OALIntrDoneIrqs functions.

Related Functions in Driver

1) LoadIntChainHandler
> This function is called by a driver to install an ISR to handle a particular interrupt.
> Parameters: DLL name, ISR function name, IRQ value.
> Return value: the handle to the installed handler, NULL if failed.
> The default ISR for all interrupts is OEMInterruptHandler.
> Sample code:
HANDLE hIISR = LoadIntChainHandler(_T("lcd_isr.dll"), _T("LcdISR"), IRQ_LCD);

2) FreeIntChainHandler
> This function unloads an existing interrupt handler.
> Parameters: handle of the installed ISR.
> Return value: TRUE/FALSE.
> Code of the ISR will not be freed from memory until reset.
> Sample code: FreeIntChainHandler(hIISR);

3) KernelIoControl
> This function provides the kernel with a generic I/O control for carrying out I/O operations.
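> Used here with IOCTL_HAL_REQUEST_SYSINTR to obtain the SYSINTR mapped to an IRQ.
> Also used with IOCTL_HAL_RELEASE_SYSINTR to release that mapping.
> Parameters: IOCTL code, input buffer and its size, output buffer and its size, pointer to the number of bytes returned.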
> Return value: TRUE/FALSE.
> Sample code:
DWORD dwSYSINTR, dwIRQ = IRQ_LCD;
KernelIoControl(IOCTL_HAL_REQUEST_SYSINTR, &dwIRQ, sizeof(dwIRQ), &dwSYSINTR, sizeof(dwSYSINTR), NULL);

4) KernelLibIoControl
> This function is called by a driver to communicate with an interrupt handler.
> Parameters: ISR handle, OEM or ISV specified IOCTL, ...
> Return value: TRUE/FALSE (ERROR: Use GetLastError() to retrieve)
> This function will call IOControl function implemented in the interrupt handler.
> Sample code:
GIISR_INFO iisr_info;
memset(&iisr_info, 0, sizeof(GIISR_INFO));
iisr_info.SysIntr = dwSYSINTR;
if (!KernelLibIoControl(hIISR, IOCTL_GIISR_INFO, &iisr_info, sizeof(iisr_info), NULL, 0, NULL))
{
    RETAILMSG(1, (TEXT("KernelLibIoControl failed!\r\n")));
    //...
}

5) BusTransBusAddrToStatic
> This function translates a bus address to a system address, then creates a static, process-independent virtual address mapping for that address.
> Parameters:
- hBusAccess: handle obtained from CreateBusAccessHandle
- InterfaceType: Bus type
- BusNumber: bus number where the device resides.
- BusAddress: bus-relative address of registers and ports on the device.
- Length: number of bytes to map on the device.
- AddressSpace: flag indicating whether the address is in memory space or I/O space.
- MappedAddress: virtual address mapped.
> Return value: TRUE/FALSE

Related Functions in OAL
1) NKCallIntChain
> This function is called by OEM from the OAL ISR routine (such as OEMInterruptHandler).
> This function determines which chained or shared interrupt device triggered an IRQ event.
> Parameter: IRQ value.
> Return value: SYSINTR_CHAIN if no ISR has handled the IRQ event; otherwise a valid SYSINTR value, which the OEM code passes back to the kernel to trigger the IST.
> Sample code: DWORD dwSysIntr = NKCallIntChain(nIrq);

2) OALIntrTranslateSysIntr
> This function returns the list of IRQs for a specified SYSINTR. It is called by the OEMInterruptEnable, OEMInterruptDisable, and OEMInterruptDone functions.
> Parameters: SYSINTR value, buffer for the IRQs list
> Return value: TRUE/FALSE

3) OALIntrEnableIrqs
> This function enables the interrupts identified by a list of IRQs.
> Parameters: number of IRQs in the list, Pointer to the list of IRQs.
> Return value: TRUE/FALSE

4) OALIntrDisableIrqs
> This function disables the interrupts identified by a list of IRQs.

5) OALIntrDoneIrqs
> This function scans the list of IRQs, signaling the end of interrupt processing for each of them.

6) OEMInterruptHandler
> This function is called by the kernel when an interrupt occurs. This function is only used by the ARM kernel, and provides all ISR functionality for ARM-based platforms. You do not need to call HookInterrupt in OEMInit for ARM-based platforms.
> Parameters: instruction counter when the interrupt occurred.
> Return value: a SYSINTR_* value.
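
To make the chaining behaviour concrete, here is a small portable C model of what NKCallIntChain does as described above: each installed handler is tried in turn, and the first one that returns something other than the chain value wins. This is a simulation, not Windows CE code: every sim_ name, the placeholder value of SIM_SYSINTR_CHAIN, and the two example ISRs are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_SYSINTR_CHAIN 0xFFFFFFFFuL  /* placeholder value for this model only */
#define SIM_MAX_HANDLERS  4

typedef unsigned long (*sim_isr_fn)(unsigned long instance);

typedef struct {
    sim_isr_fn    fn;        /* plays the role of the DLL's ISRHandler */
    unsigned long instance;  /* plays the role of the CreateInstance value */
} sim_handler_t;

typedef struct {
    sim_handler_t h[SIM_MAX_HANDLERS];
    int           count;
} sim_chain_t;

static void sim_chain_init(sim_chain_t *c)
{
    c->count = 0;
}

/* Models installing a handler on the chain of one IRQ.
 * Returns 1 on success, 0 if the chain is full. */
static int sim_install(sim_chain_t *c, sim_isr_fn fn, unsigned long instance)
{
    assert(c != NULL && fn != NULL);
    if (c->count == SIM_MAX_HANDLERS)
        return 0;
    c->h[c->count].fn = fn;
    c->h[c->count].instance = instance;
    c->count++;
    return 1;
}

/* Models the chain walk: return the first result that is not the
 * chain value, or the chain value if no handler claimed the IRQ. */
static unsigned long sim_call_chain(const sim_chain_t *c)
{
    int i;
    for (i = 0; i < c->count; i++) {
        unsigned long r = c->h[i].fn(c->h[i].instance);
        if (r != SIM_SYSINTR_CHAIN)
            return r;
    }
    return SIM_SYSINTR_CHAIN;
}

/* Example ISR whose device did not raise the interrupt. */
static unsigned long isr_not_mine(unsigned long instance)
{
    (void)instance;
    return SIM_SYSINTR_CHAIN;
}

/* Example ISR that claims the interrupt and returns a made-up SYSINTR. */
static unsigned long isr_lcd(unsigned long instance)
{
    return 16 + instance;
}
```

In this model an empty chain, or a chain where every handler declines, yields the chain value, which is the case where the OAL has to fall back to its static handling.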

Related Functions in an Installable ISR DLL
Installable ISR DLL must implement the following functions:
1) DllEntry
> DllEntry is a placeholder for the library-defined function name. You must specify the name you use when you build your DLL.

2) CreateInstance
> This function is implemented by an ISR to return a value that references an instance of the ISR. It is the first function called in the DLL after LoadIntChainHandler has loaded the handler.
> Parameters: none
> Return value: a reference value that identifies an instance of the ISR handler; -1 indicates an error in the ISR.

3) DestroyInstance
> This function is called when an installable ISR is unloaded using the FreeIntChainHandler function.
> Parameters: instance of the ISR being unloaded.
> Return value: TRUE/FALSE.

4) ISRHandler
> This function prototype is used by an OEM/IHV to create and export an installable interrupt handler.
> Parameters: Instance of the ISR being registered.
> Return value: the SYSINTR value that corresponds to the IST that should be scheduled to run. Return SYSINTR_CHAIN if the IRQ is not handled by your handler.

5) IOControl
> This function, exported by the installable ISR DLL, enables a communication path from the interrupt service thread (IST) to the ISR.

Notes about the Installable ISR DLL
1) The DLL is loaded into the kernel process space.
2) The DLL must be in the FILES section with no fixup variable or in the MODULES section with the kernel flag, K, set for a kernel-style fixup variable. A fixup variable is functionality of ROMIMAGE that allows you to initialize a global variable in the NK.exe at MAKEIMG time.
3) The DLL can contain more than one ISR handler and can be used in several calls to LoadIntChainHandler. All ISR handlers have a full stack, based on the CPU.
4) The DLL cannot have dependent DLLs.
5) The DLL cannot implicitly link to any other DLL, and all code must be contained in the DLL. You can set NOMIPS16CODE=1 in the sources file to prevent importing external functions.
6) The DLL must be built so that the C Run-Time Library is not included by default.

Samples of the Installable ISR
Windows CE ships a sample installable ISR, GIISR, under the public\common\oak\drivers\giisr\ folder.

Sunday, September 03, 2006

FlashDisk - Flash Translation Layer (FTL)

The FlashDisk is designed to fully replace Microsoft FAL for better stability and better performance. It was designed because the Microsoft FAL has some shortcomings that cannot be worked around and that are critical in some hardware-dependent systems.

Introduction

The FlashDisk is a Flash Translation Layer (FTL), which Microsoft calls the Flash Abstraction Layer (FAL). It is linked with the Flash Media Driver (FMD) to make a Flash Block Driver. It is designed to fully replace Microsoft FAL for better stability and better performance.

Microsoft FAL Known Disadvantages

1) Boot time is very long when a large flash device is full.
2) The disk may be corrupted if power fails suddenly during a data write. If the corrupted part lies in the hive-based registry area, the system can no longer boot. Our QA engineers report this issue often.
3) Read/Write performance is unsatisfactory.

FlashDisk Advantages

1) Quick Boot - Boots up very quickly even when a large disk is full.
2) Read/Write performance is better than MS FAL driver.
3) Safe against power failure.
4) Better and more controllable wear-leveling policy.
5) Better bad block management algorithm.
6) Automatic Garbage Collection.
7) Memory usage is acceptable.
8) Can replace MS FAL fully.

Performance Comparison

The following is a simple comparison of FlashDisk and Microsoft FAL performance. The test platform is: 200MHz ARM920-based processor; WinCE 4.2 OS; K9F1208+TC58NVG NAND Flash (each 2048-byte page is divided into four 512-byte pages); BINFS partition size: 22MB; data partition size: 162MB. Read/write performance test application: uses a 1MB buffer to read/write a 5MB file 10 times.
1) Boot time
FlashDisk: 10s (empty), 10s (full)
MS FAL: 10s (empty), 35s (full)
2) Read/Write speed:
FlashDisk: R: 3563KB/s, W: 816KB/s (empty), R: 2966KB/s, W: 658KB/s (full)
MS FAL: R: 2138KB/s, W: 598KB/s (empty), R: 1854KB/s, W: 459KB/s (full)
3) Stability:
4) Memory:
5) CPU loading:
6) Bad block management:

Usage

1. Copy FlashDisk.lib and FlashDisk.pdb to your NAND flash driver path.
2. Modify the sources file that links against the Microsoft FAL.lib:
a) Remove: $(_COMMONOAKROOT)\lib\$(_CPUINDPATH)\FAL.LIB \
b) Add: \$(_CPUINDPATH)\FlashDisk.lib \
c) Modify line: DLLENTRY=DllEntry to DLLENTRY=DllMain

Remarks

To use FlashDisk, the following constraints must be followed (these constraints are also compatible with the Microsoft FAL):
1) All blocks used by the OEM must have the OEM_BLOCK_RESERVED flag set (also setting OEM_BLOCK_READONLY and the bad block flag is recommended but optional). These blocks start at the head of the NAND flash and include NBOOT, TOC, EBOOT, ...
2) If you use a monolithic IMAGE, the blocks used by the IMAGE should also have at least the OEM_BLOCK_RESERVED flag set.
3) If your system has a BINFS partition, every block of the BINFS partition should have the OEM_BLOCK_READONLY flag set.
4) If you want to try FlashDisk, please comment here or send mail to me.

Saturday, September 02, 2006

Scatter Loading Mechanism in ADS

Origin of this article: we came across a requirement that some functions of the boot loader project be built into a separate bin file and executed at a known fixed address. This is something like a DLL in Windows, but compiled by ADS. After studying the ADS linker guide, I found that using more than one load region solves the problem.

Introduction

Scatter loading is a mechanism provided by the ARM linker, which enables you to partition an executable image into regions that can be positioned independently in memory.

In a simple embedded computer system, memory is divided into ROM and RAM. The image produced by the linker is divided into the "Read-Only" segment, which contains the code and read-only data, and the "Read-Write" segment, which contains the initialized and non-initialized or zero-initialized (ZI) data. Usually, the "Read-Only" segment is placed in ROM and the "Read-Write" segment is copied from ROM to RAM before execution begins.

Embedded systems often use a more complex memory map, which can consist of ROM, SRAM, DRAM, FLASH and so on. The scatter loading mechanism lets you place various parts of the image in these distinct memory areas.

Scatter loading enables you to partition your program image into several regions of code and data which can be placed separately in the memory map. Each region is placed in a contiguous chunk of memory space. The location of a region can differ between load time and execution time, with the application copying code and data from the load address to the execution address.

The placement information is contained in a description file, the name of which is passed as a command line parameter to the linker.

Load Regions and Execution Regions
A program image consists of regions which may occupy different locations at load time and execution time.

This means that just before an image is executed, there are some regions which need to be moved from the locations at which they were initially loaded in memory. For example, initialized read-write data may reside in ROM, but it must be copied into RAM when the program starts executing.
There are two mechanisms available to describe where image regions should be placed in memory at execution time:
1) Using -RO and -RW command line options.
2) Scatter loading

Scatter Loading Definitions
1) Load region
The memory which is occupied by a program before it starts executing, but after it has been loaded into memory, can be split into a set of disjoint load regions, each of which is a contiguous chunk of bytes.
2) Execution region
The memory used by a program while it is executing can also be split into a set of disjoint execution regions.

Placing Regions with Scatter Loading
1) Use the command line option "-scatter description-file". This option causes the linker to ignore the "-RO" and "-RW" options.
2) Scatter load images can be output in the following formats:
BIN
Generates one file for each load region. The output name is treated as a directory name, and each load region is placed in a separate file in that directory, with the same name as the load region. These files can then be blown into ROM, Flash and so on as appropriate. Load region names must therefore not contain characters, or be of a length, unacceptable to the host file system.

ELF
Generates a single executable ELF file suitable for loading into the debugger, containing one section per load region. The name of the file is given by the -output option.

Linker pre-defined symbols
Using the region names given in the scatter loading description file, the linker generates the symbols required to allow each region to be copied from its load address to its execution address.
Neither the linker nor the C library provides the code required to copy an execution region from its load address or to create a zero-initialized region; you must do this, as the application code writer.

The linker generates symbols which allow your routine to initialize all the execution regions that have different load and execution addresses. These symbols give the length, load address and execution address of each region.

For RO and RW segments:
Load$$region_name$$Base is the load address of the region
Image$$region_name$$Base is the execution address of the region
Image$$region_name$$Length is the execution region length in bytes
For zero-initialized segments:
Image$$region_name$$ZI$$Base is the execution address of the region
Image$$region_name$$ZI$$Length is the execution region length in bytes.
These symbols can be imported and used by assembly language programs, or referred to as extern addresses from C (using the -fc compiler option, which allows $ in identifiers).
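
A copy routine built on these symbols might look like the sketch below. To keep it compilable anywhere, the symbol values are passed in through a hypothetical region_desc_t structure instead of being declared as extern; on a real target you would fill the fields from the linker-generated symbols. The names region_desc_t and region_init are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One execution region, described by the values of the linker symbols. */
typedef struct {
    const void *load_base;  /* value of Load$$region_name$$Base */
    void       *exec_base;  /* value of Image$$region_name$$Base */
    size_t      length;     /* value of Image$$region_name$$Length */
    void       *zi_base;    /* value of Image$$region_name$$ZI$$Base */
    size_t      zi_length;  /* value of Image$$region_name$$ZI$$Length */
} region_desc_t;

/* Copy the initialized part of a region from its load address to its
 * execution address, then create its zero-initialized area.
 * A region that executes in place (load == exec) is not copied. */
static void region_init(const region_desc_t *r)
{
    assert(r != NULL);
    if (r->load_base != r->exec_base && r->length != 0)
        memmove(r->exec_base, r->load_base, r->length);
    if (r->zi_length != 0)
        memset(r->zi_base, 0, r->zi_length);
}
```

On a real system this would run very early at start-up, once per region, before any code that lives in or depends on the copied regions. Whether memmove and memset are usable at that point depends on your start-up code, so a hand-written loop may be needed instead.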

Note
These symbols are generated for every region named in the scatter load description.
A scatter load image is not padded with zero, and requires the ZI data area to be created dynamically. This is similar to the case with a normal -bin file when the -nozeropad option is used. There is therefore no need for a load address symbol for ZI data.
The linker sorts AREAs within execution regions according to their attributes. For example, all initialized data areas are grouped together. Therefore, you can assume that all initialized data that needs copying is contiguous.

Area ordering
The linker orders AREAs within each execution region by attributes. The ordering is:
Read-only code
Read-only based data
Read-only data
Read-write code
Based data
Other initialized data
Zero-initialized data
The pseudo-attributes FIRST and LAST can be used in the description file to mark the first and last AREAs in an execution region if the placement order is important (for example, if the ENTRY must be first and a checksum last).

Scatter loading and long distance branching
The ARM instruction set has branch instructions that allow a branch forwards or backwards of up to 32MB. The BL (Branch with Link) instruction also preserves the return address in register 14 (Link Register, LR).

The Thumb instruction set has much shorter branch ranges, from 256 bytes for conditional branches to 2048 bytes for unconditional branches. The BL instruction has a range of 4MB.

The linker has to ensure that no branch or subroutine call violates these range restrictions. If you place your execution regions in such a way as to require inter-region branches beyond the range, the linker generates the error "Relocated value too big for instruction sequence".
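
As a quick sanity check before placing regions, the nominal ranges quoted above can be put into a small helper. This is a simplification I am assuming for illustration: it treats each range as symmetric and ignores the pipeline offset that the real encodings are relative to, so the linker remains the final authority. All names are hypothetical.

```c
#include <assert.h>

typedef enum {
    BR_ARM,           /* ARM B / BL */
    BR_THUMB_COND,    /* Thumb conditional branch */
    BR_THUMB_UNCOND,  /* Thumb unconditional branch */
    BR_THUMB_BL       /* Thumb BL */
} branch_kind_t;

/* Nominal reach in bytes for each kind of branch, as quoted in the text. */
static long branch_limit(branch_kind_t k)
{
    switch (k) {
    case BR_ARM:          return 32L * 1024 * 1024;
    case BR_THUMB_COND:   return 256L;
    case BR_THUMB_UNCOND: return 2048L;
    case BR_THUMB_BL:     return 4L * 1024 * 1024;
    }
    return 0;
}

/* Returns 1 if a branch of kind k from address 'from' can nominally
 * reach address 'to', 0 otherwise. */
static int branch_in_range(branch_kind_t k, unsigned long from, unsigned long to)
{
    unsigned long dist = (to >= from) ? (to - from) : (from - to);
    long limit = branch_limit(k);

    assert(limit > 0);
    return dist <= (unsigned long)limit;
}
```

For example, two regions placed 16MB apart are reachable with an ARM BL but not with a Thumb BL.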

The description file
A Scatter Load Description is a text file describing how the AREAs in a linked image are assigned to separate regions of memory.
In a scatter load description:
. You list the separate regions of memory in which your image will execute, and specify an execution base address of each region.
. You describe how execution regions are packed into regions of physical memory (called load regions). The linker generates a separately loadable chunk of image for each load region. In some image formats (for example BIN), each load region is written to its own output file; in others (for example ELF, AIF), each load region has its own section within a single output file. You can think of each load region as corresponding to a separate persistent memory such as ROM, EPROM, FLASH, and so on.
. Using simple patterns and attributes you describe how armlink should assign the constituent AREAs of your image to execution regions.

Notes:
. If you want to execute a code region directly then you must ensure its execution address is the same as its load address. Consequently, you must assign the AREA containing your image's ENTRY point to such a region.
. Your scatter load description does not describe the objects which make up your image. You specify those to armlink on its command line, in the same way for all image types. The patterns you write in a scatter load description describe how to assign the AREAs which armlink has already selected to the execution regions you defined.
. You can describe execution regions which overlap, provided that you give each region the OVERLAY attribute. If you do this, the linker generates support code and data to allow overlapping regions to be swapped dynamically at execution time and adds a reference from the support code to an overlay manager. You must have included ARM's standard overlay manager, or one compatible with it, in your list of object files.
. ARM-Thumb interworking veneers are built in an AREA called IWV$$Code. You can assign this AREA to an execution region just like any other area using the AREA selector:
*(IWV$$Code)
Although there is no associated module, * still matches. Because there is only one IWV$$Code AREA, this selection is unambiguous.
Example of the scatter loading file: