====== Reverse Engineering a NAND Flash Device Management Algorithm ======

{{ http:// }}

Around June of 2012, I had gotten myself into a very bad habit.  Instead of
carrying my SD card in my camera, I left it sticking out of the side of my
laptop, presumably intending to do something with the photos on it
eventually.  One day, as I got up out of my seat, the machine fell out of my
lap, and as the machine hit the ground, the SD card hit first, and was
destroyed.

I was otherwise ready to write off the data stored on that device, but
something inside me just wasn't happy with that outcome.  Before throwing
the SD card in the trash, I took a look at what remained -- as far as I
could tell, although the board was badly damaged, the storage IC itself was
fully intact (albeit with a few bent pins).

The following is a description of how I went about reverse-engineering the
on-flash format, and of the conclusions that I came to.  My efforts over the
course of about a month and a half of solid work -- and a "long tail" of
another five months or so -- resulted in a full recovery of all pictures and
videos that were stored on the SD card.

//If you're just looking for the resources that go with this project, see [[http:// ]].//

//You can [[https:// ]].//

===== Introduction =====

It is probably fitting to start with a motivation for why this problem is
complex; doing data recovery from a mass-production SD card seems like it
should be a trivial operation (especially given the interface that SD cards
present), but as will become clear, it is not.  From there, I will discuss
the different parts of the problem in detail, both in terms of how they
physically work, and in terms of what they mean from the standpoint of a
data recovery engineer.

{{http:// }}

I begin with a brief history of the field.  Over the years, solid-state
data storage has become increasingly complex.  Early on, COMPAQ (and later,
HP) began producing the [[http://|iPAQ]] series of handheld computers, which
had between 16 and 64MB of flash memory.  This was approximately a standard
capacity for the time period; the underlying technology of the flash device
was called "NOR flash", named for the way the memory array was structured.
NOR flash behaved much like the ROM or SRAM memories that preceded it: it
had a parallel bus of address pins, and it would return data on the bus I/O
pins in a fixed amount of time.  The only spanner in the works was that
writes could only change bits that were previously ones to zeroes; in order
to change a zero back to a one, a whole block of bits (generally, around 16
KBytes) had to be erased at once.

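To make the write/erase asymmetry concrete, here is a toy model of those semantics (a sketch of my own; the constants are illustrative, not those of any particular part):

<code python>
ERASE_BLOCK = 16 * 1024  # one erase block's worth of bytes

flash = bytearray(b'\xff' * (4 * ERASE_BLOCK))  # erased flash reads all-ones

def write(addr, data):
    # A write can only pull bits from one to zero, never the reverse;
    # AND-ing the new data into place models that physical constraint.
    for i, b in enumerate(data):
        flash[addr + i] &= b

def erase(block):
    # The only way to return a bit to one is to erase its entire block.
    start = block * ERASE_BLOCK
    flash[start:start + ERASE_BLOCK] = b'\xff' * ERASE_BLOCK
</code>
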
This was okay, though, and we learned to work around these issues.  NOR
flash had a limited erase life -- perhaps only some millions of erases per
block -- so filesystems on the device generally needed to be specially
designed to "wear-level" their writes across the array, in order to avoid
burning an unusable hole in the flash array.  Still, the device appeared a
lot like a block device that operating systems knew how to deal with;
indeed, since it looked so much like SRAM, it was possible to boot from it
on an embedded system, given appropriate connections of the bus pins.

Around 2005, however, NOR flash ran into something of a problem -- it
stopped scaling.  To keep growing the number of bits per flash IC, flash
manufacturers went to a technology called [[http://|NAND flash]].  Although
one might hope that the change between NOR flash and NAND flash would be
entirely internal to the array, it isn't: the external interface, and
characteristics of the device, changed radically.

As easy as NOR flash is to work with, NAND flash is a pain.  NOR flash has a
parallel bus in which reads can be executed on a word level; NAND flash is
operated on with a small 8-bit-wide bus on which commands are serialized,
and then responses are streamed out after some delay.  NOR flash has an
amazing cycle life of millions of erases per block; modern NAND flashes may
permit only tens of thousands, if that.  NOR flash has small erase blocks of
some kilobytes; NAND flash can have erase blocks of some megabytes.  NOR
flash allows arbitrary data to be written; NAND flash imposes constraints on
correlating data between adjacent pages.  And, finally, NOR flash guarantees
that the system can read back the data that was written; NAND flash permits
corruption of a percentage of bits.

In short, where NOR flash required simply a wear-leveling algorithm, modern
NAND flash requires a full device-management algorithm.  At the core of any
NAND device-management algorithm are three pieces: data decorrelation
(sometimes referred to as entropy distribution), error correction, and,
finally, wear-leveling.  Such algorithms are usually built into controllers
that emulate a simpler (block-like) interface -- which is exactly what the
controller on my SD card did.

===== Data extraction =====

{{ http:// }}

Of course, none of the device-management information is relevant if the data
can't be recovered from the flash IC itself.  The first task, then, was to
build hardware to extract the data.  I had a [[http:// ]] FPGA board on hand
that could run at around 20MHz, which meant that, with some care, I should
be able to interface it directly with the NAND flash.

A bigger problem that I had facing me was that the NAND flash was physically
damaged; several of the pins were also bent.  Additionally, the pins were
too small for me to solder to directly.  I first experimented with attaching
wires to the pins one at a time, but this proved ultimately too painful to
carry out for the whole IC.  Ultimately, I settled on using a [[http:// ]]
socket, which allowed each side to self-align.  This meant that I did not
have to worry about straightening both sides at the same time -- as long as
I got them each individually, the socket would do the rest.

The next question was the mechanics of using the FPGA to transfer data back
to the host.  I used Digilent's parallel-port interface to communicate with
the host, and created a mostly-dumb state machine interface on the FPGA,
which could be instructed to perform a small number of tasks: it knew how to
clock commands and addresses into the FPGA, it knew how to latch data in and
out of the FPGA, and it knew how to wait for the flash to become ready --
beyond that, everything was handled in a wrapper library that I wrote on the
host side.  ([[https:// ]])

To test that my interface was working correctly, and then to put it to use,
I wrote three small command-line tools on the host side.

One curious challenge was that no datasheet existed for the chip that I had.
I used the [[http://|closest available]] datasheet, but it did not discuss
some of the important characteristics of the chip -- for instance, it did
not discuss the fact that there is an "address hole" in the device's address
space, and that the rest of the data begins at 0x100000.  (My belief is that
this flash IC is a "TLC" device -- that each cell in the array stores three
bits, not just one.  The address hole exists, then, because three is not an
even power of two...).  The datasheet also does not mention that the chip
has multiple chip-selects.

The end result of this phase was a pair of 12GiB files -- one for each of
the two chip selects.

===== Unwhitening =====

{{: }}

Modern NAND flash devices have a bunch of difficult constraints to work with, but one that might not be obvious is that it matters what data you write to them!  It is possible to construct some "worst case" data patterns that can result in data becoming more easily corrupted.

To understand how this can be, I've included an image to the right that shows a schematic of how the building blocks of NAND flash work internally (adapted from [[http:// ]]).  The cells are floating-gate transistors, chained together in long series strings; to read one cell, all of the other transistors in its string are switched hard on, so that the charge stored on the selected cell determines whether the whole string conducts.

The problem with this is that the bypassed transistors (that is to say, the ones that we're //skipping over// -- the ones that we're not sensing) are not perfect: how well they conduct depends on the data that they themselves store, and so the values of a cell's neighbors can bias the reading of the cell that we actually care about.

This has only become a problem recently, as we've moved to more modern technologies: as the cells have shrunk, and as each cell has come to store more than one bit, the margins available to absorb this kind of interference have become ever smaller.

Even if you didn't understand the mechanics of it, what this means from the perspective of programming the device is that if many adjacent pages all have the same value, a page that has a differing value might be very hard to read, and its bits might often flip to the "popular" value of their neighbors.

The solution to this problem, then, is to add another step -- I'll take the obvious name, "data decorrelation" -- that makes the data on the flash array look random, whether or not the user's data actually is: each page is XORed with a "whitening" pattern before it is written, and XORed with the same pattern again when it is read back.

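In sketch form (and only as a sketch -- the key below is invented, and, as described under "Mechanics", the real pattern is far longer), the write path and the read path both look like this:

<code python>
# A minimal sketch of XOR-based data decorrelation.  KEY is an invented,
# stand-in whitening pattern; the real one on this device was far longer.
KEY = bytes([0x37, 0xc4, 0x8a, 0x19])

def whiten(page):
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(page))

# XOR is its own inverse, so unwhitening is the very same operation:
#   whiten(whiten(page)) == page
</code>
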
==== Mechanics ====

I started off without knowledge of the reason for scrambling the on-flash data, and began by looking for a one-byte XOR key that would make known filesystem strings appear in the image.

Since I wasn't recovering anything from a one-byte key, I wanted to know whether there was anything sensible to be recovered from an otherwise-short key.  I figured that if there were real data around, the distribution of bytes would not be uniform, so I wrote a program that computed the distribution of bytes at each offset within a candidate key period.

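The core of that statistical attack can be sketched as follows; it assumes, as I did, that the most common plaintext byte at any given offset is ''0x00'':

<code python>
from collections import Counter

# If the plaintext is dominated by some byte (say, 0x00), then for a
# repeating key of length `period`, the most common byte at each offset
# (mod period) of the whitened image *is* that offset's key byte.
def guess_key(image, period):
    key = bytearray(period)
    for off in range(period):
        hist = Counter(image[off::period])
        key[off] = hist.most_common(1)[0][0]  # XORed against 0x00 plaintext
    return bytes(key)
</code>
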
Around this time, I was talking with [[http:// ]], who suggested that, rather than relying on the statistics alone, I use a "needle" of data that was known to exist somewhere in each block to extract a candidate key directly wherever the needle matched.

It produced vaguely satisfactory output -- I started to see directory structures -- but some blocks didn't have the needle in them at all, and so I was still missing more than half of the medium.

Both of these tools produced a "key" file containing their best guess at the whitening pattern.

Once I had extracted the key, I ran the entire image through a new, revised version of the unwhitening tool.

==== Future work ====

The probabilistic method for extracting a key, of course, kind of sucks: it can only ever be as good as the data that happens to be lying around to vote with.

To understand why, it's important to look at the size of the key that I extracted -- 512 KB per chip select.  That is an implausibly large table for a small controller to store explicitly.

I suspect that the key pattern is generated, instead, by a pseudorandom number generator inside the controller; finding the generator and its seed would allow the key to be regenerated exactly, rather than estimated.

===== ECC recovery =====

{{: }}

Vendors manage to drive the cost of NAND flash lower by allowing the flash memory to be "imperfect" -- devices are allowed to ship with bad blocks, and bits may flip during programming, while being stored, or on read-out.

All of these add up to the expectation of NAND flash being unreliable storage; to build a reliable storage device from it, the controller must layer an error-correcting code (ECC) on top.

==== Basics of ECC ====

At first, the concept of improving a bit error rate without adding an extra low-error storage medium may seem somewhat far-fetched -- something for nothing, as it were.  The trick, as we will see, is redundancy.

The underlying premise of any ECC is the concept of a //codeword//: rather than storing the data bits directly, we store longer codewords, of which only a small fraction of all possible bit patterns are valid.

{{ : }}

These codes derive their functioning from the idea of a //Hamming distance// -- the number of bit positions in which two words differ.  If every pair of valid codewords is separated by a Hamming distance of at least //d//, then any //d-1// bit errors can be detected, and any //(d-1)/2// bit errors can be corrected by choosing the closest valid codeword.

Let's take some examples.

==== A simple ECC ====

{{ : }}

To make this somewhat more concrete, I'll provide two examples of codes that are simple enough to construct and verify by hand.  The first code is extremely simple, and has been used more or less since the dawn of time: for all values of //n// bits, we can produce a //(n+1,n)// code that has a minimum Hamming distance of 2 between valid codewords, by appending a single //parity bit// that makes the number of ones in every valid codeword even.  A single flipped bit makes the number of ones odd, and is therefore always detected (although it cannot be corrected).

You can see an example of this at right.

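As a sketch, the entire code fits in a few lines:

<code python>
# The (n+1,n) even-parity code: append one bit so that every valid codeword
# contains an even number of ones.
def encode(bits):
    return bits + [sum(bits) % 2]

def is_valid(codeword):
    # Any single bit flip makes the total number of ones odd.
    return sum(codeword) % 2 == 0
</code>
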
==== Another simple ECC: row-column parity ====

{{ : }}

The second code that I'll present is what I call a "row-column parity" code.
In this version, I will give a //(25,16)// code: codewords are 25 bits long,
and they decode to 16 bits of data.  The code has a minimum Hamming distance
of 4, which means that it can detect all two-bit errors, and correct all
one-bit errors.

As you might expect from the name, this code is constructed by setting up a
matrix, and operating on the rows and columns.  We take the 16-bit
data word that we wish to encode -- for this example, 1011 0100 0110 1111
-- in four groups of four, and arrange the groups in a matrix.  From that
matrix, we compute nine parity operations: four for the rows, four for the
columns, and one across all of the bits.  (This is shown in the diagram at
left.)

{{ : }}

Although not terribly efficient, this "row-column" code has an extremely
straightforward algorithm for decoding errors.  At right is the same coded
word, but with a single bit flipped; the bit that was flipped is highlighted
in red.  To decode, we recompute the parity bits on the received data
portion, and compare them to the received parity bits.  The bits that differ
are called the //error syndrome// -- to be specific, the syndrome refers to
the codeword that is produced by XORing the received parity bits with the
computed parity bits.  To decode, we can go case-by-case on the error
syndrome:

  * If the syndrome is all zeroes -- that is to say, the received parity bits perfectly match the computed parity bits -- then there is nothing to do; we have received a correct codeword (or something indistinguishable from one).
  * If the syndrome contains just one bit set -- that is to say, only one parity bit differs -- then there is a single bit error, but it is in the parity bits.  The coded data is intact, and needs no correction.
  * If the syndrome has exactly three bits set, and one is a row, one is a column, and one is the all-bits parity, then there is a single bit error in the coded word.  The row and column whose parity bits mismatch point to the data bit that is in error; flip that bit, and then the error syndrome should be zero once again.
  * If the syndrome is anything else, then there is an error of (at least) two bits, and we cannot correct it.

(It might not be immediately obvious why the all-bits parity bit is needed;
I urge you to consider it carefully to understand what situations would
cause this code to be unable to detect or correct an error without the
addition of that bit.)

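The whole decoder is small enough to sketch in full; the codeword layout here (data bits, then row parities, column parities, and the all-bits parity) is my own choice for illustration:

<code python>
# A sketch of the (25,16) row-column parity code.  Data is a list of 16 bits,
# read as a 4x4 matrix in row-major order.
def encode(d):
    rows = [sum(d[4*r:4*r+4]) % 2 for r in range(4)]   # one parity per row
    cols = [sum(d[c::4]) % 2 for c in range(4)]        # one parity per column
    return d + rows + cols + [sum(d) % 2]              # plus all-bits parity

def decode(cw):
    d, rx, cx, ox = cw[:16], cw[16:20], cw[20:24], cw[24]
    rs = [(sum(d[4*r:4*r+4]) % 2) ^ rx[r] for r in range(4)]  # row syndrome
    cs = [(sum(d[c::4]) % 2) ^ cx[c] for c in range(4)]       # column syndrome
    os = (sum(d) % 2) ^ ox                                    # all-bits syndrome
    if sum(rs) == 0 and sum(cs) == 0 and os == 0:
        return d                          # case 1: clean codeword
    if sum(rs) + sum(cs) + os == 1:
        return d                          # case 2: error is in a parity bit
    if sum(rs) == 1 and sum(cs) == 1 and os == 1:
        d = list(d)                       # case 3: flip the implicated bit
        d[4 * rs.index(1) + cs.index(1)] ^= 1
        return d
    raise ValueError("uncorrectable: at least two bits in error")  # case 4
</code>
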
==== Linear codes ====

Before we put these codes aside, I wish to make one more observation about
them.  The codes above both share an important property: they are
//linear codes//.  That is to say, any two
valid codewords can be added (XORed) together to produce another valid
codeword.  Phrased differently: if we define
the function //ECC(d)// to mean "compute the check bits for the
data //d//, and return only those", then //ECC(d1
XOR d2) = ECC(d1) XOR ECC(d2)//.

(This will become very important shortly, so make sure you understand why
this is the case.)

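Using the row-column ''encode()'' sketched above, the property is easy to check experimentally:

<code python>
import random

# Linearity check against the encode() sketch above: the check bits of the
# XOR of two data words equal the XOR of their check bits.
d1 = [random.randint(0, 1) for _ in range(16)]
d2 = [random.randint(0, 1) for _ in range(16)]
dx = [a ^ b for a, b in zip(d1, d2)]
assert encode(dx) == [a ^ b for a, b in zip(encode(d1), encode(d2))]
</code>
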
==== NAND flash implementation ====

Of course, neither of the codes above is really "strong enough": in
order to be able to recover data from high-bit-error-rate devices like
modern NAND flash, we need codes that are substantially more capable -- and
along with that capability comes a corresponding increase in encoding and
decoding complexity.  Most modern NAND flash systems use one
of a class of codes called [[http://|BCH codes]] (of which the Reed-Solomon
family is perhaps the most popular); these codes have sizes on the order of
//kilobytes//, and work on those amounts of //bytes// at a time, rather than
bits!  Such codes have the capability of correcting many, many bad elements
in a codeword, not just one or two; the downside, however, is that their
foundations are rooted deep in the theory of linear algebra.

One thing to note is that even a BCH code with a codeword size of 1094 bytes
is not capable of filling the quantum of storage in NAND flash (a "page"),
so several codewords must be packed into each page.  Staring at hex dumps, I
found that each page on this device had a repeating structure:

^ Start ^ End ^ Contents ^
| 0 | 1023 | data |
| 1024 | 1093 | noise |
| ... |||
| 6564 | 7587 | noise (zeroes) |
| 7588 | 7657 | noise (common) |
| 7658 | 8681 | noise (zeroes) |
| 8682 | 8751 | noise (common) |
| 8752 | 8777 | more different garbage |
| 8778 | 8832 | filled with ''0xFF'' |

It seemed pretty obvious: the code that was being used was a //(1094,1024)// code -- 1024 bytes of data, followed by 70 bytes of ECC, repeated eight times per page.

==== Decoding ====

Now that I knew where the codeword boundaries were, I was beginning to actually stand a chance of decoding the ECC!  I started off by making some assumptions -- I decided that it probably had to be a BCH code.  Since my grounding in finite field theory is not quite strong enough to write a BCH encoder and decoder of my own, I grabbed the implementation from the Linux kernel, removed some of the kernelisms, and dropped it into my tree.

The next problem that I was about to face was that I wasn't sure what the whitening pattern for the ECC regions was.  Although it was easy enough to guess by statistical analysis what the data regions should look like, I didn't even know what the ECC regions were "supposed" to contain, so the same tricks could not work there.

I discovered, in fact, that the ECC scheme didn't need any knowledge of whitening at all!  As long as most pages were intact -- which they were -- I could come up with a scheme for correcting ECC errors without even running unwhitening.  Consider the following derivation:

  - Start with the basic ECC identity -- \\ //stored_ecc = ECC(user_data)//
  - Next, unwrap into the raw (on-flash) data -- \\ //raw_ecc XOR white_ecc = ECC(raw_data XOR white_data)//
  - Next, hoist the whitener out of the ECC function, by the identities for linear codes given above -- \\ //raw_ecc XOR white_ecc = ECC(raw_data) XOR ECC(white_data)//
  - Merge XOR functions that don't depend on the data -- \\ //raw_ecc = ECC(raw_data) XOR (ECC(white_data) XOR white_ecc)//
  - And then finally make the XOR functions that don't depend on the data into a constant -- \\ //raw_ecc = ECC(raw_data) XOR K//

This is incredibly powerful!  It means that the raw, still-whitened data can be checked -- and corrected -- without knowing the whitening pattern at all: the constant //K// is all that is needed, and //K// can be read straight off of any known-good codeword, as //K = ECC(raw_data) XOR raw_ecc//.

A curious property emerged when I did something like this: the //K// that fell out was the same for every codeword on the device.

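Putting the derivation into practice looks something like the following sketch, where ''bch_encode'' and ''bch_correct'' stand in for wrappers around the kernel's BCH library (the names are mine, not the kernel's):

<code python>
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def learn_constant(good_raw_data, good_raw_ecc):
    # K merges every data-independent term: ECC(white_data) XOR white_ecc.
    # It can be read off of any codeword that is known to be error-free.
    return xor(bch_encode(good_raw_data), good_raw_ecc)

def correct_raw(raw_data, raw_ecc, K):
    # raw_ecc XOR K is exactly ECC(raw_data), so the still-whitened data can
    # be repaired with no knowledge of the whitening pattern at all.
    return bch_correct(raw_data, xor(raw_ecc, K))
</code>
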
==== Utilities ====

The tools for this phase were relatively straightforward, as these things go.

The largest complication was one that any expert in error correction codes would have foreseen by my explicit failure to mention it so far.  One of the parameters that defines a BCH code is one that I haven't yet discussed: the //primitive polynomial// that generates the finite field in which the code operates.  I had no way to know what polynomial this controller used, so the only option remaining was to search for it by brute force.

The most interesting guts of the tool to decode this live in the brute-force polynomial search.

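The search can be sketched as follows, assuming a hypothetical wrapper ''bch_try(poly, data, ecc)'' around the decoder that returns the number of bits it corrected (or ''None'' if decoding failed outright):

<code python>
# Brute-force search for the BCH primitive polynomial: the right polynomial
# decodes known-good codewords with zero corrections; wrong ones do not.
def find_poly(candidates, known_good):
    for poly in candidates:
        if all(bch_try(poly, data, ecc) == 0 for data, ecc in known_good):
            return poly
    return None
</code>
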
===== Wear leveling =====

Even once we remove data interdependencies from a NAND flash device by
whitening, and even after we make the device reliable to store data on by
adding error correction, we have a final problem to deal with before
we can treat it as a storage device similar to a hard drive.

{{ : }}

The storage abstraction that we most readily concern ourselves with is that
of a "block device": when we think about a
storage device that we would like to write to, we consider it as a
collection of small blocks (sometimes, "sectors") that can be individually
addressed, and read and written independently of each other.  Of
course, few devices actually behave this way: for instance, even hard drives
divide their sectors up among drive heads, cylinders, and then sectors along
the cylinders.  But rather than requiring operating systems to
know the internals of each hard drive they access, hard drives
[[https://|implemented a scheme called 'logical block addressing' (LBA)]], which
simply maps each sector on the disk to a single number identifying it.

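As a concrete illustration, the traditional cylinder/head/sector-to-LBA mapping is nothing but arithmetic on the advertised geometry:

<code python>
# The classic CHS -> LBA mapping: each (cylinder, head, sector) triple is
# flattened into one number.  Note that sector numbers are 1-based.
def chs_to_lba(c, h, s, heads_per_cyl, sectors_per_track):
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)
</code>
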
This scheme worked well for early flash drives, too.  The first generations
of CompactFlash drives presented an interface in which flash blocks were
mapped, more or less, one-to-one with the sectors that they presented to the
host interface; when the host needed to do a write to a sector that already
had data on it, the drive would erase the entire surrounding block, and then
rewrite it with the new data.  For small drives that saw infrequent (or
otherwise linear) write traffic, this was not a substantial issue.  But
as flash drives grew in popularity, demand for higher capacity grew,
and there came to be a new breadth of use cases.

One of the reasons why older CompactFlash drives could get away with such a
simple scheme was that they had relatively large write endurance (and were
relatively slow).  Every erase inflicts on a flash block a
small amount of damage; the process of pulling electrons off of the floating
gate is not perfect, and electrons can sometimes be trapped in the gate
oxide, resulting in a reduced voltage margin between bits that read as 0 and
bits that read as 1.

{{: }}

These early devices could endure as many as a million erase cycles, since
the cells were large enough that the available read margin was already quite
large.  Modern flash geometries, however, have been aggressively
shrinking; it is not uncommon to see quoted cycle lifetimes for NAND flash
on the order of five thousand cycles, and new NAND flash may be rated for
fewer than one thousand!  On parts like these, a naive scheme like the early
CompactFlash system would make for storage media with an unacceptably short
lifespan.

Additionally, the units that flash operates on -- write sizes, and, especially,
erase sizes -- have also grown.  NOR flash had erase
sizes of 32 Kbytes or smaller; on the NAND flash device that I used, the
minimum //write// size is 8 Kbytes, and the //erase// size is a megabyte!  On
early CompactFlash media, it was plausible to erase the region around a
sector that was being written, and rewrite it all, since the required buffer
was so small, and the time taken was so low; on modern flash devices, it is
increasingly implausible to erase an entire region when it is time to update
a small portion of it.  The need to erase before writing, then, is another
property that a host system must be isolated from.

To deal with these problems and provide a more usable interface for host
software, a collection of algorithms called "flash translation layers" have
been developed.  These are sometimes referred to as "wear-leveling"
algorithms, since one of their purposes is to make sure that the entire
surface of the flash device is written evenly.  To recover my data
from the error-corrected, unwhitened image, then, the last step was to reverse
engineer the flash translation layer.

==== Solution space ====
Just as deep as the history of flash translation layers is the
solution space: over the years, as the requirements for a FTL have changed,
the implementations have varied widely in how they work.  One of the problems
that a flash translation layer solves -- the lookup from a virtual address
to a physical location on disk -- closely parallels the problem that a
filesystem for a linear storage medium solves (that is to say, the lookup
from a //name// to a physical location); the reader who is well-versed in
filesystem technology, then, may find many similarities.  I divide the
classes of flash translation layers that I know of into two major groups:
linear forward mapping systems, and log-structured systems.

The premise behind a linear forward mapping is that there exists a
table somewhere that provides a mapping from every linear address to its
physical address on the medium.  The distinctions between different such
schemes are in how the table is stored: some, for instance, may store it in a
write-many non-volatile RAM, while others store it directly on the flash
itself.  (The degenerate case is the trivial instance
of a linear forward mapping, in which the physical address can always be directly
computed from the virtual address by a simple mathematical function.)

{{ : }}

One readily-apparent scheme for a linear forward mapping is to write a table
to the flash medium that contains the current location of every block on the
system.  In this algorithm, lookups are extremely performant: it takes only
one read from the table to determine where any given virtual address is
stored on the device, and oftentimes, the table is small enough to be cached
in a controller's RAM.  The downside, however, is obvious: although
hot-spots can be shifted away from the actual device data, the load is
shifted onto the storage used for the metadata!  To borrow a turn of phrase,
"quis wearleveliet ipsos wearleveles?"

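In sketch form, the read path of such a scheme is a single table indirection (the geometry and names below are invented for illustration):

<code python>
ERASE_BLOCK = 1024 * 1024   # 1MB erase blocks, as on the device at hand
SECTOR = 512

# forward_table[virtual_block] = physical byte offset of that block's home.
def read_sector(flash, forward_table, lba):
    block, within = divmod(lba * SECTOR, ERASE_BLOCK)
    home = forward_table[block]
    return flash[home + within : home + within + SECTOR]
</code>
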
There exist variations on the linear mapping scheme, but before entering
those, I would like to turn my attention briefly to its polar opposite: the
log-structured filesystem.  If the premise of a linear mapping is that every
datum is written to one location on the medium, the opposite must be to write it //all over//
the medium -- and that is precisely how log-structured filesystems work!
One of the classic examples that is still in use on some systems today is
the Journalling Flash File System, version 2, better known as JFFS2.

The general principle behind a pure log-structured filesystem is that data
can be written to any "free" location on the medium, and that when data
are written, they are written only once.  If a datum needs to be modified,
then a new version of the datum is written elsewhere on the medium, and
the old version remains in place until the storage is needed; the reclamation
process is called "garbage collection".  The downsides here are that it can
be quite expensive to read any given block, and that the mount procedure can
take a long time: in order to determine where any given datum is, it may be
necessary to scan the metadata of every block in the entire filesystem!  In
some regards, it can be said that a log-structured filesystem is
//reverse-mapped//: each physical block records, in its metadata, the place
in logical space to which it belongs.

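A sketch of the mount-time scan shows why it is so expensive: every block on the medium must be visited just to build the map.  (The metadata layout here is invented for illustration.)

<code python>
# Mount-time scan of a purely log-structured medium: keep, for each logical
# block, only the copy with the newest sequence number.
def build_map(blocks):
    latest = {}   # logical block -> (sequence number, physical block)
    for phys, (logical, seq) in enumerate(blocks):
        if logical not in latest or seq > latest[logical][0]:
            latest[logical] = (seq, phys)
    return {logical: phys for logical, (seq, phys) in latest.items()}
</code>
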
The concept of a purely log-structured filesystem fell into disregard for
some time, but is now making an interesting resurgence.  The ZFS filesystem
originally developed by Sun Microsystems uses a similar strategy, even
though it primarily operates on media that are rewrite-tolerant; instead of
aiming for wear-leveling, ZFS gains the property that the filesystem
becomes resilient to power transients during writes.  It avoids the
scan-everything problem by maintaining
a metadata tree, and updating all nodes above a block each time the location
of a block is updated; although this creates additional disk traffic, it
also ensures that no unnecessary blocks need to be scanned to find the
latest version of a datum.  (NetApp's Write Anywhere File
Layout, or WAFL, filesystem operates similarly, I believe, but I do not have direct
experience with it.)

In reality, few systems are purely forward-mapped or are purely
log-structured.  On flash, it is common for an
implementation to be a hybrid of the two architectures: for instance,
long-stable data may be forward-mapped, while data that are rapidly
changing may be log-structured.  (This is similar, in some ways, to filesystem
journaling, where most data are forward-mapped, but data that have yet
to find space allocated for them are written to a small "journal" region before being
committed to their final location.)

==== Initial experimentation ====
My goal, then, was to investigate the locations of known data on disk, and
begin piecing together a mapping.  The friend whom I mentioned earlier in
this article at one point noted that the out-of-band data in each block
contains a reverse-map entry, but that I could expect to see duplicate block
entries.

I used some known data to begin investigating what these sectors and blocks
could possibly map to.  I began searching through the image for volume
header data (looking for the name of the drive in order to find it), and
then as well for the FAT.  Once I had read the volume header, I understood
where logical sectors of the FAT needed to be; since most data on the
original volume were contiguous, it was generally trivial to tell where any
given FAT sector should live.

I found that the out-of-band data seemed to have two bytes in it that could
reasonably be a counter: bytes 2 and 3, which took small values early in the
filesystem, and grew larger later on.
This became important later; I began writing an initial tool to piece the
filesystem together based on this.

Given that the flash chip is split up into two virtual "chip selects", I
needed to know if blocks were interleaved between them.  Looking at the
first FAT block that I found, at block offset 0, and chip select offset 0, I
noticed that two adjacent pages did not seem to go in order: there was
a discontinuity of 3 pages' worth between each page.  Eventually, I
determined that this flash controller interleaved both by page within a chip
select, and by chip select: for any given block, pages would be written in
sequence to CS0+0x00, CS0+0x80, CS1+0x00, CS1+0x80, CS0+0x01, CS0+0x81, and
so on.

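That order reduces to a small function; assuming (as the offsets above suggest) that each chip select holds 0x80-page halves of a block, the //n//th page of a block lands at:

<code python>
# Page interleaving: CS0+0x00, CS0+0x80, CS1+0x00, CS1+0x80, CS0+0x01, ...
def interleave(n):
    cs = (n >> 1) & 1              # chip select flips every two pages
    half = n & 1                   # low half (+0x00) or high half (+0x80)
    return cs, (n >> 2) + 0x80 * half

assert [interleave(n) for n in range(6)] == \
    [(0, 0x00), (0, 0x80), (1, 0x00), (1, 0x80), (0, 0x01), (0, 0x81)]
</code>
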
Once I had a basic understanding of the mapping, I began to build tooling
around it.  My first tool scanned the whole of the chip, determining which
reverse mappings were the most likely for any block
(by averaging across all of the out-of-band data in the block), and it also
provided a confidence estimate for each block in the forward mapping that it
generated, as to how much data was found within it.

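The voting scheme itself is simple to sketch; ''oob_logical'' below is a hypothetical accessor for the reverse-map field in a page's out-of-band data:

<code python>
from collections import Counter

# Majority-vote the reverse mapping of one erase block: the logical address
# is the most common claim among its pages, and the fraction of agreeing
# pages serves as a confidence estimate.
def vote_block(pages):
    claims = Counter(oob_logical(p) for p in pages)
    logical, hits = claims.most_common(1)[0]
    return logical, hits / len(pages)
</code>
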
The output of that tool was a forward-mapping table; another tool consumed
the table and exported, through FUSE, a single large (16GB) file.  I then
used Linux's loop driver to present that file as a
loopback block device, so I could attempt to mount the filesystem or run
filesystem integrity checks.

The first attempts were inspiring, but not hugely promising.  For instance,
when initially invoked before I understood the interleaving, the tools produced
wide-scale garbage on the system: directories were full of noise, and the
FAT had lots and lots of garbage in it.  Still, the general theory was
beginning to work!  Blocks were falling into the right place, and //some// of
them were even correct.  Once I accounted for the interleaving, much more of
the filesystem became accessible.

Unfortunately for me, now that the easy problems were solved, the hard
problems remained.  The bulk of the filesystem was intact, and
probably 98% of the data was as well, but what remained had
somewhat large regions of empty FAT entries.  Files that I
expected to be of a certain size would simply have zeroes in the middle of
their FAT.

==== Sector updates ====
I continued down this path, and moved into an exploratory phase that my
notes refer to as the "sector update" phase.  I took known
data, and wrote some tools to study it: one did a statistical analysis of the data
stored in the OOB area, looking for patterns in what I suspected to be a
"sector update log"; another cross-referenced
the computed forward-table, in the hope that I would find some kind of
on-disk forward lookup table.  Neither of these panned
out.

Failing these, I fell back on a more brutish approach.  I knew that these
sectors //had// to be on the medium somewhere, but I didn't know where.
Having no luck finding a forward mapping, I began searching for reverse
mappings for bad FAT entries by hand, and began developing a tool to
automate the search.  It took a FAT sector with known-missing data
in it (that I had found by hand), and took a "vote" as
to which logical page was the most likely for it to correspond to.  (Since
the FAT is mostly linear on a photo and video storage device, this sort of
statistical analysis became trivial.)  Once it had high enough confidence for
any given page, it emitted a "patch list" of where reads should be
redirected to.  (At some point, I referred to this small update mode as
"sector updates"; the name stuck, even though the
updates really work on a page granularity, rather than on individual
sectors.)

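The voting exploits the near-linearity of the FAT on this kind of device.  A sketch of the idea (the names and layout are mine, not the original tool's):

<code python>
import struct

# Score a candidate page against where a FAT32 sector "should" be: on a
# mostly-contiguous volume, the entry for cluster c holds c + 1, so a page
# covering clusters [first, first + n) is mostly a run of successors.
def score_fat_page(page, first_cluster):
    entries = struct.unpack("<%dI" % (len(page) // 4), page)
    hits = sum(1 for i, e in enumerate(entries) if e == first_cluster + i + 1)
    return hits / len(entries)   # high score -> likely the right logical page
</code>
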
I began updating the image-assembly tool to take these patch lists, and
set it up to take a configuration file of its own.  As I continued to work,
though, the workflow began to chafe: invoking the assembler to regenerate a
full image, and mounting it through the loop
device, every time I needed to make a small change to the source distinctly
tempered progress (and, arguably, gave me a bad temper, too!).
Additionally, I was starting to travel more, which meant that I was
spending more time doing development on my laptop, or otherwise somewhere I
couldn't easily store a 16GB scratch image.  So, I merged
the FUSE front-end and the assembler into a single new
tool.

This tool had a handful of niceties:
it reused a FAT filesystem driver that I had written for [[https:// ]], which I
extended with FAT32 support and with long file-name support.
Additionally, I taught the FAT driver to be resilient to filesystem damage:
if it found implausible behavior in the primary FAT, instead of returning a
damaged file, it would attempt to read from the secondary FAT.  This saved
me a fair bit of work.

==== Results ====
The final iteration of these tools recovered, in the end, essentially all
of the data on my storage device!  I was somewhat disappointed
by the need to use what felt like an excessively brute-force approach to
find data on the medium, but the triumph of having recovered my data seemed
to outweigh the questionable means by which I achieved it.

This leaves plenty of opportunity available for future work.  For one, a
blind reverse engineering approach may be valuable: there is still plenty of
data on the volume, and it would be interesting to continue to undertake
statistical analysis to determine what controls where it belongs.  For
another, it may be possible to find a controller firmware image on the
device somewhere; this would be the "holy grail", in that it would offer a
complete understanding of how the device operates.  A cursory investigation
indicates that the controller may contain an 8051 microcontroller.

===== Conclusions =====

{{ : }}

In this article, I have described the full recovery of data from an SD card
that sustained extensive physical damage.  I discussed the process of
physically connecting to the flash medium; of electrically interfacing with
it; of removing the "whitening" from the data; of repairing bit errors by
using the ECC information on disk; and of translating linear addresses into
physical addresses as controlled by a flash translation layer.

I hope that you have found this work inspiring in some way.  I wrote this
article because I was dismayed to find that much of the information that
exists about recovering data from solid-state media is "trapped" in the
depths of some closed-source extraction tools; I hope that, in publishing
this, I have helped to democratize this understanding in some way.

In this article, I hoped also to convey one of the lessons that I learned
from this effort: in short, "never give up".  Even when conventional wisdom
indicates that a device is unrecoverable, it may turn out that, with
sufficient effort, it is not just possible but reasonable to effect a full
recovery of all of one's data.

This article is the culmination of six months of software programming, and
nearly two years of on-and-off writing.  If you have any questions, have
found this useful, or have information to contribute, please do not hesitate
to [[: |contact me]].

//You can [[https:// ]].//