ECC (error-correcting code) is a method used to detect and correct errors introduced during data storage or transmission. Computer memory that applies this technique is known as ECC memory. It stores extra check bits alongside each word of data so that corrupted bits can be found and fixed.
ECC memory is used predominantly in servers rather than in client computers. The number of memory errors grows with both the amount of RAM in a computer and how long it stays in operation. Servers typically contain many gigabytes of RAM and run 24 hours a day, so they are comparatively likely to experience memory errors. This is why they require ECC memory.
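If the proportionality above is taken at face value, the expected number of memory errors can be modeled as the product of an error rate, the memory size and the running time. In this sketch, λ is a hypothetical per-gigabyte, per-hour error rate, assumed to be the same for both machines. Real rates vary widely between systems. C is the installed RAM in gigabytes and T is the number of hours of operation. The machine sizes are also illustrative assumptions, not measured data.

$$
E[N] = \lambda \, C \, T, \qquad
\frac{E[N_{\text{server}}]}{E[N_{\text{client}}]}
= \frac{64 \times 8760}{8 \times 2000}
= \frac{560\,640}{16\,000} \approx 35
$$

Here a hypothetical 64 GB server running all year (8,760 hours) is compared with an 8 GB desktop used for 2,000 hours a year. The unknown rate λ cancels out in the ratio. Under this simple model, the server would therefore see roughly 35 times as many errors, whatever the true rate is.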
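Memory errors are of two types: hard and soft. Hard errors are caused by physical defects in the memory chip, such as fabrication flaws. Because the defect is permanent and cannot be repaired, the error keeps recurring at the same location once it starts appearing. Soft errors, on the other hand, are transient bit flips caused by disturbances such as electrical noise or radiation striking the chip. They do not damage the chip, and rewriting the affected data clears them.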
Memory errors that are not corrected immediately can silently corrupt data or eventually crash a computer. This matters far more for a server than for a client computer in an office or home. When a client crashes, it normally does not affect other computers, even when it is connected to a network. When a server crashes, however, every client that depends on its services can be disrupted. Hence ECC memory is considered mandatory for servers but optional for clients, unless they run mission-critical applications.
ECC memory most commonly uses a Hamming code, usually extended with one extra parity bit into a SECDED (single-error correction, double-error detection) code. Triple modular redundancy (TMR) is another option: it stores three copies of the data and takes a majority vote. Both are FEC (forward error correction) methods. They correct errors using redundancy stored with the data, instead of going back and asking the data source to resend the original data. These codes can correct single-bit errors in a word of data. Multi-bit errors are comparatively rare, so single-bit correction handles most of the errors that occur in practice.
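To make single-bit correction concrete, here is a minimal Python sketch of an extended Hamming(8,4) code. It is a Hamming(7,4) codeword plus one overall parity bit, which corrects any single flipped bit and detects (but cannot fix) any two flipped bits. It also includes a bitwise majority vote, the read step of triple modular redundancy. This is an illustration under simplifying assumptions, not how any particular memory controller is built. Real ECC memory does this in hardware and commonly protects 64 data bits with 8 check bits, whereas the sketch uses 4 data bits for readability. The names `encode`, `decode`, `syndrome` and `tmr_read` are hypothetical.

```python
from functools import reduce
from operator import xor

# Hamming(7,4) layout: parity bits at positions 1, 2, 4; data bits at the rest.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4) codeword.
    """
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected a list of four bits")
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    # The parity bit at position p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        word[p] = reduce(xor, (word[i] for i in range(1, 8) if i & p and i != p), 0)
    # The overall parity bit gives the whole 8-bit word even parity.
    word[0] = reduce(xor, word[1:], 0)
    return word


def syndrome(word):
    """XOR of the positions (1..7) of all set bits; 0 for a valid Hamming(7,4) codeword."""
    return reduce(xor, (i for i in range(1, 8) if word[i]), 0)


def decode(word):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    if len(word) != 8:
        raise ValueError("expected an 8-bit codeword")
    word = list(word)  # work on a copy so the caller's list is untouched
    s = syndrome(word)
    overall = reduce(xor, word, 0)
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: assume exactly one bit flipped, at position s.
        # s == 0 means the overall parity bit itself was the one that flipped.
        word[s] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    return [word[i] for i in DATA_POSITIONS], status


def tmr_read(a, b, c):
    """Triple modular redundancy: bitwise majority vote over three stored copies."""
    return (a & b) | (a & c) | (b & c)


codeword = encode([1, 0, 1, 1])

one_flip = codeword.copy()
one_flip[5] ^= 1  # simulate a soft error in one bit
print(decode(one_flip))  # ([1, 0, 1, 1], 'corrected')

two_flips = codeword.copy()
two_flips[2] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))  # (None, 'uncorrectable')

print(bin(tmr_read(0b1011, 0b1011, 0b0011)))  # 0b1011
```

The parity bits sit at power-of-two positions so that the syndrome, the XOR of the positions of all set bits, directly spells out the position of a single flipped bit. The extra overall parity bit lets the decoder tell one flip from two.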