How CrashPlan backup works
Have you ever wondered what goes on with CrashPlan behind the scenes? In this article, we’ll take a closer look at exactly what happens when CrashPlan is backing up.
Let’s start with an example scenario where CrashPlan is set to back up the boot drive and an external disk drive.
CrashPlan constantly watches for new and changed files within the selected locations with what is called the real-time file watcher. It adds new, changed and deleted files to a “To do” list.
Let’s say that you created a document called “Offer letter Acme Corp 2015 re-engineering.doc”. The real-time file watcher sees that you’ve created this document and adds it to the “To do” list for backup.
This is what happens when CrashPlan starts backing up “Offer letter Acme Corp 2015 re-engineering.doc”:
- Backup begins with a process called data de-duplication. CrashPlan analyzes a small piece of the file (a block), and checks to see if that block was backed up previously.
- If CrashPlan determines that it has already backed up this block, CrashPlan moves on and analyzes the next block.
- If the block has not yet been backed up, CrashPlan:
  - Compresses the block to save storage space
  - Encrypts the block to secure the data
  - Sends the block to the backup destination
Data is securely encrypted throughout this process.
The process repeats for the next block within the file until CrashPlan has analyzed and backed up the entire file. In this way, only unique information is backed up, which saves bandwidth and storage, and makes restoring faster.
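The block-by-block loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CrashPlan's actual implementation: the fixed block size, the SHA-256 fingerprint, and the names `backup_file`, `known_hashes`, and `toy_encrypt` are all hypothetical, and the XOR step is a stand-in that provides no real security.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # hypothetical and tiny, chosen only to keep the example readable


def toy_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    """Placeholder for a real cipher. XOR is NOT secure encryption."""
    return bytes(b ^ key for b in data)


def backup_file(data: bytes, known_hashes: set, destination: list) -> int:
    """Back up `data` block by block and return how many blocks were sent."""
    sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest in known_hashes:
            continue  # block was backed up previously: move on to the next one
        payload = toy_encrypt(zlib.compress(block))  # compress, then "encrypt"
        destination.append((digest, payload))        # stand-in for sending
        known_hashes.add(digest)
        sent += 1
    return sent


known, dest = set(), []
first = backup_file(b"AAAABBBBAAAA", known, dest)   # third block repeats the first
second = backup_file(b"AAAABBBBCCCC", known, dest)  # only the last block is new
```

In this sketch, `known` plays the role of one computer's record of what it has already backed up. A second computer would start with its own empty set, which is why the same file on two computers is backed up once per computer.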
Data de-duplication occurs on each computer. If you have the same file on two different computers, the file will be backed up twice — once for each computer.
New files and file changes
As you’re working on your letter and making changes, CrashPlan’s real-time file watcher sees that the file has changed and CrashPlan puts the file back into the “To do” list. If your letter is 4 MB, 4 MB is added to the “To do” list. Only the changes are actually sent to the destination, however, not the entire file. The changes are backed up while you work, creating a new version of “Offer letter Acme Corp 2015 re-engineering.doc”.
In this example, you’ve added a paragraph to the letter:
- CrashPlan’s data de-duplication scans the file looking for new blocks of data.
- The new data blocks are:
  - Compressed to save space
  - Encrypted for security
  - Transmitted to the backup destination for storage
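To illustrate why only the changes are sent, the sketch below compares the blocks of two versions of a document and reports which blocks of the new version were not present in the old one. It assumes fixed-size 8-byte blocks and SHA-256 fingerprints, and the helper names `block_hashes` and `new_blocks` are hypothetical. The article does not say how CrashPlan divides a file into blocks.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 8) -> list:
    """Return one SHA-256 fingerprint per fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def new_blocks(old: bytes, new: bytes, block_size: int = 8) -> list:
    """Return the indexes of blocks in `new` whose content was not in `old`."""
    seen = set(block_hashes(old, block_size))
    return [i for i, h in enumerate(block_hashes(new, block_size))
            if h not in seen]


old_version = b"Dear Acme, we are pleased to offer you the role."
new_version = old_version + b" Your start date is in March."

# Only the blocks that cover the appended paragraph are reported.
changed = new_blocks(old_version, new_version)
```

With fixed-size blocks, text inserted in the middle of a file would shift every later block, so this simple sketch works best for appended data.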
Backing up new file versions
By default, a new version of the file is backed up every 15 minutes. This interval is controlled by the “Backup frequency: New version” setting. We recommend keeping the 15-minute default in most cases. If the “History” tab displays the message “- Reason for stopping backup: A different backup set and/or destination was selected”, you might need to increase the value.
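The effect of the “New version” interval can be sketched as a simple time check. The function name `due_for_new_version` is hypothetical, and the sketch assumes the interval is measured from the time the previous version was backed up.

```python
from datetime import datetime, timedelta

NEW_VERSION_INTERVAL = timedelta(minutes=15)  # default stated in the article


def due_for_new_version(last_version_at, now, interval=NEW_VERSION_INTERVAL):
    """True if there is no version yet or `interval` has passed since the last."""
    return last_version_at is None or now - last_version_at >= interval


last = datetime(2015, 1, 5, 9, 0)
early = due_for_new_version(last, datetime(2015, 1, 5, 9, 10))    # 10 minutes later
on_time = due_for_new_version(last, datetime(2015, 1, 5, 9, 15))  # 15 minutes later
```

Raising the interval means a frequently changing file produces fewer versions, which leaves more time for other files.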
Detecting changes - the details
There are actually two ways that CrashPlan learns about new files or changes to your existing files:
- Real-time file watcher
- File system scan
With two methods of identifying file changes, your files are doubly protected — CrashPlan checks for changes twice to make sure your files are backed up.
The real-time file watcher works directly with the tools built into your computer’s operating system, which means it is fairly lightweight and can easily work in the background without you noticing.
The real-time OS tools
The real-time file watcher relies on NTFS on Windows, Spotlight on Mac OS X, and inotify on Linux.
The file system scan requires more resources than the real-time file watcher. To minimize the impact on your computer, the scan runs at 1 am every day by default, when most people are not working at the computer.
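A file system scan can be pictured as taking a snapshot of the selected folders and comparing it with the previous one. The sketch below is a rough illustration, assuming that a change shows up as a different modification time or size. The names `snapshot` and `diff_snapshots` are hypothetical and are not part of CrashPlan.

```python
import os


def snapshot(root: str) -> dict:
    """Map each file path under `root` to (modification time in ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as new, deleted, or changed between two snapshots."""
    return {
        "new": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "changed": sorted(p for p in before.keys() & after.keys()
                          if before[p] != after[p]),
    }
```

Walking every folder is what makes a scan heavier than reacting to operating system notifications, which fits with running it once a day at a quiet time.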
Detecting changes on OS X
On OS X, CrashPlan detects deleted files during the scheduled scan rather than in real time. On other operating systems, deleted files are detected in real time.
Prioritizing files for backup
Of course, you probably have more than one file on your computer that you’d like backed up. CrashPlan can’t back up all files simultaneously, so how does it decide what to work on first? CrashPlan is designed to back up the newest and most recently changed files first. This ensures that the most recent versions of your files — what you’re working on right now — are backed up as soon as possible. The priority order looks like this:
- Newer, smaller files
- Newer, larger files
- Older, smaller files
- Older, larger files
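The priority order above can be expressed as a two-part sort key. This is a sketch only: the article does not say where CrashPlan draws the line between “newer” and “older” or between “smaller” and “larger”, so the `newer_than` and `small_below` cutoffs, the `PendingFile` class, and the `prioritize` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingFile:
    name: str
    modified: float  # modification time, seconds since the epoch
    size: int        # size in bytes


def prioritize(files, newer_than, small_below):
    """Order files: newer/smaller, newer/larger, older/smaller, older/larger."""
    def bucket(f):
        older = f.modified < newer_than
        larger = f.size >= small_below
        return (older, larger)  # False sorts before True
    return sorted(files, key=bucket)


todo = [
    PendingFile("old-vm-disk.vmdk", modified=100.0, size=5_000_000_000),
    PendingFile("new-video.mov", modified=900.0, size=2_000_000_000),
    PendingFile("old-notes.txt", modified=150.0, size=2_000),
    PendingFile("offer-letter.doc", modified=950.0, size=4_000_000),
]
ordered = [f.name for f in prioritize(todo, newer_than=500.0, small_below=100_000_000)]
```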
Whenever a file is added or modified, CrashPlan adds it to its backup “To do” list. Changes are added to the backup based on the frequency at which CrashPlan backs up new versions, which by default is every 15 minutes. You can change this default setting under Settings > Backup > Frequency and versions: > [ Configure ... ] > Backup frequency > New version.
Backing up very large files
If you have very large files that change frequently (such as multiple-GB virtual machine disks) and it seems like backup never completes, try creating a “backup set” for those large files with a longer “New version” interval. This gives CrashPlan more time to back up other files before it needs to back up the changes within the very large file or files.
Prioritizing backup to multiple destinations
We always recommend that you back up to multiple backup destinations for fastest backup and restore and for better protection, but exactly how does multi-destination backup work?
CrashPlan prioritizes backup activity to ensure all your selected files are completely backed up at one destination before starting backup to another, redundant destination. To accomplish this, CrashPlan backs up first to the destinations that it determines should complete fastest, in this order:
- Local folders
- Computers on the same network (LAN)
- Online or to computers across the internet (WAN)
For example, if you are backing up to a local folder and to a cloud destination, CrashPlan completes backup to the local folder before backing up over the internet to the cloud destination. CrashPlan moves on to the next destination once backup to a destination completes, if that destination becomes unavailable for any reason, or if there is another backup set that backs up to a single destination.
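As a rough sketch of this ordering, the function below picks the next destination from a list, preferring local folders, then LAN, then WAN, and skipping destinations that are complete or unavailable. The dictionary layout and the name `next_destination` are hypothetical, and the backup set case mentioned above is not modeled.

```python
CONNECTION_ORDER = {"local": 0, "lan": 1, "wan": 2}  # fastest expected first


def next_destination(destinations):
    """Return the destination to back up to next, or None if nothing is left."""
    candidates = [d for d in destinations
                  if d["available"] and not d["complete"]]
    if not candidates:
        return None
    return min(candidates, key=lambda d: CONNECTION_ORDER[d["type"]])


folder = {"name": "Local folder", "type": "local", "available": True, "complete": False}
cloud = {"name": "Cloud", "type": "wan", "available": True, "complete": False}

first_pick = next_destination([cloud, folder])["name"]   # local folder goes first
folder["complete"] = True
second_pick = next_destination([cloud, folder])["name"]  # then the cloud destination
```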
Restoring a file
You don’t have to wait for the entire backup to complete to restore a file. As soon as a file is backed up and appears on the “Restore” tab, it is available for you to restore.
Backup sets (only with software version 3+)
If you choose to enable “Backup sets”, you can specify backup priority. The goal is still to back up all your files to at least one destination first. Then, CrashPlan works on redundancy to back up your files to additional destinations. As long as one destination in the set is complete, CrashPlan moves on to back up other, less complete sets. When you have “Backup sets” enabled, CrashPlan follows these rules:
- Backup set priority
- Destination’s connection type (local folder, then LAN/local network, then WAN/across the internet)
- Percent complete to that destination
Multiple backup sets with a single destination
When multiple backup sets back up to a single destination, there are special considerations for file exclusions and frequency and version settings.
Multiple computers and backup data storage
The backup process always works the same, whether you have one computer or ten computers in your account.
CrashPlan treats each computer within your account separately and each computer’s backup is stored separately at each destination. Each source computer stores its backup in its own folder. The folder name is the source computer’s CrashPlan identifier (ID).
Your CrashPlan ID default file location
win: c:\Users\All Users\CrashPlan\.identity
win: C:\Documents and Settings\All Users\Application Data\CrashPlan\.identity
mac: /Library/Application Support/CrashPlan/.identity
Is my backup starting over?
Occasionally, CrashPlan’s data de-duplication needs to re-scan your files to see what’s already been backed up. When this happens, it may look like CrashPlan is backing up all your files from the beginning, but it is actually reviewing each block to see what’s been backed up already. If CrashPlan is re-scanning your files, you may see one or more of the following:
- Progress is much, much faster than a full initial backup because information that has already been backed up is not re-sent.
- All your files are available for restore during this process.
- The amount of space used by your backed-up files at the destination is consistent with the size of your file selection and backup completion percentage. To verify the amount of space used:
  - Select Destinations and choose a destination type (for example, Cloud)
  - Select a destination and note the Space used.
CrashPlan’s cache includes information on de-duplicated data. You’ll experience the above behavior if CrashPlan needs to rebuild its cache for any reason. This is something that happens on occasion under normal use.