About

The duplicate finder is a tool that helps you identify likely duplicate files. Duplicates are determined by file name, file type, and two small checksums: one on the header of the file and one on the body.
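The matching rule above can be sketched roughly as follows. This is only an illustration, not the application's actual code: the sample sizes, the use of CRC32, sampling the body from the middle of the file, and treating the file extension as the "file type" are all assumptions, and the function names are hypothetical.

```python
import os
import zlib
from collections import defaultdict

HEADER_BYTES = 4096  # assumed sample size, not the tool's real value
BODY_BYTES = 4096    # assumed sample size, not the tool's real value


def sample_checksums(path):
    """Return (header_checksum, body_checksum) computed from two small reads."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(HEADER_BYTES)
        # Assumption: the body sample is taken from the middle of the file.
        f.seek(size // 2)
        body = f.read(BODY_BYTES)
    return zlib.crc32(header), zlib.crc32(body)


def duplicate_key(path):
    """Build the comparison key: name, type (extension here), two checksums."""
    name = os.path.basename(path)
    ext = os.path.splitext(name)[1].lower()
    return (name, ext) + sample_checksums(path)


def group_likely_duplicates(paths):
    """Group paths that share the same key; keep groups with 2+ files."""
    groups = defaultdict(list)
    for path in paths:
        groups[duplicate_key(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Because only small samples are read, two files in the same group are likely duplicates rather than guaranteed ones, which matches the wording used above.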

Limitations and considerations

Checksums

The application needs to check every potential duplicate, and this list can be large. Checksums must be computed to verify that each candidate is a likely duplicate. Since checksums can be very CPU intensive, this feature is limited to one request at a time. Results are cached, so the same file won’t be re-processed if you search again within a short time.
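As a rough sketch of how a one-request-at-a-time limit and a short-lived checksum cache might fit together, consider the following. All names, the cache lifetime, and the choice to reject (rather than queue) a second concurrent search are assumptions made for illustration; the document does not specify how the application implements either behavior.

```python
import threading
import time

CACHE_TTL_SECONDS = 300  # assumed; the text only says "a short time"

_search_lock = threading.Lock()
_cache = {}  # key -> (stored_at, checksum)


def cached_checksum(key, compute, now=time.monotonic):
    """Return the cached value for key, recomputing once the entry expires."""
    entry = _cache.get(key)
    if entry is not None and now() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    value = compute()
    _cache[key] = (now(), value)
    return value


def run_search(search):
    """Run search() only if no other search is currently in progress."""
    # Assumption: a second concurrent request is rejected, not queued.
    if not _search_lock.acquire(blocking=False):
        raise RuntimeError("a duplicate search is already running")
    try:
        return search()
    finally:
        _search_lock.release()
```

With this shape, a repeated search soon after the first one mostly hits the cache, so it finishes much faster even though it goes through the same single-request gate.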

Minimum Size

If there are many files to check, the process can take quite some time. Try starting a search with a larger minimum size value to find the biggest files first.
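To illustrate why a larger minimum size shortens the search, here is a minimal sketch of filtering candidates by size before any checksum work is done. The function name and the directory walk are hypothetical and are not taken from the application.

```python
import os


def candidate_files(root, min_size_bytes):
    """Return (size, path) pairs for files of at least min_size_bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_size_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

Raising `min_size_bytes` shrinks the candidate list, so fewer files need checksums and the largest potential savings show up first.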

Timeouts

If you are running this behind a proxy or another system that imposes a time limit on API operations, you may see a timeout. Since results are cached, you should be able to search again shortly afterward and get results back more quickly.
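On the client side, retrying after a timeout could look something like the sketch below. The `search` argument is a hypothetical callable that performs the request and raises `TimeoutError` when the proxy cuts it off; the attempt count and wait time are arbitrary assumptions, and no real API endpoint is implied.

```python
import time


def search_with_retry(search, attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call search(), retrying after a timeout so later tries can use cached results."""
    for attempt in range(1, attempts + 1):
        try:
            return search()
        except TimeoutError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(wait_seconds)
```

Each retry benefits from the checksums cached by the previous attempt, so a search that timed out once may complete within the time limit on a later try.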

Disabling the feature

Even with all of the above considerations, the duplicate finder works very well. In future versions, you will be able to disable this feature at the server level if you do not want users to use it. Currently, there is no way to block its usage.

Example Usage

Here is an example duplicate file search: