Latest News

Tuesday, September 22, 2020

Data deduplication in Big Data by compression

Compression reduces the size of a file by eliminating redundant data. Smaller files consume less storage space, so more files fit on the same storage. For example, a 100 KB text file might be compressed to 52 KB by removing extra spaces or replacing long character strings with short representations.

When the file is read back, an algorithm reconstructs the original data. Image files are typically compressed as well; for example, the JPEG image format uses compression to remove redundant pixel data.
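As a rough illustration of the idea above (using Python's standard zlib module, not any particular Big Data tool), lossless compression shrinks redundant text and reconstructs it exactly when read back:

```python
import zlib

# Text with many repeated phrases compresses well.
text = ("Big Data systems store many copies of similar records. " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)   # shrink the redundant data
restored = zlib.decompress(compressed)      # reconstruct the original on read

print(f"original:   {len(text)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"lossless:   {restored == text}")
```

The decompressed output is byte-for-byte identical to the input, which is what distinguishes lossless compression (text, archives) from lossy compression (JPEG images).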





Advantages:

Almost any file can be compressed, although files with little redundant data may compress hardly at all, so compression ratios are a guideline rather than a guarantee. For example, a 2-to-1 compression ratio would ideally allow 400 GB worth of files to fit on a 200 GB disk.
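A small sketch (again using Python's zlib, purely for illustration) shows why the ratio is only a guideline: repetitive data compresses dramatically, while random data barely compresses at all:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Ratio of original size to compressed size (e.g. 2.0 means 2-to-1)."""
    return len(data) / len(zlib.compress(data))

redundant = b"AB" * 100_000        # highly repetitive, compresses very well
random_ish = os.urandom(200_000)   # random noise, essentially incompressible

print(f"repetitive data ratio: {compression_ratio(redundant):.1f} to 1")
print(f"random data ratio:     {compression_ratio(random_ish):.2f} to 1")
```

The random input can even come out slightly larger than the original once the compression format's overhead is added, which is why the size of the compressed result cannot be known in advance.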

Drawbacks:

It is difficult to know exactly how much a file will compress until a compression algorithm is actually applied to it.


1 comment

  1. Data deduplication in Big Data involves identifying and eliminating redundant data to optimize storage and improve efficiency. When combined with compression techniques, the benefits can be enhanced further. Here’s how data deduplication by compression works in the context of Big Data:

    Data Deduplication
    Definition: Data deduplication is the process of identifying and eliminating duplicate or redundant data segments within a dataset.
    Benefits:
      • Storage Optimization: Reduces storage requirements by storing unique data once and referencing it across multiple instances.
      • Bandwidth Savings: Minimizes data transfer and replication across networks.
      • Improved Performance: Enhances data retrieval speeds and reduces latency.
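    The "store unique data once and reference it" idea above can be sketched with content-addressable hashing (a minimal illustration in Python's standard library; real deduplication systems chunk data and manage hash indexes far more carefully):

```python
import hashlib

def deduplicate(chunks):
    """Store each unique chunk once, keyed by its SHA-256 digest;
    return the store plus an ordered list of digest references."""
    store, refs = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        refs.append(digest)               # later copies become references
    return store, refs

chunks = [b"header", b"payload", b"header", b"payload", b"footer"]
store, refs = deduplicate(chunks)
print(f"{len(chunks)} chunks stored as {len(store)} unique blocks")

# The original sequence is reconstructed from the references alone:
assert [store[d] for d in refs] == chunks
```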


    Compression Techniques
    Definition: Compression reduces the size of data through algorithms that encode information more efficiently than its original form.
    Benefits:
      • Storage Efficiency: Shrinks data size, reducing storage costs and optimizing disk space.
      • Faster Data Transfer: Reduces bandwidth usage and speeds up data transmission.
      • Performance Optimization: Decreases I/O operations and enhances overall system performance.
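    Combining the two techniques, as described at the start of this comment, can be sketched as deduplicating first and then compressing only the unique chunks (a hypothetical pipeline for illustration, using Python's hashlib and zlib):

```python
import hashlib
import zlib

def dedupe_and_compress(chunks):
    """Deduplicate chunks by content hash, then compress each unique chunk."""
    store, refs = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)  # compress unique data only
        refs.append(digest)
    return store, refs

def restore(store, refs):
    """Rebuild the original chunk sequence from the compressed store."""
    return [zlib.decompress(store[d]) for d in refs]

chunks = [b"log line A" * 50, b"log line B" * 50] * 10  # 20 chunks, 2 unique
store, refs = dedupe_and_compress(chunks)
assert restore(store, refs) == chunks

stored_bytes = sum(len(c) for c in store.values())
print(f"{sum(len(c) for c in chunks)} bytes reduced to {stored_bytes} bytes")
```

    Deduplicating before compressing avoids compressing the same content repeatedly, which is why the two techniques reinforce each other.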

