File Compression System


  • Algorithms


Mentors :

  • Amritaansh Narain

Mentees :

  • 4


Ever thought about how does gZip and all these online file compression services reduce your file size from 50 MB to 10 MB or even lower while keeping everything intact. Intuitively you might ask that how even is this possible, wouldn't this lead to data loss. Well, what these softwares use are known as lossless compression algorithms i.e. they compress data in a manner such that all original data can be retrieved in it's complete format. This project would serve as an intro to the same, by the end of the project, you would have implemented algorithms which would allow you to compress & decompress files on the level of gZip. gZip uses DEFLATE scheme which is a combination of LZ77 & Huffman encoding, both of which we will be implementing in this project. We would be doing our coding in C++ & you would be using pointers, adresses, data structures extensively and would be a level up your CS101 course, so I expect coding skills you get from your CS101 course. You would get to develop an actual application which you can use for your own purpose. This would serve as a great avenue for improving your coding skills and learning something on the way. We will work only on lossless compression algorithms. However, as you might see, we can design algorithms which let's say preserve 95% of the data while giving massive compression ratios. This 5% data loss wouldn't be visible at in visual, audio data however the file size decrease would make it immensely easier to transmit across networks. These algorithms would fall into research domain and this project would serve as a basic groundwork for compression systems. Intro to compression - https://www.youtube.com/watch?v=Lto-ajuqW3w Why LZ77 is cool - https://youtu.be/goOa3DGezUA?list=PLch3To-m3L-PQoaFTYMdVZkHFD9W6URi4

Prerequisites: Assume you have basic understanding of C++ and are pretty comfortable at it. Make sure you are taking up the project under a complete consciousness cause the project might get heavy to handle. Consider me as a mentor to guide you along the way but not spoonfeed you for the same so you will be on your own for the most part. Feel free to contact me to clear up anything you want to know about the project before applying for it.


Tentative Timeline :

Week Work
Week 1&2 Basic overview of c++ with maps, pointers, strings, study about various compression algorithms and get a thorough understanding of the same
Week 3 Implement general RLE, Huffman Encoding for compression & decompression
Week 4 Implement general LZ77 Algorithm for compression & decompression
Week 5 Implement general LZ78 Algorithm for compression & decompressio
Week 6 Implement general LZW Algorithm for compression & decompression
Week 7 Catch Up on incomplete work
Week 8 Report of results with different compression algorithms when used alone or when compounded among each other.