Exercise 1 - Zip File Extractor
Design, Code and Unit-Test Using Structural and Traversal Design Patterns
Deadline April 6th, 2006 at Ross closing time
Description

In this exercise you will:
  • Design the first version of a ZIP file manager and extractor
  • Produce a set of UML class and sequence diagrams to document your design
  • Experience design patterns in actual Java code
  • Write a covering set of unit tests
  • Experience working with Eclipse, JUnit and Ant

Requirements

Create an executable named xzip which is able to print and extract the contents of ZIP files. The program's usage from the command line should be:

xzip [-l | -s | -x] [-nr] <zip file name>

Note that the first two parameters are optional, but the file name is not. Parameters must appear in this order.

A zip file can contain files, which may be compressed (but don't have to be). In addition, a zip file can also contain directories, which may by themselves contain files and other directories. When a zip file is extracted, the directories it contains are extracted as well, and files within these directories are extracted to their correct relative location.

For the examples given below, assume that the zip file my.zip contains the following: a file called mysong.mp3 which is an MP3 song, a file called myicon.gif which is a GIF image, and a directory called 'web' which includes two HTML files called index.html and other.html and one directory called 'backup' which includes two other files with these same names.

The first argument of xzip dictates what action will be performed on the zip file:

  • -l (long print): This action prints the names and details of the files stored in the given zip file. The long format means that in addition to the file names, more details are printed for each file. For example:

    > java xzip -l my.zip
    mysong.mp3      (7901 bytes, MP3 file)
    myicon.gif      (910 bytes, 32x32 GIF image)
    web             (directory, total 5 files)
       index.html       (5657 bytes, 122 lines of text)
       other.html       (6983 bytes, 208 lines of text)
       backup           (directory, total 2 files)
          index.html       (5622 bytes, 120 lines of text)
          other.html       (7501 bytes, 234 lines of text)

    As you can see, the details printed for each file type are different, and follow these rules:
    1. For each text file, the name of the file is printed, followed by a tab, and then the text (N bytes, M lines of text) where N is the uncompressed size of the file, and M is the number of lines (measured by the number of new-line characters) in the file. A text file is defined as any file whose extension is *.txt, *.html or *.java.
    2. For each image file, the name of the file is printed, followed by a tab, and then the text (N bytes, WxH EXT image) where N is the uncompressed size of the file, W is the image's width, H is the image's height, and EXT is the image file extension in uppercase. An image file is defined as any file whose extension is *.gif or *.jpg.
    3. For any other file type - any regular file that is neither a text file nor an image file, such as mysong.mp3 in the above example - print the file name, a tab, and the text (N bytes, EXT file) where N is the size of the uncompressed file and EXT is the file extension in uppercase.
    4. For each directory, the name of the directory is printed, followed by a tab, and then the text (directory, total N files) where N is the number of files in that directory. Each directory that the directory contains is counted as one (for the directory itself) plus the number of files inside that directory, recursively. In addition, as the example above shows, the contents of the directory are printed in the following lines, under the same rules, with an indent of three spaces relative to the indent of the directory.
     
  • -s (short print): This action prints the names of the files and directories in the given zip file, but without the extra details (file size, number of text lines and so forth) which the long format provides. Printing is equivalent to what the -l option outputs, including indentation and recursion into directories - only the text in parentheses for each file/directory is not printed. For example:

    > java xzip -s my.zip
    mysong.mp3
    myicon.gif
    web
       index.html
       other.html
       backup
          index.html
          other.html
     
  • -x (extract): This action actually extracts the contents of the zip file into the file system. That is, for each file/directory in the zip file, a new uncompressed file/directory should be created in the file system. Files should be created in the current directory, but files that are inside directories in the zip file should be created inside these relative directories in the file system. If a file is being extracted and another file with the same name already exists, then the existing file should be overwritten.
    Upon completion, this action prints one line to the console in this format: Extracted N files, M files failed. N is the total number of files and directories that were successfully created, and M is the total number of files and directories that failed. In addition, one line is printed for each failed file or directory, including a detailed error message. Whenever possible, an error should not halt the entire program: the program should output the error message and continue normally. For example:

    > java xzip -x my.zip
    Error: Failed to extract myicon.gif: Cannot overwrite existing file - file is in use
    Extracted 7 files, 1 files failed


    And the reported 7 successful files will be the following (locations are relative to the current directory):
    mysong.mp3
    web\
    web\index.html
    web\other.html
    web\backup\
    web\backup\index.html
    web\backup\other.html
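The extract action described above can be sketched roughly as follows. This is a minimal sketch and not a reference solution: the class ExtractSketch, its method names and the targetDir parameter are hypothetical, and it assumes that the zip file stores explicit entries for its directories (otherwise the counts would differ from the example). It also rejects entry names that would land outside the target directory, which the exercise text does not ask for.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper; class and method names are illustrative only.
class ExtractSketch {
    int extracted = 0;
    int failed = 0;

    // Extracts every entry of zipPath under targetDir and returns the summary line.
    String extractAll(String zipPath, File targetDir) throws IOException {
        ZipFile zip = new ZipFile(zipPath);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try {
                    extractOne(zip, entry, targetDir);
                    extracted++;
                } catch (IOException e) {
                    // One failed entry is reported and counted; the loop continues.
                    failed++;
                    System.out.println("Error: Failed to extract "
                            + entry.getName() + ": " + e.getMessage());
                }
            }
        } finally {
            zip.close();
        }
        return "Extracted " + extracted + " files, " + failed + " files failed";
    }

    private void extractOne(ZipFile zip, ZipEntry entry, File targetDir) throws IOException {
        // Entry names use '/' separators; File maps them to relative locations.
        File target = new File(targetDir, entry.getName());
        // Assumption: entries that would escape targetDir (e.g. "../x") are rejected.
        String root = targetDir.getCanonicalPath() + File.separator;
        if (!target.getCanonicalPath().startsWith(root)) {
            throw new IOException("Entry is outside the target directory");
        }
        if (entry.isDirectory()) {
            if (!target.isDirectory() && !target.mkdirs()) {
                throw new IOException("Cannot create directory");
            }
            return;
        }
        File parent = target.getParentFile();
        if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Cannot create parent directory");
        }
        InputStream in = zip.getInputStream(entry);
        try {
            // FileOutputStream truncates, so an existing file is overwritten.
            OutputStream out = new FileOutputStream(target);
            try {
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }
}
```

Catching the exception per entry, rather than around the whole loop, is what lets a single failure be reported without halting the rest of the extraction.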

The second command-line argument of xzip, -nr, means "no recursion": if it appears, all actions should be performed without recursion into directories. This means that only one summary line is printed for every directory in the printing actions, and that the directory is created but not populated in the extract action. For example:

> java xzip -l -nr my.zip
mysong.mp3      (7901 bytes, MP3 file)
myicon.gif      (910 bytes, 32x32 GIF image)
web             (directory, total 5 files)

> java xzip -x -nr my.zip
Extracted 3 files, 0 files failed

In this example, the 3 extracted files will be mysong.mp3, myicon.gif and web\.
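The printing rules and the -nr switch can be illustrated with a small Composite-style sketch. The class names (EntryNode, FileEntry, DirEntry, TreePrinter) are hypothetical and this is only one possible shape, not a prescribed design. It assumes the per-file detail text has already been computed elsewhere, and it shows only the directory counting rule, the three-space indent, and how the recursion flag affects printing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative names only; none of these classes come from the exercise text.
abstract class EntryNode {
    final String name;

    EntryNode(String name) { this.name = name; }

    // How much this node adds to the "total N files" of its parent.
    abstract int countedAs();

    // The parenthesized text printed by the long format.
    abstract String details();

    void print(StringBuilder out, String indent, boolean longFormat, boolean recurse) {
        out.append(indent).append(name);
        if (longFormat) {
            out.append('\t').append(details());
        }
        out.append('\n');
    }
}

class FileEntry extends EntryNode {
    private final String details;

    FileEntry(String name, String details) { super(name); this.details = details; }

    int countedAs() { return 1; }

    String details() { return details; }
}

class DirEntry extends EntryNode {
    private final List<EntryNode> children = new ArrayList<EntryNode>();

    DirEntry(String name) { super(name); }

    DirEntry add(EntryNode child) { children.add(child); return this; }

    // Number of files in this directory, counted recursively.
    int total() {
        int sum = 0;
        for (EntryNode child : children) {
            sum += child.countedAs();
        }
        return sum;
    }

    // A nested directory counts as one for itself plus everything inside it.
    int countedAs() { return 1 + total(); }

    String details() { return "(directory, total " + total() + " files)"; }

    @Override
    void print(StringBuilder out, String indent, boolean longFormat, boolean recurse) {
        super.print(out, indent, longFormat, recurse);
        if (recurse) {
            for (EntryNode child : children) {
                child.print(out, indent + "   ", longFormat, recurse);
            }
        }
    }
}

class TreePrinter {
    // Top-level entries are always printed; recurse only controls descent into directories.
    static String render(List<EntryNode> topLevel, boolean longFormat, boolean recurse) {
        StringBuilder out = new StringBuilder();
        for (EntryNode node : topLevel) {
            node.print(out, "", longFormat, recurse);
        }
        return out.toString();
    }
}
```

Note that the total for a directory is computed from the full tree even when recursion is switched off, which matches the "total 5 files" line in the -nr example above.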

The default action is -l, meaning that xzip my.zip is equivalent to xzip -l my.zip. You should print an informative usage message if the program is activated with no or illegal command-line arguments. You should print a detailed error message and exit gracefully when a critical error occurs (the given zip file does not exist, the given zip file is corrupted and so on).
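The command-line rules (optional action, optional -nr, mandatory file name, fixed order, default action -l) could be checked along these lines. This is a sketch under assumptions: the class XzipOptions and its fields are hypothetical, and treating a file name that starts with '-' as an illegal argument is a choice made here, not something the requirements state.

```java
// Hypothetical holder for the parsed command line; names are illustrative only.
class XzipOptions {
    static final String USAGE = "Usage: xzip [-l | -s | -x] [-nr] <zip file name>";

    final char action;      // 'l', 's' or 'x'
    final boolean recurse;  // false when -nr was given
    final String zipName;

    private XzipOptions(char action, boolean recurse, String zipName) {
        this.action = action;
        this.recurse = recurse;
        this.zipName = zipName;
    }

    // Parses the arguments in their required order, or throws with the usage text.
    static XzipOptions parse(String[] args) {
        int i = 0;
        char action = 'l';          // default action is -l
        boolean recurse = true;
        if (i < args.length && args[i].matches("-[lsx]")) {
            action = args[i].charAt(1);
            i++;
        }
        if (i < args.length && args[i].equals("-nr")) {
            recurse = false;
            i++;
        }
        // Exactly one argument must remain: the zip file name.
        // Assumption: a remaining argument starting with '-' is an unknown switch.
        if (i != args.length - 1 || args[i].startsWith("-")) {
            throw new IllegalArgumentException(USAGE);
        }
        return new XzipOptions(action, recurse, args[i]);
    }
}
```

A main method would typically catch the IllegalArgumentException, print its message as the usage text, and exit without a stack trace.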

Design

While this exercise could be programmed within a single class, that approach would not hold up: this xzip is only a first version, so it is crucial to keep an open mind with respect to possible future requirements. Consider the following:
  1. It may be required to be able to read input formats other than ZIP, such as TAR, ARJ, CAB and other archive file formats. The input does not even have to be a file: It can be the set of files of a given directory, a given FTP server address and so forth.
  2. It may be required to support other file types that the long print action provides additional details about. For example, printing the image size for image files such as myicon.gif, or printing the song duration for music files such as mysong.mp3.
  3. It may be required to produce the output in formats other than plain text - HTML, PDF, Word or others. It may also be required to write output in several formats at once, for example:
    xzip -pdf myzipfiles.pdf -html myzipcontents.html my.zip.

  4. It may be required to modify the input zip file instead of just printing or extracting it. For example, new actions may enable adding files and directories to the zip file, changing the date and time signature of zipped files, and so on.
  5. It may be required to support recursion into zip files, and not only into directories. For example, if a zip file contains another zip file, then its contents would have to be printed (recursively) like a directory's contents are printed, and when a file is extracted any zip files it contains would be extracted as well.
  6. It may be required to activate all of the program's features not only from the command-line, but also from a graphical user interface, or perhaps even two user interfaces (for example, one that is a custom UI for handling zip files, and another which is fully integrated with the Windows Explorer).
You must design your program so that it is easy to add code that implements the above requirements. Assume that you are the one who will actually have to code it - that's how it usually is in "real life". For each of the above requirements write an explanation in your README file, not more than three sentences long, which explains how it should be coded. For example:

Requirement: It may be required to define filters on which files get printed or extracted, in addition to the -nr switch. For example, new command-line arguments can dictate that only text files should be acted on, only files that match a given pattern (such as *.cpp), and so on.

Solution: Write an Iterator for each kind of filter, whose next() method moves to the next element for which the filter is true. Such iterators are implemented as Decorators of other iterators, which makes it easy to combine different filters dynamically and does not require changing or recompiling existing code.
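To make the sample solution concrete, here is one way such a filtering Decorator might look. It is a sketch only: the Filter interface, FilterIterator and ExtensionFilter are hypothetical names, and the elements are plain file-name strings for brevity rather than whatever entry type a real design would use.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical predicate interface; not part of the exercise text.
interface Filter<T> {
    boolean accept(T item);
}

// Decorates another iterator and only yields the elements the filter accepts.
class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Filter<T> filter;
    private T nextItem;
    private boolean hasNextItem;

    FilterIterator(Iterator<T> inner, Filter<T> filter) {
        this.inner = inner;
        this.filter = filter;
        advance();
    }

    // Moves the wrapped iterator forward to the next accepted element, if any.
    private void advance() {
        hasNextItem = false;
        while (inner.hasNext()) {
            T candidate = inner.next();
            if (filter.accept(candidate)) {
                nextItem = candidate;
                hasNextItem = true;
                return;
            }
        }
    }

    public boolean hasNext() {
        return hasNextItem;
    }

    public T next() {
        if (!hasNextItem) {
            throw new NoSuchElementException();
        }
        T result = nextItem;
        advance();
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}

// Example filter: accepts names that end with a given extension such as ".html".
class ExtensionFilter implements Filter<String> {
    private final String extension;

    ExtensionFilter(String extension) {
        this.extension = extension;
    }

    public boolean accept(String name) {
        return name.endsWith(extension);
    }
}
```

Because a FilterIterator is itself an Iterator, filters can be stacked by wrapping one FilterIterator in another, which is the dynamic combination the solution refers to.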

It is important that each solution you present will be at most three sentences long. The intention is to enforce the use of design patterns vocabulary rather than elaborating specific class and object relationships.

Code & Unit Test

This exercise is intended to have you divide your time equally between actual coding on the one hand, and design, drawing UML diagrams, and answering the above six design questions on the other. With a proper design, this exercise is quick and simple to code. You are required to write in Java 5.0, and you may use the standard libraries to their full extent - the standard streams, strings and data structures. In particular, working with zip files is done using the java.util.zip package, and working with image files is done using the javax.swing.ImageIcon class (see references and code samples below).
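As a rough illustration of working with java.util.zip, the sketch below reads one entry of a zip file and produces the detail text for a text file. The class TextDetails and its method are hypothetical. The sketch counts the uncompressed bytes while reading the entry's stream instead of relying on the size recorded in the zip file, which is a simplification chosen here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper; the name and shape are illustrative only.
class TextDetails {
    // Returns "(N bytes, M lines of text)" for the given entry of an open zip file.
    static String describe(ZipFile zip, ZipEntry entry) throws IOException {
        long bytes = 0;
        int lines = 0;
        // getInputStream yields the uncompressed contents of the entry.
        InputStream in = zip.getInputStream(entry);
        try {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes += n;
                for (int i = 0; i < n; i++) {
                    // Lines are measured by the number of new-line characters.
                    if (buffer[i] == '\n') {
                        lines++;
                    }
                }
            }
        } finally {
            in.close();
        }
        return "(" + bytes + " bytes, " + lines + " lines of text)";
    }
}
```

Image dimensions can be obtained in a similar way by reading the entry's bytes into an array, passing it to the ImageIcon(byte[]) constructor, and calling getIconWidth() and getIconHeight().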

You are also required to use the Eclipse environment for this exercise, and are encouraged to take advantage of its editing and debugging features to their full extent. Submit the Eclipse project file together with your exercise, so that it would be possible to open your project and read your code, UML diagrams and Ant file within Eclipse. The UML diagrams should be done using the Omondo UML plugin (see the Technical Help page for details on installing it at home). You are required to provide class diagrams that include all your classes; there may be more than one diagram, if this is visually easier. You are also required to provide at least two sequence diagrams, depicting two interactions in your design which you consider important.

It is also required that you submit unit tests to test your work, and use the JUnit framework to do so. Organize your unit tests into classes by subject, and write a method for each small test. Organize the code such that the program source code is in one package (for example xzip), and the tests are in a separate package (for example xziptest). Each test should be self-validating - that is, know by itself whether it has passed or failed. Writing unit tests should be an integral part of coding: This is essential when code must be changed in newer versions. We recommend that you try test-first programming - read the following article as a starting point - and in any case you will lose time if you only write all unit tests after you "finish" coding. You will have another chance to estimate the convenience of unit tests in exercise 3.

One metric for measuring the usefulness of a set of unit tests is called coverage, which means the percentage of your code that the unit tests actually run. Coverage of 90% or above is considered good, and you should aim for that goal.

The code you submit must be built with no compiler warnings, and pass all unit tests. You must submit an Ant file (build.xml) with your exercise, whose default target compiles the entire program, and runs all unit tests.

This is an advanced course, so there is no intention to deduct points for coding style or naming conventions - the emphasis is on proper design. However, you are, as always, expected to write clear code with a consistent style.

Submission

Submit a zip file which contains the following:
  • All program source code.
  • All unit tests source code.
  • The Ant build.xml file.

  • The UML diagrams (class diagrams + at least two sequence diagrams).

  • A README file, with the usual contents (IDs, logins and full names, descriptive list of files and features) and answers to the six possible extensions in the design section above. The README file should also describe parts of the design or design choices that are not evident from reading the UML diagrams.

  • The Eclipse project file, to enable opening your project and reading all of the above files from within Eclipse.

We use the course-admin system for exercise submission and grading; do not submit any printouts.

Resources