Documentation of the Website-project
Licence: CC BY-NC-SA (<a href="https://creativecommons.org/licenses/by-nc-sa/3.0/de/">info</a>)


=== Aim ===

Translate a directory structure into a website link-structure.

=== Input conventions ===

A webpage is constructed from a directory. You can think of the code as a one-to-one mapping between a directory (together with its content file and any additional files) and a webpage.
The content of the webpage is derived from the content of the directory.
When calling 
$ java Website mydirectory/
specify the directory that will serve as the main page of your website (the trailing slash is optional; the program checks for it). The main page is given a default name (appearing as the filename of the future html version), specified in 
public static String MAINPAGE_FILENAME of public class Website (no extension, so "index", but not "index.html"). All the other pages, apart from a slight subtlety with <i>unlinked</i> pages (see later), get the name of the corresponding directory as their filename. In each directory you are free to place files you want to link to and a file containing the actual content (= text, ...) of the particular webpage. In general, all files contained in the directories will be available in the website unless they are marked as being for <i>internal use</i> only. The form of this tag is defined in
public static String INTERNAL_USE_TAG of public class Website, and it is meant to be placed as the first character of the file's name. A file or directory marked with this tag is never copied to the website, and apart from the content file described below it is never read.
The content of the webpage is always read from a file contained in the directory corresponding to the webpage. Its name is the same for all pages and is defined in
public static String CONTENT_FILE_NAME of public class Website (including extension, if you want, e.g. "content.txt").
Attention: The INTERNAL_USE_TAG (e.g. "_") is prepended to the String defined here (e.g. "content.txt"), so the actual file you place into the directory must be called "_content.txt".
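
The naming rule for the content file and the internal-use check can be sketched as follows. This is only an illustration: the class name ContentFileNameSketch, the helper method names and the two example values are assumptions, not taken from the project's code.

```java
public class ContentFileNameSketch {
    // Assumed example values; the real ones are set in public class Website.
    static final String INTERNAL_USE_TAG = "_";
    static final String CONTENT_FILE_NAME = "content.txt";

    /** Name of the file that must actually exist in each page directory. */
    static String contentFileOnDisk() {
        return INTERNAL_USE_TAG + CONTENT_FILE_NAME;
    }

    /** True if a file or directory name is marked for internal use. */
    static boolean isInternal(String name) {
        return name.startsWith(INTERNAL_USE_TAG);
    }

    public static void main(String[] args) {
        System.out.println(contentFileOnDisk());
        System.out.println(isInternal("_notes.txt"));
        System.out.println(isInternal("photo.jpg"));
    }
}
```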
The hierarchy structure of the directories will be reflected in the link structure of the webpages. The code provides means to automatically place links to the hierarchically next page up, to the pages down and to the sister-pages (the corresponding variables are in Page.java). This is done in 
private String get_tree_links of public class Page
If you do not want a page to be linked in such an automated manner, you have to name its directory such that it starts with the symbol defined in
public static String UNLINKED_TAG of public class Website.
Both tags should be of one character length!
There is a distinction between the name of a page (currently the name of the corresponding directory as well as the name of the future html document and of the output directories derived from it) and the title of a page, which is the attribute by which it is identified in all links and which appears as the title of the webpage. The two are the same except in two cases. The title differs for the main page, where it is set to
public static String MAINPAGE_TITLE of public class Website
and for the unlinked pages, where the leading tag-character is removed both from the title and the webpage's html-filename.
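
A minimal sketch of the rule for unlinked pages, assuming "+" as the value of UNLINKED_TAG (the real value is whatever is set in class Website). The class and method names are hypothetical.

```java
public class PageNamingSketch {
    // Assumed example value; the real one is set in public class Website.
    static final String UNLINKED_TAG = "+";

    /** True if the directory name marks the page as unlinked. */
    static boolean isUnlinked(String directoryName) {
        return directoryName.startsWith(UNLINKED_TAG);
    }

    /** Name used for the title and the html file: the leading tag is dropped. */
    static String displayName(String directoryName) {
        return isUnlinked(directoryName)
                ? directoryName.substring(UNLINKED_TAG.length())
                : directoryName;
    }

    public static void main(String[] args) {
        System.out.println(displayName("+impressum"));
        System.out.println(displayName("page1"));
    }
}
```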
The html files are styled using css. You have to give in
public static String PATH_TO_CSS of public class Website
the directory where it can be found (the path must end with a slash, which is not [yet] checked!) and in
public static String CSS of public class Website
its name (with extension, e.g. "mystyle.css"). The css file will not be used directly, but copied to an internal location and used from there.

=== Output ===

If you have called
$ java Website mydirectory/
and the program has run successfully, it will have operated on a directory whose name is the concatenation of the given directory name (mydirectory in this case) and
public static String HTML_SUFFIX of public class Website (use something like "_html"). It will have:
1. Deleted (!) this directory and created it anew
2. Placed the mainpage and all files of your website into it, giving the files as extension the one provided in
public static String HTML_EXTENSION of public class Website
(allowing to switch from ".html" to ".htm" for example, watch out for adding the dot!).
3. Placed leaves of the hierarchy tree as stand-alone html-pages, and grouped the others into directories, giving them the same name as their webpage. If you have the following directories in mydirectory/
page1/
page2/
page2/page21/
page2/page22/
and a content file (e.g. _content.txt) in each of them, then the output in, say, mydirectory_html/ will be
page1.html
page2.html
page2/page21.html
page2/page22.html
The main page is always named, as mentioned before, by
public static String MAINPAGE_FILENAME of public class Website (often required to be "index" or similar.)
4. Copied the CSS file into (e.g. in this case) mydirectory_html/.
5. Copied all files with extensions specified in
public static String[] accepted_file_formats of public class Page (no dots, so "txt", but not ".txt")
which do not carry the INTERNAL_USE_TAG into the appropriate directories, corresponding to those where they have been found. 
No checks on broken links (external or internal, to pages or files) are as yet performed.
The pages get a "Last modified" timestamp, which is read from the lastModified-property of the corresponding directory.
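
The naming of the output directory and of the html files from the example above can be sketched like this. The values "_html" and ".html" are the examples suggested in the text; the class OutputPathSketch and its methods are hypothetical and only reproduce the listing under point 3.

```java
public class OutputPathSketch {
    // Assumed example values; the real ones are set in public class Website.
    static final String HTML_SUFFIX = "_html";
    static final String HTML_EXTENSION = ".html"; // note the leading dot

    /** Strips an optional trailing slash and appends the suffix. */
    static String outputDir(String inputDir) {
        String dir = inputDir.endsWith("/")
                ? inputDir.substring(0, inputDir.length() - 1)
                : inputDir;
        return dir + HTML_SUFFIX;
    }

    /** Maps a page directory (relative to the input directory) to its html file. */
    static String htmlFile(String inputDir, String relativePagePath) {
        return outputDir(inputDir) + "/" + relativePagePath + HTML_EXTENSION;
    }

    public static void main(String[] args) {
        System.out.println(outputDir("mydirectory/"));
        System.out.println(htmlFile("mydirectory/", "page2/page21"));
    }
}
```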

=== How this works ===

All of the work is done in the constructor of public class Website. It sets the path to the directory of the main page and to the future main html page and creates an instance of public class Page for the main page. The main page is then asked to add all its linked pages. As this method is implemented recursively, it constructs the whole page hierarchy by examining the subdirectories of the directory given for the main page. After deleting and re-creating the directory for the html pages, the main page is asked to produce its html code. This method is again implemented recursively and passes the request on to all subpages.
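
The recursive construction of the page tree might look roughly like the sketch below. Node is a hypothetical stand-in for Page, and the tag value "_" is an assumption. Sorting the directory entries is an addition made here to get a reproducible order; nothing in the description says the project does this.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeBuildSketch {
    static final String INTERNAL_USE_TAG = "_"; // assumed example value

    /** Hypothetical stand-in for Page: a name plus its subpages. */
    static class Node {
        final String name;
        final List<Node> subpages = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Adds one Node per visible subdirectory and immediately recurses into it. */
    static void addLinkedPages(Node page, File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return;   // not a directory, or not readable
        Arrays.sort(entries);          // listFiles() guarantees no particular order
        for (File entry : entries) {
            if (entry.isDirectory() && !entry.getName().startsWith(INTERNAL_USE_TAG)) {
                Node sub = new Node(entry.getName());
                page.subpages.add(sub);
                addLinkedPages(sub, entry);
            }
        }
    }

    /** Counts the page itself plus all pages below it. */
    static int countPages(Node page) {
        int n = 1;
        for (Node sub : page.subpages) n += countPages(sub);
        return n;
    }

    public static void main(String[] args) {
        Node root = new Node("index");
        addLinkedPages(root, new File(args.length > 0 ? args[0] : "."));
        System.out.println(countPages(root) + " page(s) found");
    }
}
```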

Each page possesses a filename and a title. Both are set via the constructors
public Page(String pfilename, String ptitle)
and
public Page(String pfilename, String ptitle, Page predecessor, boolean unlinked, String lastMod).
Each page has a private String relativepath, which leads to the directory the page corresponds to. The first of the constructors is used only once, for the main page, which does not have a predecessor and cannot be unlinked. There the relativepath is set to "". In all the other cases the relativepath is the relativepath of the predecessor concatenated with the filename. Although the filename variable actually stores the name of the directory, it is called filename because it is also the name of the future html file. Each page possesses a hierarchy level, which is defined to be 0 for each page that does not have a predecessor, in particular for the main page. It is needed, for example, to access the css file, which exists only once, in the uppermost directory, where the main page is stored. To construct a relative link, private String get_up_directories prepends the symbol "../" an appropriate number of times to the filename of the css file. The knowledge of the predecessor allows one to later construct hierarchical up-links on the webpages.
Both constructors call an initialization routine private void init(), which sets private String content of the page by reading the file specified by Website.CONTENT_FILE_NAME. The relativepath leads to the directory where the program finds the source files for the page. The target path private String htmlpath is constructed in private void set_htmlpath according to what has been stated in the Output section under point 3. This method is called during the initialization and each time the status of the page is changed.
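
As an illustration of the "../" prefix, here is a small sketch. The method names and the file name "mystyle.css" are assumptions. How the hierarchy level translates into the number of directories to climb is not spelled out above, so the sketch takes that number directly as a parameter.

```java
public class UpDirectoriesSketch {
    /** Returns "../" repeated directoriesUp times. */
    static String upDirectories(int directoriesUp) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < directoriesUp; i++) {
            sb.append("../");
        }
        return sb.toString();
    }

    /** Relative link to a css file that lives in the uppermost output directory. */
    static String cssLink(int directoriesUp, String cssFileName) {
        return upDirectories(directoriesUp) + cssFileName;
    }

    public static void main(String[] args) {
        // A page stored one directory below the main page, e.g. page2/page21.html
        System.out.println(cssLink(1, "mystyle.css"));
    }
}
```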

The status is defined via the following properties
private boolean leaf;
private boolean root;
private boolean unlinked;
The pages linked from the one in question are not yet known at construction time, so the property leaf is at first true, but is set to false the moment the first subpage is added (which happens in private void add_linked_page(), see later). When the leaf property is changed, the htmlpath has to be changed, so never change leaf by hand, but make use of private void set_leaf(boolean val). Together with the htmlpath, which is the path to the file, a private String htmlpathdir is constructed, which is the directory that will contain the future html file. This also happens in set_htmlpath(), by cutting off htmlpath from the last slash onwards. It does not contain the last slash. This directory is needed to place files that are associated with the html page in the correct subdirectory.
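
Cutting the htmlpath at the last slash can be sketched as follows. The class and method names are hypothetical, and returning an empty string for a path without any slash is an assumption made for this sketch.

```java
public class HtmlPathDirSketch {
    /** Returns everything before the last slash, without the slash itself. */
    static String htmlPathDir(String htmlPath) {
        int lastSlash = htmlPath.lastIndexOf('/');
        // Assumption: a path without a slash has an empty directory part.
        return lastSlash < 0 ? "" : htmlPath.substring(0, lastSlash);
    }

    public static void main(String[] args) {
        System.out.println(htmlPathDir("mydirectory_html/page2/page21.html"));
    }
}
```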

The subpages are organized into a private LinkedList<Page> subpages, the associated files into a private LinkedList<String> associated_files, which contains just the names. If a Page is asked in public void add_linked_pages() to find all subdirectories in its own directory (stored in relativepath), each directory that does not carry the Website.INTERNAL_USE_TAG as first character is added as a new subpage. The name of the found directory is passed to the constructor as both the filename and the title parameter, with the slight difference that in the title all characters which are not accepted in Html, like ä, ö, ü, ß, are replaced by their Html codes. Generally, quite a lot of Html formatting is hidden in the static methods of the Html class. The corresponding method for character replacement in this case is public static String change_strange_characters() of public class Html. The lastModified-parameter of the constructor is set to the date of the last modification of the directory. A new page is not added directly, but in private void add_linked_page(Page page), which sets the leaf-flag if necessary, adds page to the subpages collection and calls the add_linked_pages() method of the new subpage. Thus, if this process is viewed as a tree traversal of the subdirectory structure, the method used is a depth-first pre-order traversal.
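
A character replacement of the kind described could look like the sketch below. It covers only the four characters named above and uses the standard HTML entities for them; the class and method names are hypothetical and the real change_strange_characters() may handle more characters or work differently. The characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrangeCharactersSketch {
    private static final Map<Character, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put('\u00e4', "&auml;");  // a with diaeresis
        ENTITIES.put('\u00f6', "&ouml;");  // o with diaeresis
        ENTITIES.put('\u00fc', "&uuml;");  // u with diaeresis
        ENTITIES.put('\u00df', "&szlig;"); // sharp s
    }

    /** Replaces each known character by its HTML entity, leaving the rest untouched. */
    static String replaceStrangeCharacters(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String entity = ENTITIES.get(c);
            sb.append(entity != null ? entity : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceStrangeCharacters("Gr\u00fc\u00dfe"));
    }
}
```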
Files found in the directory are checked in the routine public static boolean accepted_file_format, which tests both that the extension is one of those listed in public static String[] accepted_file_formats of public class Page and that the name does not carry the internal-use tag. The names of accepted files are saved in the associated_files collection.
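
The two-part check can be sketched as follows. The list of formats and the tag value are assumed examples, the names are hypothetical, and the exact, case-sensitive comparison of the extension is a choice made for this sketch rather than something stated above.

```java
public class AcceptedFormatSketch {
    static final String INTERNAL_USE_TAG = "_";                 // assumed example value
    static final String[] ACCEPTED_FILE_FORMATS = {"txt", "pdf", "jpg"}; // assumed, no dots

    /** True if the name is not internal-use and its extension is in the accepted list. */
    static boolean acceptedFileFormat(String fileName) {
        if (fileName.startsWith(INTERNAL_USE_TAG)) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String extension = fileName.substring(dot + 1);
        for (String accepted : ACCEPTED_FILE_FORMATS) {
            if (accepted.equals(extension)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(acceptedFileFormat("paper.pdf"));
        System.out.println(acceptedFileFormat("_content.txt"));
    }
}
```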

For the creation of the html output, the rules of the Output section, point 3, require a new directory for each page that is neither a root nor a leaf. If the method public void produce_html() of public class Page is called, such a directory is created if necessary and the private String tohtml() method is called, whose output is written as the content of the html file. Whereas the produce_html method organizes the creation of the infrastructure, the tohtml() method corresponds to the typical toString() methods encountered in many classes. It makes extensive use of the formatting procedures in public class Html to write the head and the body of the html files. The body currently consists of the title, the content, the links to adjacent pages in the tree and a timestamp. The root does not have a timestamp. The links to the pages above and below in the hierarchy and to the sister pages on the same level are provided by private String get_tree_links(), which uses its knowledge of the predecessor, the subpages of the predecessor (of which the current one is printed, but not linked) and the subpages of the page in question. The format of a link to a page is set in public static String link_to_page(Page to, Page from) of public class Html. It uses the public String htmlpath_from(Page from) method of public class Path to get a relative path from the from-parameter to the to-parameter. This method cuts the htmlpath and adds a suitable number of "../" Strings. The title of the page is used as the visible part of the link. The (possibly many) links to the subpages make use of a separator, which is passed as a parameter to the function public String get_links_to_subpages(Page donotlinkme, String separator). The donotlinkme page is placed where it belongs in the list, but as pure text, not as a link.
When the page has written its content in the produce_html method, the associated files are copied to the same directory where the page has been written, and all the subpages are called to write themselves. Here it is crucial that the directory structure is built sufficiently deep before the subpages are asked to write themselves.
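
One way to compute such a relative path between two html files is sketched below: drop the leading directories both paths share, climb out of the remaining directories of the source, then append what is left of the target. This is a generic technique under the assumption that both paths are given relative to the output directory; it is not necessarily how htmlpath_from is implemented, and the names are hypothetical.

```java
public class RelativeLinkSketch {
    /** Relative path from the file fromHtmlPath to the file toHtmlPath. */
    static String relativeLink(String fromHtmlPath, String toHtmlPath) {
        String[] from = fromHtmlPath.split("/");
        String[] to = toHtmlPath.split("/");
        // Count shared leading directories; the last segment of each path is a file name.
        int common = 0;
        while (common < from.length - 1 && common < to.length - 1
                && from[common].equals(to[common])) {
            common++;
        }
        StringBuilder sb = new StringBuilder();
        // One "../" for every remaining directory on the source side.
        for (int i = common; i < from.length - 1; i++) {
            sb.append("../");
        }
        // Then the remaining part of the target path.
        for (int i = common; i < to.length; i++) {
            sb.append(to[i]);
            if (i < to.length - 1) {
                sb.append('/');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(relativeLink("page2/page21.html", "page1.html"));
        System.out.println(relativeLink("page2.html", "page2/page22.html"));
    }
}
```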
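
The ordering constraint (directory first, subpages afterwards) can be sketched as below. Node is a hypothetical stand-in for Page, the written content is a placeholder instead of the output of tohtml(), the ".html" extension is an assumed value, and the copying of associated files is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ProduceHtmlSketch {
    /** Hypothetical stand-in for Page: a name plus its subpages. */
    static class Node {
        final String name;
        final List<Node> subpages = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Writes the page into dir, then creates the directory for its subpages and recurses. */
    static void produceHtml(Node page, Path dir, boolean isRoot) throws IOException {
        Files.createDirectories(dir);
        String placeholder = "<h1>" + page.name + "</h1>";
        Files.write(dir.resolve(page.name + ".html"),
                placeholder.getBytes(StandardCharsets.UTF_8));
        if (page.subpages.isEmpty()) {
            return; // leaf: stand-alone html file, no directory of its own
        }
        // Subpages of the root stay next to it; other pages get a directory of their own.
        Path subDir = isRoot ? dir : dir.resolve(page.name);
        Files.createDirectories(subDir); // must exist before the subpages write themselves
        for (Node sub : page.subpages) {
            produceHtml(sub, subDir, false);
        }
    }

    public static void main(String[] args) throws IOException {
        Node root = new Node("index");
        Node page2 = new Node("page2");
        page2.subpages.add(new Node("page21"));
        root.subpages.add(new Node("page1"));
        root.subpages.add(page2);
        Path out = Files.createTempDirectory("mydirectory_html");
        produceHtml(root, out, true);
        System.out.println("Written to " + out);
    }
}
```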