News web portals present information in both multimedia and textual formats, organised under a previously defined topic taxonomy, and cover all aspects of our daily lives. The information presented has a high refresh rate and as such offers both a local and a global snapshot of the world. This book presents information extraction techniques and their use in standardising categorisation schemes and automatically classifying newly published content. Weighted Voronoi diagrams are proposed as the personalisation method. The aim of the work is to create a virtual browsing profile based on the semantic value of the information in visited nodes (web pages formatted in HTML). The results can greatly contribute to the applicability of personalisation data to specific information sources, including various web news portals.
The HTML and XML languages have a language-based overhead. This quantitative study proposed several encoding methods for the contents of HTML and XML files. The files were parsed for similar words, which were subsequently encoded with shorter character representations. A Web server and Web client were developed to test the hypothesis that encoding HTML and XML files using these methods prior to transmission by the server, and decoding them prior to rendering by the client, would produce a reduction in overall transmission time when compared with files that were not encoded. The presence of Zip compression and conversion to binary using Huffman compression were also considered. The results suggested that while there was a reduction in the size of the files after encoding, no reduction in transmission time was conclusively found. The data suggest that there is an endemic cause, and further experimentation is suggested to find the source.
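The abstract does not specify the encoding methods themselves, so the following is only a minimal sketch of the general idea of replacing repeated words with shorter character representations. Everything in it is an assumption made for illustration: the names `build_table`, `encode`, and `decode`, the marker character `MARK`, the 62-symbol `ALPHABET`, and the rule that only words of four or more letters occurring at least twice are replaced.

```python
import re
import string
from collections import Counter

# Hypothetical choices, not taken from the study:
MARK = "\x1f"                                      # marker assumed absent from the input
ALPHABET = string.ascii_letters + string.digits    # 62 one-character code suffixes
WORD = re.compile(r"[A-Za-z]{4,}")                 # only words longer than a 2-char code


def build_table(text):
    """Map the most frequent repeated words to two-character codes."""
    counts = Counter(WORD.findall(text))
    frequent = [w for w, n in counts.most_common() if n >= 2][:len(ALPHABET)]
    return {word: MARK + ALPHABET[i] for i, word in enumerate(frequent)}


def encode(text, table):
    """Replace every word found in the table with its short code."""
    if MARK in text:
        raise ValueError("input already contains the marker character")
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)


def decode(encoded, table):
    """Restore the original text from the codes."""
    reverse = {code: word for word, code in table.items()}
    pattern = re.escape(MARK) + "."
    return re.sub(pattern, lambda m: reverse[m.group(0)], encoded, flags=re.DOTALL)


sample = "<table><tr><td>cell</td></tr><tr><td>cell</td></tr></table>"
table = build_table(sample)
encoded = encode(sample, table)
restored = decode(encoded, table)
```

In this sketch the client needs the same table as the server, so the table would either have to be sent along with each file or agreed in advance. That cost is not counted in the encoded length here.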