Introduction
Note:
Some history and philosophy follow, so if you are looking for specific technical details, you may want to go directly to Architecture or API.
What it is all about
The key SiteXML thesis is the following:
When HTML was developed, the web was thought of as a network of static pages linked together by hyperlinks, and so HTML became the standard for a web page. Today the Internet is thought of as a network of sites, but we still have no standard for a website.
Internet publishing is based on the HTML standard, developed in the late 1980s and early 1990s by Tim Berners-Lee and colleagues. On the server side, however, HTML today is only an illusion. Yes, there is HTML in your browser when you request a page from a server, but there are no HTML pages on the server. This happened for historical reasons, mostly because of the browser war that took place on Earth a few years after the first site was published. While browser vendors struggled to make the Internet look better in their browsers, enhancing and polishing HTML and other client-side standards, server-side developers distorted the original simple and clear idea. Where has HTML's simplicity gone? If you look at a website's source code on a server, you will be terrified by how complex and difficult it is: a weird collection of files, snippets, and source code in various programming languages, plus databases and exotic homebrew frameworks, all difficult to combine.
What have we lost by ignoring HTML principles on the server side? Simplicity, first of all, of course. Let's dig further. The fundamental principles of HTML, which by the way is still the basis of the modern Internet, were:
- focus on valuable content, rather than styling and interactivity;
- free, accessible, and simple enough to be edited by anyone who has something to say;
- platform-independent: simply copy a site from Unix to a Mac, a PC, or anything else, and it will work! This is what is called a standard.
Sites made with pure HTML had both advantages and disadvantages.
Advantages:
- fast loading;
- simple and readable code;
- easy to create and edit with plain text editors (mostly free or preinstalled);
- no programming knowledge required;
- content-driven file structure;
- direct relationship between file structure and URL: easy to locate, create, edit, and publish web pages.
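To make the "direct relationship between file structure and URL" concrete, here is a minimal Python sketch of how a plain static webserver typically resolves a request path to a file in a pure HTML site. The function name `url_to_file` and the `index.html` default are illustrative assumptions, not part of SiteXML.

```python
from pathlib import PurePosixPath


def url_to_file(url_path: str, index_name: str = "index.html") -> str:
    """Map a URL path to a file path relative to the site root (hypothetical helper)."""
    # Split the URL into path segments, dropping empty ones from leading/double slashes.
    parts = [p for p in url_path.split("/") if p]
    # Reject '.' and '..' so a request cannot escape the site root.
    if any(p in (".", "..") for p in parts):
        raise ValueError(f"unsafe path: {url_path!r}")
    # A trailing slash (or the bare root) names a directory: serve its index file.
    if url_path.endswith("/") or not parts:
        parts.append(index_name)
    return str(PurePosixPath(*parts))


print(url_to_file("/"))                 # index.html
print(url_to_file("/about/"))           # about/index.html
print(url_to_file("/about/team.html"))  # about/team.html
```

The point of the sketch is that no lookup table, database, or script sits between the URL and the file: whoever can see the directory tree can predict the URL of every page, and vice versa.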
The pure HTML approach also had some disadvantages:
- poor styling;
- poor interactivity.
The modern 'heavy' backend approach has largely overcome these disadvantages, but it has also largely lost the old-school advantages.
Twenty years ago we could not know what the web would become. Today is the time of highly interactive pages with rich authoring functionality, which is very different from the Internet of the early 1990s. The difference is that the Internet is now thought of as a network of sites, yet we still have no standard for a website.
The Internet today is based on HTML as if the web were still a network of pages, but in practice all HTML principles have been dropped as if they were useless:
- Webservers only pretend to have HTML pages. Even simple static sites in fact have nothing resembling HTML pages! Moreover, building sites with plain HTML has become bad form ('mauvais ton'). How could beautiful HTML, the foundation of the Internet, become a sign of bad style?!
- Simple static pages are served by overpowered site engines (mostly script-based, and therefore slow) and CMSs that generate too much server, client, and bandwidth load per page.
- The backend is completely platform-dependent: you cannot just copy your site from Unix to Windows, and you can hardly even copy it from one Unix to another; in most cases it will not work!
- Sites are nearly impossible for non-programmers to maintain. Publishers depend on developers who:
  - choose the server configuration;
  - code their own functions;
  - are the only people (or among the few) who can control and update the sites they made.
- It is quite difficult to create and keep track of a site's components;
- We depend on intermediate tools such as scripting engines, databases, modules, and templates.
All this makes setting up and running a website quite hard for someone who has something valuable to say but is not a programmer.
To sum up, we have reached a point where we can create highly interactive sites, but at the cost of the fundamental web principles. SiteXML tries to solve these problems; read on.
Problem Statement
The goal of this work is to revisit HTML principles from the perspective of the modern Internet. We want to return to pure HTML and keep it simple, yet highly interactive. Let us break this general task down into more specific goals:
- Propose a standard file structure for websites to ensure that:
  - sites are platform- and developer-independent, at least at the content level;
  - different kinds of developers depend on each other as little as possible, and their work (design, modules, rich editors) is better reused throughout the Internet. (As a consequence, web usability improves: similar functions that look similar on different sites lead to predictable UI behavior, and that is good usability.)
- Introduce a client-server interaction protocol, STP (Site Transfer Protocol):
  - to help frontend and backend developers work more independently, making Internet development more effective through better reuse of components.
- Make the file structure of sites more readable, accessible, and editable:
  - separate content into PURE HTML, as if the site really were a collection of HTML pages;
  - make sites, especially content, editable with both rich editors and plain text editors, giving full control over content and easy access to updating it.
- And, last but not least, keep it all as simple as pure HTML, yet as powerful as modern websites. This is the essence of our idea.
The Principles
We have covered the what; now let us look at the how.
The solution must meet the following criteria, which closely mirror the HTML principles:
- Clear file structure:
  - Focus on content;
  - Content is pure HTML files;
  - File structure reflects content structure.
- Cover 90% of site needs:
  - Focus on content-oriented sites;
  - Reuse interactivity that repeats from site to site: forums, feedback, comments, etc.;
- Platform independent:
  - Webserver-integrated: you can copy a SiteXML site to any platform and it should start working automatically (at least at the content level);
  - Any modern and popular CMS should be able to run SiteXML sites;
  - Any back-end scripting language must be able to support SiteXML, for obsolete servers or where specific customization is needed;
  - Any browser should be able to run a SiteXML site on its own (without back-end driven functions, of course).
- Avoid intermediate software that makes our sites more platform-dependent:
  - no databases;
  - no frameworks;
  - no scripting engines when possible.
- Component independent, to keep it as light and swift as possible, yet infinitely flexible and extensible:
  - you should be free to choose your own components to run the site:
    - server-side engine, from a server-integrated one to your favorite CMS, a stand-alone scripting engine, or your own custom engine;
    - client editing tool, from none to rich editors or CMSs;
    - reusable themes and modules.
- Convey good usability principles:
  - Ajax browsing: why load the whole page every time? Each time we request a page, we should load only the changeable parts, i.e. the content portions defined in the layout, e.g. a banner on top, an ad column on the right, the main content in the middle;
  - Unchangeable portions should stay untouched while browsing; there is absolutely no need to reload them with every page: styles and layout, navigation, JavaScript, images, etc.;
  - In-place content authoring should become standard: it should be as easy as working with a modern word processor.
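The Ajax-browsing idea can be pictured as a diff between the page on screen and the page being requested: only layout portions whose content changes are fetched. Below is a minimal Python illustration under assumed names (`Page`, `portions_to_load`, and the portion and file names are hypothetical); it does not describe how SiteXML or STP actually transfer content.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical model: a page maps layout portion names to content files."""
    url: str
    portions: dict[str, str] = field(default_factory=dict)


def portions_to_load(current: Page | None, target: Page) -> dict[str, str]:
    """Return only the portions that must be fetched to display `target`.

    On the first load (no current page) every portion is fetched. After that,
    a portion is fetched only if its content file differs from what is already
    on screen. Styles, layout, navigation, and scripts are assumed to arrive
    once with the layout, so they never appear here.
    """
    if current is None:
        return dict(target.portions)
    return {
        name: source
        for name, source in target.portions.items()
        if current.portions.get(name) != source
    }


home = Page("/", {"banner": "banner.html", "ads": "ads.html", "main": "home.html"})
about = Page("/about/", {"banner": "banner.html", "ads": "ads.html", "main": "about.html"})
print(portions_to_load(home, about))  # {'main': 'about.html'}
```

Navigating from `home` to `about` fetches a single portion instead of the whole page. A fuller version would also clear portions that the target page leaves empty; this sketch ignores them.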
Server integration. Webservers should have native support for SiteXML.
- One SiteXML engine per server, integrated into the webserver; no CMS, no site engine, and no other executable script or DB in the SiteXML site directory. The site directory contains only site-specific content.
- Easier maintenance:
  - the server admin maintains only one instance of the site engine, which serves all sites on the server.
- More reliability:
  - sites do not depend on custom site engine or CMS developers, programmers, or different hosting environments.
- Fewer files in the site directory:
  - easier file manipulation;
  - easier backup and recovery.
- Faster:
  - no need to load scripting engines;
  - the site engine is compiled into the webserver;
  - better site caching.
- Easy support for hosters:
  - all they do is add a webserver module that supports SiteXML, and voilà! Your favorite hosting company supports SiteXML!