Write your HTML correctly


Posted by Ali Hassan, Thursday, February 28, 2013

First approach

Introduction

I have always hated books full of spelling mistakes. Before I had even started middle school, I was already correcting people's French. Such mistakes always bothered me, sometimes to the point of making me give up on a book.
On the other hand, there is experimental literature: a poem without punctuation (Guillaume Apollinaire, Le Pont Mirabeau, Alcools, 1913), a novel that does not contain a single letter "e" (Georges Perec, La Disparition, 1969), etc. These ways of writing are not meant for everyday use. They are rare. (1)
Corollary: French grammar is governed by rules we are used to, and it is generally disconcerting to face a text that does not follow them. (2)
Following the same principle, the HTML code of a web page must be clean. Code full of errors suggests a webmaster who pays little attention to detail. I do not know how Google checks the code of a page (or even whether it does), but I know it uses bots. Having written bots several times myself, I know how frustrating it is to work around errors and differing conventions. Take a simple HTML link, <a href=""></a>: the double quotes may be replaced by single quotes (apostrophes) or omitted entirely. Depending on the browser, mixing a double quote and a single quote in the same tag may even give a satisfactory result. This flexibility makes web pages easier to create, but it makes processing them more complex: a crawler has to handle every one of these cases...
Besides writing syntactically correct code, pay attention to its semantics. Yes, HTML has semantics: use its tags wisely. For example, when you use <table>, the bot expects to find data inside, as opposed to paragraphs of text. If you simply want to divide your page, the appropriate tag is <div>.
The structure of an HTML document reflects the organization of its content. It goes without saying, yet it is rarely observed.

Throughout this course, I will use class names, file names, etc. that follow a naming convention. That is the subject of another tutorial, so I will only summarize it here: names are in English and use only lowercase letters and hyphens.
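The convention is simple enough to check mechanically. A minimal PHP sketch, assuming it applies to base names without extension (main-menu rather than main-menu.css) and that digits are excluded, since only letters are mentioned; the function name is_valid_name is hypothetical:

```php
<?php
// Hypothetical helper: checks a class or file base name against the
// convention above (lowercase English words joined by single hyphens).
// Assumptions: no digits, no extension, no leading/trailing/double hyphens.
function is_valid_name(string $name): bool
{
    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('/^[a-z]+(?:-[a-z]+)*\z/', $name) === 1;
}

foreach (['main-menu', 'MainMenu', 'main_menu', 'footer'] as $name) {
    echo $name, ': ', is_valid_name($name) ? 'ok' : 'rejected', "\n";
}
```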

Example

In short, follow web standards. Do not use tags against their nature. I have seen people try to make colored text scroll inside an <input type="text"> field: it is possible, but is it desirable? The input tag is reserved exclusively for information entered by the user. Do not try to use it to produce visual effects... Your favorite website should be w3.org.
Here are two examples, each using a regular expression to extract links from a web page. The first retrieves properly formed links (without any other attribute), while the second attempts to retrieve links (again without any other attribute) whether they are well or poorly formed.

Get well-formed links

    <?php
    // Only matches links written exactly as <a href="...">...</a>
    $regex = '#<a href="([^"]*)">(.*)</a>#Usi';
    $html = file_get_contents('http://xyz.com/');
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        // Escape the output so the browser shows the matched tags as text
        echo '<pre>' . htmlspecialchars(print_r($matches, true)) . '</pre>';
    }
    ?>

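Get links, including badly written ones

    <?php
    // Tries to accept double quotes, single quotes or no quotes around href
    $regex = '#<a href=["\']?([^>]*)["\']?>(.*)</a>#Usi';
    $html = file_get_contents('http://xyz.com/');
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        // Escape the output so the browser shows the matched tags as text
        echo '<pre>' . htmlspecialchars(print_r($matches, true)) . '</pre>';
    }
    ?>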

The regular expression is obviously more complex in the second example.

This second example is incomplete. I do not really have a workable solution for correctly extracting information from badly written links. One could perhaps isolate the whole link, then go through it character by character, trying to determine where each attribute begins and ends. I imagine that the source code of a web browser would provide a reliable solution, but that is not our goal here.
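One practical route, not used in the examples above, is to hand the parsing to an HTML parser that recovers from sloppy markup instead of writing a regular expression. A minimal sketch with PHP's DOM extension (DOMDocument, backed by libxml), assuming the extension is enabled; the function name extract_links and the sample markup are hypothetical:

```php
<?php
// Sketch: extract links with PHP's DOM extension instead of a regex.
// libxml's HTML parser recovers from sloppy markup much as a browser does:
// double quotes, single quotes and unquoted attribute values all work.
function extract_links(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// In practice $html would come from file_get_contents('http://xyz.com/').
$html = '<p><a href="/one">One</a> <a href=\'/two\'>Two</a> <a href=/three>Three</a></p>';
print_r(extract_links($html));
```

The three spellings of the href attribute come back in the same form, which is the normalization work a crawler has to do on every page.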

I think my first goal has been reached: I have just shown that things quickly become a headache when the HTML is not correct. This is why search engines prefer sites with clean code.

Links: <a>

First approach

Insofar as links (HTML anchors) are the basis of ranking algorithms, they deserve special mention.
A link tag is made up of several attributes. href is the most common, as it gives the link's destination. In my previous examples, it is the only attribute.
In a real situation, the title attribute plays an important role in SEO: it lets the browser display a tooltip. This is very useful for associating keywords with a link whose anchor text has nothing to do with them. For example, the anchor text "click here" means nothing to a search engine: it does not know which keywords to associate with the link.
A well-formed link therefore mainly consists of a URI (the href attribute) that contains no exotic characters and, when the anchor text is not meaningful, a title (the title attribute) that connects it with the keywords.
Besides its destination URI, a link must always be associated with keywords, either through its anchor text or through its title.
You can also put keywords in the destination URI to tell the search engine more precisely what it will find in the target document. This is fundamental, because it lets the search engine steer its semantic analysis toward the semantic field around these keywords.
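A minimal sketch of how keywords can end up in a destination URI: a title is turned into a lowercase, hyphenated segment. The function slugify and the sample title are hypothetical; the sketch assumes plain English (ASCII) titles and keeps digits, since URIs often carry numbers:

```php
<?php
// Hypothetical helper: turns a title into a URI segment made of lowercase
// keywords separated by hyphens. Assumption: the title is plain English
// (ASCII); accented letters would need transliteration first.
function slugify(string $title): string
{
    $slug = strtolower($title);
    // Replace each run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

echo slugify('Regular Expressions: Test #2'), "\n";
```

The result could then be used in a path such as /tutorials/regular-expressions-test-2 (a hypothetical URI).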

It is worth pointing out here that a search engine performs a semantic analysis of the documents it indexes.

This analysis allows several things:

Guess the semantic field of the page to determine the most appropriate keywords;
Determine whether the page in question focuses on a single topic or covers varied topics;
Identify pages with similar content and those with duplicate content;
etc.

To wrap up on links, here are a few examples summarizing the options available to us.

Link whose anchor text is not used as a keyword: the 'title' attribute is highly recommended.

Link whose anchor text is used as a keyword: the 'title' attribute is optional.

Another link whose anchor text is used as a keyword: the 'title' attribute is useful.

In the latter case, I fear that repeating identical keywords is harmful. Indeed, some of them appear in three places: in the href and title attributes as well as in the anchor text. This could be seen as keyword stuffing, that is, excessive insistence on the part of the webmaster. Whenever possible, avoid this situation, because it is not unlikely that search engines penalize the page's ranking.
However, the title attribute lets you insert synonyms in some situations, so do not neglect it.
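To put the options above into code, here is a minimal sketch of a helper that builds a link and adds a title only when one is supplied. The name make_link and the URIs are hypothetical; htmlspecialchars() escapes the values so that quotes or ampersands cannot break the markup:

```php
<?php
// Hypothetical helper: builds an <a> element. Every value is HTML-escaped,
// and the title attribute is only emitted when a non-empty title is given.
function make_link(string $href, string $text, ?string $title = null): string
{
    $attributes = 'href="' . htmlspecialchars($href, ENT_QUOTES) . '"';
    if ($title !== null && $title !== '') {
        $attributes .= ' title="' . htmlspecialchars($title, ENT_QUOTES) . '"';
    }
    return '<a ' . $attributes . '>' . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

// Anchor text without keywords: the title carries them.
echo make_link('/tutorials/regular-expressions', 'click here', 'Regular expressions tutorial'), "\n";
// Anchor text that already contains the keywords: no title needed.
echo make_link('/tutorials/regular-expressions', 'Regular expressions tutorial'), "\n";
```

The first call corresponds to the "click here" case, where the title carries the keywords; the second relies on the anchor text alone.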

Dynamic links

In the case of dynamic websites, we are frequently confronted with GET parameters.

Example URI with a parameter

 http://xyz.com/

These addresses are a problem because the number "2" in this example does not tell us what the destination page will show. Fortunately, the script name and the GET parameter name are both explicit: we can reasonably guess that this is test number 2 of a regular expression and that this test is part of a PHP tutorial of which I am the author.
All this matters to Google.
Consider a counter-example: phpBB. This is a web application written in PHP for managing discussions between users, one forum system among others.
This web application includes a script named "viewtopic.php" that accepts various GET parameters, including "t" and "p". These names obviously mean nothing to someone who has not studied how the application works, and Google has certainly not done so. Here "p" means "post" (a message) and "t" refers to "topic" (a subject). It is not certain that a bot can guess the semantic context behind these two single letters, whereas it was easier with my regular-expression example. I would not go as far as saying that Google can tell that "viewtopic" is made of two words, the more important of which is "topic". It would have been better to name the script differently, for example "view-topic.php". On the other hand, the "mode" parameter can be set to "edit", which a clever bot could interpret as: "This is an edit page, so a priori less interesting."

Therefore, pay attention to every element:

The script name gives the context;
The name of a parameter and its value refine the context.

Of course, the ideal for a bot is not to have to deal with all these details. A URI without parameters is much easier to handle.
In addition, a parameter means that the page can exist in a potentially infinite number of versions. A bot is just like you and me: its ability to store information is limited!
Finally, the order of the parameters in the URI does not change the server's behavior at all. The strings viewtopic.php?p=1234&mode=edit and viewtopic.php?mode=edit&p=1234 are different, but the web server processes them in exactly the same way. The bot, however, has to do extra processing so as not to handle this page twice if it encounters both URIs.
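The extra processing mentioned above can be sketched as normalizing each URI before comparing it: sort the GET parameters so that both spellings yield the same string. A minimal sketch using PHP's parse_url(), parse_str(), ksort() and http_build_query(); the function name canonical_uri is hypothetical, and the sketch assumes only the path and query matter:

```php
<?php
// Sketch: gives equivalent URIs one canonical form by sorting their GET
// parameters. Assumptions: only the path and query matter (scheme, host and
// fragment are dropped), and parameter names contain no dots or spaces,
// which parse_str() would turn into underscores.
function canonical_uri(string $uri): string
{
    $parts = parse_url($uri);
    $path = $parts['path'] ?? '';
    if (!isset($parts['query']) || $parts['query'] === '') {
        return $path;
    }
    parse_str($parts['query'], $params);
    ksort($params); // parameter order no longer matters
    return $path . '?' . http_build_query($params);
}

echo canonical_uri('viewtopic.php?p=1234&mode=edit'), "\n";
echo canonical_uri('viewtopic.php?mode=edit&p=1234'), "\n";
```

Both calls print the same string, so the two URIs can be recognized as one page.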
All this suggests that it is better if a search engine's bot does not realize it is looking at a dynamic page. This is where URL rewriting comes in, a technique I describe in one of my courses (cf. Links at the end of this tutorial).
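Rewriting is commonly configured in the web server itself (for example with Apache's mod_rewrite module), which forwards clean URIs to the script. The sketch below only illustrates the mapping in PHP, assuming a hypothetical URI scheme /topic/<id>-<keywords> that stands for viewtopic.php?t=<id>; the function route is also hypothetical:

```php
<?php
// Hypothetical router: the public URI "/topic/<id>-<keywords>" carries the
// keywords and hides the GET parameter; this maps it back to what the
// dynamic script expects (viewtopic.php?t=<id>). The pattern is an assumption.
function route(string $path): ?array
{
    if (preg_match('#^/topic/(\d+)-[a-z0-9-]+\z#', $path, $m) === 1) {
        return ['script' => 'viewtopic.php', 't' => (int) $m[1]];
    }
    return null; // unknown URI: the caller would send a 404
}

var_dump(route('/topic/1234-write-your-html-correctly'));
```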

Images: <img/>

Images are graphical, visual. Yes, this is obvious. Well, no, actually it is not!
Images are not part of the HTML source code of the page. They appear there only as HTML tags, nothing more. A bot cannot read them, analyze them, extract their semantic field and compare it with the rest of the page. Yet images are sometimes essential to a page, because they make a long text easier to read or simply explain a complex concept.
This is why images should not be neglected. Their tags let you introduce keywords, which the bot will not fail to interpret as particularly important in the context of the page. This is the content of the alt attribute, which stands for "alternative text". Originally, this attribute was only meant to replace the visual element when the image is not available, so as not to leave the reader wanting. Today, it is also used to associate keywords with an image and to enrich the semantic field of a page.

Example of image with alternative text:

    <img src="http://www.developpez.net/forums/images/logo16.gif" alt="xyz.com">
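Since the bot relies on this attribute, it can be worth auditing a page for images that lack it. A minimal sketch with PHP's DOM extension; the function images_without_alt and the sample markup are hypothetical. Note that an empty alt is legitimate for purely decorative images, so the result is a list to review rather than a list of errors:

```php
<?php
// Hypothetical audit helper: returns the src of every <img> whose alt
// attribute is missing or empty, using PHP's DOM extension.
function images_without_alt(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $missing = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if (trim($img->getAttribute('alt')) === '') {
            $missing[] = $img->getAttribute('src');
        }
    }
    return $missing;
}

$html = '<img src="logo.gif" alt="Logo"><img src="photo.jpg"><img src="spacer.gif" alt="">';
print_r(images_without_alt($html));
```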

Off topic: do not use the border attribute of your images to remove the border that appears when they are inside links! That is not the job of HTML (document structure) but of your CSS (presentation of the elements in your document).

How to do it in CSS:

    img { border: 0; }

Tables: <table>

Many webmasters tend to use the <table> element for any layout. This is a mistake! First, the semantics are wrong; moreover, this element imposes a visual structure, which is the job not of the HTML document but of the style sheet (CSS) applied to it. The <table> element is meant to present raw data, such as a list of products (e.g. "name + price + quantity in stock"), not to structure a web page.
For example, here is a perfectly respectable use of a table: my shopping list from yesterday...

No.   Qty   Item               Cost
1     1     Mostassa Ant.      2.05
2     1     Tony Tom QP-3      1.20
3     2     S. Dindi / Poll.   3.20
4     2     Barrete. Patat.    1.20
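For reference, a minimal sketch of how such data could be marked up, generated here with PHP: the headers go in <th> cells inside <thead>, so the tabular structure is explicit in the markup. The function render_table is hypothetical; values are escaped with htmlspecialchars():

```php
<?php
// Hypothetical helper: renders rows of data as an HTML table. Column headers
// go in <th> cells inside <thead>, so the markup itself says "one row per
// item, one column per field". Every value is HTML-escaped.
function render_table(array $headers, array $rows): string
{
    $html = "<table>\n<thead>\n<tr>";
    foreach ($headers as $header) {
        $html .= '<th scope="col">' . htmlspecialchars((string) $header, ENT_QUOTES) . '</th>';
    }
    $html .= "</tr>\n</thead>\n<tbody>\n";
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . htmlspecialchars((string) $cell, ENT_QUOTES) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</tbody>\n</table>\n";
}

echo render_table(
    ['No.', 'Qty', 'Item', 'Cost'],
    [
        [1, 1, 'Mostassa Ant.', '2.05'],
        [2, 1, 'Tony Tom QP-3', '1.20'],
        [3, 2, 'S. Dindi / Poll.', '3.20'],
        [4, 2, 'Barrete. Patat.', '1.20'],
    ]
);
```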

Divisions: <div>

It is not advisable to use the <table> element to structure a web page, because there is a better solution.
Consider a text made of chapters and paragraphs. A naive page putting it online looks like this:

Poorly structured HTML page

    <font size="3">Title of the text</font><br /><br />
    <font size="2">Chapter 1</font><br />
    Paragraph 1.1<br /><br />
    Paragraph 1.2<br /><br />
    Paragraph 1.3<br /><br /><br />
    <font size="2">Chapter 2</font>
    Paragraph 2.1<br /><br />
    Paragraph 2.2

Of course, this HTML is completely ineffective for SEO: it gives the bot no structural information, so the bot cannot find any semantics. We must therefore divide our content into sections.
Here is the key word: division, that is, <div> in your HTML code.

Properly structured HTML page

    <div>
      <h1>Title of the text</h1>
    </div>
    <div>
      <div>
        <h2>Chapter 1</h2>
        <p>Paragraph 1.1</p>
        <p>Paragraph 1.2</p>
      </div>
      <div>
        <h2>Chapter 2</h2>
        <p>Paragraph 2.1</p>
        <p>Paragraph 2.2</p>
        <p>Paragraph 2.3</p>
      </div>
    </div>

In the example above, the structure of the document is quite explicit. We used one division for the title and another for the rest of the text. The text itself is split into two divisions, each made of a title and paragraphs. It is possible to segment further; it is a matter of taste.
In addition, an organization like this makes it easier to apply CSS.
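To see what "structural information" means for a program, here is a minimal sketch that lists the headings a parser can find. It uses PHP's DOMDocument and DOMXPath; the function outline and the two shortened samples (condensed from the pages above) are hypothetical. The <font>-based version yields no headings at all, while the structured one yields the title and the chapter:

```php
<?php
// Sketch: lists the h1-h6 headings of a page in document order, which is
// roughly the structure a program can recover from the markup alone.
function outline(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headings = [];
    foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $node) {
        $headings[] = $node->nodeName . ': ' . trim($node->textContent);
    }
    return $headings;
}

$bad = '<font size="3">Title of the text</font><br /><font size="2">Chapter 1</font><br />Paragraph 1.1';
$good = '<div><h1>Title of the text</h1></div><div><div><h2>Chapter 1</h2><p>Paragraph 1.1</p></div></div>';
print_r(outline($bad));
print_r(outline($good));
```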

Lists: <ul>

The bulleted list is an important element of a web page, because it makes it very easy to identify... guess what... a list of items!
It can be used as is, as I have done repeatedly in this course, or combined with a style sheet to build a menu or a table of contents. The most important thing here is to have a properly structured HTML document.
A very useful technique in some situations is to build a list while hiding its bullets with CSS. This gives you total control over the appearance of your list while preserving the semantics of the document. It lets you use images as bullets without affecting older browsers (which do not display them).

Its advantages:

Consistent HTML source code;
Non-graphical browsers know that this is a list;
Users also know that this is a list;
The appearance of the bullets can be customized with CSS.

Sample list for which we do not want to display bullets:

    <ul style="list-style: none;">
      <li>Nickname</li>
      <li>E-mail</li>
      <li>Website</li>
    </ul>

Forms: <form>

It is very tempting to handle every user action through links (HTML anchors). However, this can cause you problems...
The official documentation explains very clearly that user actions must be carried out exclusively with the POST method, while GET is only used to retrieve information for the user. This means that links may only be used to retrieve information, not to perform actions. For actions, you must use a form.
If you do not follow this standard, you are exposed to browser plugins that use the user's session to follow every link on the site. Since the session is used, such a plugin has access to all the links displayed in the members-only (VIP) area of your site, which you do not open to ordinary bots, and following them triggers the corresponding actions. I am referring here to something that happened to well-known people (cf. External links at the end of the article).
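A minimal sketch of the rule, assuming a hypothetical script that receives an action name: data-changing actions are refused unless the request method is POST, and a GET request gets the HTTP status 405 (Method Not Allowed). The function handle and the action names are hypothetical:

```php
<?php
// Hypothetical dispatcher: actions that change data are only performed for
// POST requests (sent by a <form method="post">). A request with any other
// method gets 405 Method Not Allowed instead.
const STATE_CHANGING_ACTIONS = ['delete-post', 'edit-post']; // hypothetical names

function handle(string $method, string $action): int
{
    if (in_array($action, STATE_CHANGING_ACTIONS, true) && $method !== 'POST') {
        return 405; // Method Not Allowed
    }
    // ... perform the action, or display the requested page ...
    return 200;
}

// In a real script:
// http_response_code(handle($_SERVER['REQUEST_METHOD'], $_POST['action'] ?? $_GET['action'] ?? ''));
```

A crawler or plugin that merely follows links sends GET requests, so it can no longer trigger these actions.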

In general

Organize your documents.
Use the tools HTML provides; do not ignore its tags. Give your documents a title with the <title> tag (60 to 80 characters) and rephrase or repeat it in your <h1> (which can be a longer sentence if needed).
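If you want to check the suggested length automatically, count characters rather than bytes: strlen() counts bytes, and an accented letter takes two bytes in UTF-8. A minimal sketch; the function names and the sample title are hypothetical, and the 60 to 80 range is the one suggested above:

```php
<?php
// Sketch: counts the characters of a title (not bytes, so an accented
// letter counts once) and checks it against the 60-80 range suggested above.
// Assumption: the title is valid UTF-8; invalid input counts as 0.
function title_length(string $title): int
{
    $count = preg_match_all('/./su', trim($title));
    return $count === false ? 0 : $count;
}

function title_is_ok(string $title, int $min = 60, int $max = 80): bool
{
    $length = title_length($title);
    return $length >= $min && $length <= $max;
}

$title = 'Write your HTML correctly: links, images, tables and lists for SEO';
echo title_length($title), ' characters: ', title_is_ok($title) ? 'ok' : 'to rework', "\n";
```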

Use the semantics of HTML:

em (emphasis): Draws attention to a word, usually rendered in italics;
strong: Indicates an important word (stronger than em), usually rendered in bold;
cite: A citation or a reference to other sources of information;
q (quote): An inline quotation; the author or source goes in <cite> and the quotation itself in <q>;
blockquote: A long quotation, displayed as a block;
dfn (definition): The term defined in this part of the page;
code: Source code, usually displayed by the browser in a monospace font;
samp (sample): Sample output from running source code;
var: A variable;
abbr: An abbreviation;
acronym: An acronym (obsolete in HTML5, where abbr is used instead).

Some of these elements (such as <em> and <strong>) are rendered visually by browsers. You can use a style sheet to control their appearance.
For <abbr> and <acronym>, you can set two attributes, "lang" and "title": title gives the expression in full, and lang indicates its language.

Learn to tell the difference between tags such as <b> and <i> on the one hand, and their apparent equivalents <strong> and <em> on the other. The former are instructions to the browser: "I want this in bold, and that in italics", while the latter carry semantic information: "this word is important (strong), that one is simply emphasized (emphasis)".

Some elements may seem replaceable by style sheets: we will come back later to the distinction that needs to be made.

Do not forget to include a "microsummary" (an alternative, dynamic title) in each of your pages. The title of your page should not change with your updates, whereas the microsummary should: it informs users of your updates. Microsummaries will be the subject of a separate article; in the meantime, you will find all the useful links at the end of the article. Firefox 2 can use the microsummary as the title of its bookmarks.
