As well, take a look at our other FAQs: the Comeau Templates FAQ and the Comeau C99 FAQ.
The intent of this page is to address questions about C++ and C that come up often, perhaps too often. However, it is exactly the frequency of these topics that is the reason for including a discussion of them below. These issues usually originate from a commonly made but misleading statement, or from code shown in a book. These points have found their way here as the result of our connection to the C++ and C communities for 20 years, whether teaching, helping in newsgroups, providing tech support for Comeau C++, or just plain listening to folks' issues. Some of the topics below can be found in other FAQs; here, however, we try to offer more information on the respective topics, as well as on issues related to them. Here are the current topics:
Note that we've broken our list down into categories; consider getting one or two books from each category. Some you should get as references, others you should read from cover to cover.
Note that there is often a tension between technical accuracy and readability. You may have a very readable book that's telling you wrong things, or avoiding fundamentals or insights, but reading the wrong things won't get you anywhere, however easily they go down. The converse is a very accurate book that is not as easy to read. There is a price to pay either way, but generally you're better off with the technically correct text. We believe this is so for both the short and the long term. In general, please make an effort to avoid product-oriented books, or books with titles that make everything sound like it will just be so great.
Categorically, we have not been satisfied with online tutorials (this does not mean that there are no good ones, just that we have not seen one yet). If you know of one that's excellent, don't hesitate to email us about it.
| Newsgroup or resource | FAQ |
|---|---|
| news:alt.comp.lang.learn.c-c++ | http://www.comeaucomputing.com/learn/faq |
| news:comp.lang.c, news:comp.std.c | http://c-faq.com |
| news:comp.lang.c++, news:comp.lang.c++.moderated | http://www.parashift.com/c++-faq-lite/ and "Welcome to comp.lang.c++": http://www.slack.net/~shiva/welcome.txt |
| news:comp.std.c++ | http://www.jamesd.demon.co.uk/csc/faq.html |
| The Comeau C++ and C FAQ (this document) | http://www.comeaucomputing.com/techtalk |
| The Comeau C++ Templates FAQ | http://www.comeaucomputing.com/techtalk/templates |
| The Comeau C99 FAQ | http://www.comeaucomputing.com/techtalk/c99 |
In general, if you need a FAQ, check out http://www.faqs.org.
The latest revision of Standard C, so-called C99, is also available electronically from ANSI. ANSI C89 or ISO C90 is still available, but only in paper form. For instance, Global Engineering Documents was carrying C89 (X3.159). Once at that link, click US, enter "x3.159" in the document number (NOT the document title) search box, and that'll get you the latest info from Global on this paper document (last we checked, it was US$148).
// A: main should not return a void
void main() { /* ...Whatever... */ }
// AA: This is just the same as A:
void main(void) { /* ...Whatever... */ }
The problem is that this code declares main to return a void and that's just no good for a strictly conforming program.
Neither is this:
// B: implicit int not allowed in C++ or C99
main() { /* ...Whatever... */ }
The problem with code example B is that it doesn't declare main's return type at all. Not declaring a function's return type is an error in C++, whether the function is main or not. In C99, the 1999 revision of Standard C, omitting the return type is also an error (in the previous version of Standard C, an implicit int was assumed as the return type if a function was declared or defined without one). What both Standard C++ and Standard C require is that main be declared to return an int. In other words, this is an acceptable form of main:
// C: Ok in C++ and C
int main(void) { /* ... */ }
Also, an empty parameter list is equivalent to void in C++, so this is ok in C++:
// D: Ok in C++ and C
int main(/*NOTHING HERE*/) { /* ... */ }
Note that it is ok in Standard C too: although the parameter list is empty, D is a definition, not just a declaration. In a definition, an empty parameter list specifies that the function has no parameters, so what the arguments are is not left unspecified, and D is effectively treated as having declared a void parameter list.
When you desire to process command line arguments, main may also take this form:
// E: Ok in C++ and C
int main(int argc, char *argv[]) { /* ... */ }
Note that the names argc and argv themselves are not significant, although commonly used. Therefore F is also acceptable:
// F: Ok in C++ and C
int main(int c, char *v[]) { /* ... */ }
Similarly, array parameters "collapse" into pointer parameters; therefore, G is also acceptable:
// G: Ok in C++ and C
int main(int c, char **v) { /* ... */ }
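As a small illustration of actually processing those arguments, here is a sketch that copies them into C++ strings. The helper name collectArgs is hypothetical. It relies on two guarantees from the standards: argv[argc] is a null pointer, and when argc is greater than zero, argv[0] represents the program name, so the user-supplied arguments start at argv[1].

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy the user-supplied arguments (everything after
// the program name in argv[0]) into a vector of strings.
// Typical use:
//   int main(int argc, char *argv[]) { std::vector<std::string> a = collectArgs(argc, argv); /* ... */ }
std::vector<std::string> collectArgs(int argc, char *argv[])
{
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i)   // skip argv[0], the program name
        args.push_back(argv[i]);     // argv[argc] itself is a null pointer
    return args;
}
```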
The int return value may also be specified through a typedef, for instance:
// H: Ok in C++ and C
typedef int Blah;
Blah main() { /* ... */ }
Here, main is declared to return an int, since Blah is a typedef for int. This might be used if you have system-wide typedefs in your shop. Of course, the following is also allowed, since the preprocessor replaces BLAH with int:
// I: Ok in C++ and C
#define BLAH int
BLAH main() { /* ... */ }
Do note, though, that the standards do not permit just any integer type, only int, so you wouldn't want to do these:
// J: Not ok
unsigned int main() { /* ... */ }
// K: Not ok
long main() { /* ... */ }
Sometimes these issues compound. For instance, consider a problem some people run into:
#include "SomeHeader.h"
main()
{
/* ... */
}
However, suppose that SomeHeader.h is written like this:
struct WhatEver {
/* ... */
}
/* MISSING semicolon for the struct */
That means that main in this case is being declared to return a WhatEver. At best, this results in a bunch of confusing errors.
In short, you wouldn't expect this to work:
// file1.c
float foo(int arg) { /* ... */ }
// file2.c
int foo(int);
But that's exactly the scenario that occurs when you misdeclare main, since a call to it is already compiled into your C or C++ "startup code".
That said, the standards also allow main to be declared in some other implementation-defined manner. Such forms are not portable, and a compiler may generate a diagnostic (that is, an error or warning message) for forms other than those shown above as ok. For instance, a common extension allows direct processing of environment variables. Such a capability is available on some OSs, such as UNIX and MS-Windows. Consider:
// L: Common extension, not standard
int main(int argc, char *argv[], char *envp[]) { /* ... */ }
Even so, you should perhaps favor getenv() from stdlib.h in C, or cstdlib in C++, when you want to access environment variables passed into your application. (Note that this is for reading them; writing environment variables so that they are available after your application ends is tricky and OS-specific.)
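A minimal sketch of the portable, read-only approach follows. The helper name envOr and its fallback behavior are assumptions made for illustration; the only library facility used is std::getenv, which returns a null pointer when the variable isn't set.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: read an environment variable with std::getenv,
// falling back to a default when the variable is not set.
// The string getenv points to must not be modified, so we copy it.
std::string envOr(const char *name, const std::string &fallback)
{
    const char *value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

For example, envOr("HOME", "/tmp") would give the home directory on a typical UNIX system, and "/tmp" where no such variable exists.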
Last but not least, it may be argued that all this is not worth worrying about, since it's "such a minor issue". But that attitude fosters carelessness. It would also support letting people accumulate wrong, albeit "small", pieces of information, and there is no productive benefit to that. It's important to know what is a compiler extension and what is not. There have even been compilers known to generate code that crashes if the wrong definition of main is provided.
By the way, the above discussion does not consider so-called freestanding implementations, where there may not even be a main, nor extensions such as WinMain, etc. It may also be that you don't care whether your code is Standard because, for instance, the code is very old, or because you are using a very old C compiler. Also note that void main was never K&R C, because K&R C never supported the void keyword. Anyway, if you are concerned about your code using Standard C++ or Standard C, make sure to turn warnings and strict mode on.
If you have a teacher, friend, book, online tutorial or help system that is informing you otherwise about Standard C++ or Standard C, please refer them to this web page http://www.comeaucomputing.com/techtalk. If you have non-standard code which is accepted by your compiler, you may want to double check that you've put the compiler into strict mode or ANSI-mode, and probably it will emit diagnostics when it is supposed to.
Semantically, returning from main is as if the program called exit (found in <cstdlib> in C++ and <stdlib.h> in C) with the same value that was specified in the return statement. One way to think about this is that the startup code which calls main effectively looks like this:
// ...low-level startup code provided by vendor
exit(main(count, vector));
This is ok even if you explicitly call exit from your program, which is another valid way to terminate your program, though in the case of main many prefer to return from it. Note that C (not C++) allows main to be called recursively (perhaps this is best avoided though), in which case returning will just return the appropriate value to wherever it was called from.
Also note that if you call exit, C++ destructors won't get run on ANY automatic objects (objects with static storage duration do get destroyed), nor, obviously, on objects allocated with new that were never deleted. So there are exceptions to the semantic equivalence I've shown above.
By the way, the portable values for program termination are 0 or EXIT_SUCCESS, and EXIT_FAILURE (these macros can also be found in stdlib.h in C and cstdlib in C++), representing a successful or unsuccessful program termination status respectively. The intention is for the operating system to do something with the status along these same lines, representing success or not. If you specify some other value, the status is implementation-defined.
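One way to stay within those portable values is to compute whatever result the program needs and map it onto EXIT_SUCCESS or EXIT_FAILURE only at the very end. The helper name exitStatusFor is hypothetical; this is a sketch of the idea, not a required pattern.

```cpp
#include <cstdlib>

// Hypothetical helper: map a count of errors onto the only portable
// termination statuses. A main might end with:
//   return exitStatusFor(errorCount);
int exitStatusFor(int errorCount)
{
    return errorCount == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Returning the raw error count instead would make the status implementation-defined whenever the count is neither 0 nor the value of EXIT_FAILURE.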
What if your program does not call exit, or your main does not return a value? Well, first of all, if the program really is expected to end, then it should. However, in C++ and C99, if you don't code anything (and the program is not in a loop), then when the flow of execution reaches the closing brace of main, a return 0; is effectively executed. In other words, this program:
int main() { }
is effectively turned into this:
int main() { return 0; }
Some C++ compilers or C compilers may not yet support this (and some folks consider it bad style not to code the return yourself anyway). Relatedly, your compiler may issue a warning that main is not returning a value, since it is declared to return an int. This may happen even if you coded this:
int main() { exit(0); }
since exit is just a library function, the compiler may not know that it never returns, in which case you may or may not have to add an otherwise unreachable return statement just to satisfy it.
An implementation-defined description follows. On UNIX, the low-order 8 bits of the int status are returned. Another process doing a wait() system call, which is not a Standard C or C++ function, might be able to pick up the status. A UNIX Bourne shell script might pick it up via the shell's $? variable. MS-DOS and MS-Windows are similar: their compilers also provide functions such as wait through which another application can obtain the status. As well, in command line batch .BAT files, you can code something like IF ERRORLEVEL..., or, with some versions of Windows, use the %ERRORLEVEL% environment variable. Based upon the value, the program checking it may take some action.
Do note, as mentioned above, that 0, EXIT_SUCCESS and EXIT_FAILURE are the portable successful/unsuccessful values allowed by the standard. Some programs may choose to use other values, both positive and negative, but realize that if you use those values, their meaning is not something that the Standard controls. In other words, exiting with other than the portable values, say 99 or -99, may or may not have the same results/intentions in every environment/OS.
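The UNIX behavior described above shows concretely why other values are risky. The hypothetical function below only models the arithmetic (POSIX makes status & 0377 available to a waiting parent); it does not query any operating system, and it assumes a two's complement int, as on essentially all UNIX systems.

```cpp
// Hypothetical model of what a UNIX parent process sees: only the
// low-order 8 bits of the status survive. No OS calls are made here.
int visibleUnixStatus(int status)
{
    return status & 0xFF;
}
```

Under this model, exiting with 256 looks like 0 (success!) to the parent, and -99 shows up as 157.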
#include <iostream>
int main()
{
int i = 99;
std::cout << i << '\n'; // A
std::cout << i << std::endl; // B
return 0;
}
In short, using '\n' is a request to output a newline. Using endl also requests to output a newline, but it also flushes the output stream. In other words, the latter has the same effect as (ignoring the std:: for now):
cout << i << '\n'; // C: Emit newline
cout.flush(); // Then flush directly
Or this:
cout << i << '\n' << flush; // D: use flush manipulator
In a discussion like this, it's worth pointing out that these are different too:
cout << i << '\n'; // E: with single quotes
cout << i << "\n"; // F: with double quotes
Specifically, note that E's last output request is for a char, hence operator <<(ostream&, char) will be used. In F's case, the last is a const char[2], and so operator <<(ostream&, const char *) will be used. As you can imagine, this latter function will contain a loop, which one might argue is overkill just to print out a newline. Some of these same points also apply when comparing these three lines of code, each of which outputs h, i and a newline in some way or another:
cout << "hi\n"; // G
cout << "hi" << '\n'; // H
cout << "hi" << "\n"; // I
By the way, although these examples have been using cout, it does not matter which stream is being used. Also, note that line A may also cause a flush operation to occur in the case where the newline character just happens to be the character that fills up the output buffer (if there is a buffer, that is; the stream in question may happen not to be buffered).
In conclusion, which of these should you use? Unless performance is critical, many favor using endl, as they find that just typing endl is in most cases easy and readable.
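Since only flushing and overload selection differ, the characters written are identical in every case. This small sketch writes to std::ostringstream objects instead of cout (an assumption made purely so the output can be compared as strings); the function names are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// '\n' as a char: uses operator<<(ostream&, char).
std::string withNewlineChar(int i)
{
    std::ostringstream out;
    out << i << '\n';
    return out.str();
}

// std::endl: writes '\n' and then calls out.flush().
std::string withEndl(int i)
{
    std::ostringstream out;
    out << i << std::endl;
    return out.str();
}

// "\n" as a string: uses operator<<(ostream&, const char*).
std::string withNewlineString(int i)
{
    std::ostringstream out;
    out << i << "\n";
    return out.str();
}
```

All three return "99\n" for an argument of 99; the difference lies only in the work done along the way.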
Here's a bit more technical information for those so inclined. As it turns out, endl is called an iostreams manipulator. In reality, it is a function generated from a template, even though it may look like an object. Stripped of its template machinery but retaining its semantics, it might look like this:
// Simplified sketch; the real std::endl is a function template over basic_ostream
inline std::ostream &endl(std::ostream &OutStream)
{
OutStream.put('\n');
OutStream.flush();
return OutStream;
}
The iostreams machinery kicks in because ostream has a member operator<<(ostream &(*)(ostream &)), which provides a match for endl and simply calls it on the stream. And of course, you can call endl directly. In other words, these two statements are equivalent:
endl(cout);
cout << endl;
Actually, they are not exactly the same; however, their observable semantics are.
There are other standard manipulators. We leave it as an exercise to the reader to research how to create your own manipulator, or how to override something like endl.
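As a starting point for that exercise, here's a minimal sketch of a user-defined manipulator, named tab here purely for illustration. Because it has the same shape as the simplified endl above (it takes and returns an ostream&), the same member operator<< picks it up, and it can also be called directly.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical manipulator: writes a tab character and returns the stream.
std::ostream &tab(std::ostream &out)
{
    return out.put('\t');
}

// Usage: both spellings work, just as with endl.
std::string tabbedPair(int a, int b)
{
    std::ostringstream out;
    out << a << tab << b;   // picked up by ostream's operator<< for manipulators
    tab(out);               // called directly
    return out.str();
}
```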
#include <stdio.h>
int main()
{
/* Code using stdin inserted here */
fflush(stdin); // eat all chars from stdin, allegedly
/* More code using stdin */
}
But this won't work. It is undefined behavior, because the standard only gives meaning to fflush()ing output streams, not input streams. One reason is that the stdio buffer and the operating system buffer are usually two different buffers. Furthermore, stdio may not be able to get at the OS buffer because, for instance, some OSs don't pass the buffer along until you hit return. So, normally, you'd either have to use a non-standard, non-portable extension to solve this, or write a function to read until the end of the line.
In C++, you might do this:
#include <iostream>
int main()
{
const int HopeFullyTheRightSize = 256; // a guess; if a line is longer, getline() sets failbit
char someBuffer[HopeFullyTheRightSize];
/* Code this and that'ing std::cin inserted here */
/* Now "eat" the stream till the next newline */
std::cin.getline(someBuffer, sizeof(someBuffer), '\n');
/* More code using std::cin */
}
One problem here is that an acceptable buffer size must be chosen by the programmer. To get around this, a C++ string could have been used:
#include <iostream>
#include <string>
int main()
{
std::string someStringBuffer; // Don't worry about line length
/* This and that'ing with std::cin here */
std::getline(std::cin, someStringBuffer);
/* More code using std::cin */
}
A problem still remains here in that the code still requires a buffer to be explicitly provided by the programmer. However, iostreams has the capability to handle this:
#include <iostream>
#include <limits>
int main()
{
/* Code using std::cin inserted here */
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
/* More code using std::cin */
}
It might be worth wrapping that line up into an inline function, and letting it take a std::istream & argument:
inline void eatStream(std::istream &inputStream)
{
inputStream.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
Note that you will often see std::cin.clear() used, which may look redundant given .ignore(). However, .clear() does not clear characters. Instead, it clears the respective stream's state, which is sometimes important to do. The two operations often go hand in hand because some error situation may have occurred, like reading a number when alphabetic characters are in the input stream. Therefore, clearing the stream state and discarding the bad characters are often done in adjacent lines, but be clear :), they are two different operations.
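Here is a small sketch of the two operations used together, assuming the goal is to skip over lines that don't start with a number. readIntOrSkipLine and sumNumbers are hypothetical names, and a std::istringstream stands in for std::cin so the behavior is easy to check.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: try to read an int. On failure, first clear() the
// stream state, then ignore() the rest of the line: two separate steps.
bool readIntOrSkipLine(std::istream &in, int &value)
{
    if (in >> value)
        return true;
    in.clear();   // reset failbit/eofbit; no characters are removed
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // remove them
    return false;
}

// Usage sketch: add up the numbers in some text, skipping bad lines.
int sumNumbers(const std::string &text)
{
    std::istringstream in(text);
    int total = 0;
    int value = 0;
    for (;;) {
        if (readIntOrSkipLine(in, value))
            total += value;
        else if (in.eof())   // ignore() ran into the end of the input
            break;
    }
    return total;
}
```

Without the clear(), the stream would stay in its failed state and every later read, including the ignore(), would do nothing.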
Of course, you don't have to .ignore() the maximum possible number of characters, but as many chars as you like, if fewer makes sense for the problem that you are solving. The above shows C++ solutions, but C solutions will be similar, to wit, you need to explicitly eat the extra characters too, perhaps:
#include <stdio.h>
void eatToNL(FILE * inputStream)
{
int c;
/* Eat till (& including) the next newline */
while ((c = getc(inputStream)) != EOF)
if (c == '\n')
break;
}
/* blah blah */
eatToNL(stdin);
/* blah blah */
As usual, don't hesitate to read your texts on the functionality of iostreams or stdio.
class Comeau { };
// ...
Comeau *p = new Comeau[99];
delete [] p; // ok
// ...
Comeau *p2 = new Comeau;
delete [] p2; // not ok
If you new a scalar, then you need to delete a scalar:
Comeau *p3 = new Comeau;
delete p3; // ok
// ...
Comeau *p4 = new Comeau[99];
delete p4; // not ok
The reason is that delete doesn't just get rid of the memory; it also runs the respective destructors, and for an array, delete [] runs the destructor for each element. This does not mean that, since builtin types don't have destructors, this is ok:
int *p5 = new int[99]; // AA
delete p5; // BB: NO!
It does not matter if violations of these points appear to work with your compiler. They may not work on another compiler, or even on an upgrade of the same compiler, since the Standard has no such provision for line BB given line AA.
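To see that delete [] really runs a destructor per element, here's a small sketch with a hypothetical Tracked class that counts destructor calls. Only the correctly paired forms are exercised; the mismatched ones are undefined behavior, so there's nothing meaningful to observe.

```cpp
// Hypothetical class that counts how many of its destructors have run.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// new[] paired with delete []: one destructor call per element.
int destructorsRunByArrayDelete(int n)
{
    Tracked::destroyed = 0;
    Tracked *p = new Tracked[n];
    delete [] p;
    return Tracked::destroyed;
}

// new paired with plain delete: exactly one destructor call.
int destructorsRunByScalarDelete()
{
    Tracked::destroyed = 0;
    Tracked *p = new Tracked;
    delete p;
    return Tracked::destroyed;
}
```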
#include <iostream>
int main()
{
cout << "hello, world\n";
}
it may be an error, because there is no cout; there is a std::cout. So spelling it like that is one way to fix the above code:
std::cout << "hello, world\n";
You could also use a using declaration:
using std::cout; // read as: hey, cout is now in scope
// Therefore, cout is a synonym for std::cout
//...
std::cout << "hello, world\n"; // ok no matter what
cout << "hello, world\n"; // ok because std::cout is in scope
You could also use a using directive:
using namespace std; // hey, all of std is in scope
//...
std::cout << "hello, world\n"; // ok no matter what
cout << "hello, world\n"; // ok because std::cout is in scope
// because all of std from the headers used is in scope
Although this is not a namespaces tutorial, it's worth pointing out that you should usually consider putting your usings as local as possible. The reason is that the more usings you use, and the wider their scope, the more you'll be defeating namespaces. IOWs, when possible, put a using inside the function or block that needs it rather than at file scope.
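For example, both kinds of using can be confined to a single function. In this sketch the function names are hypothetical, and an ostringstream stands in for cout so the result can be checked; the unqualified names are usable only inside the function that brings them in.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Local using-declaration: only endl becomes visible, and only in here.
std::string localUsingDeclaration()
{
    using std::endl;
    std::ostringstream out;
    out << "hello, world" << endl;
    return out.str();
}

// Local using-directive: all of std is visible, but again only in here.
std::string localUsingDirective()
{
    using namespace std;
    ostringstream out;
    out << "hello, world" << endl;
    return out.str();
}
```

Outside these two functions, the rest of the file still has to write std::endl and std::ostringstream.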
These points are true of the rest of the names in the standard library, whether std::vector, or whatever... so long as you've #included the right header of course:
#include <vector>
#include <string>
vector<string> X; // nope
std::vector<std::string> Y; // ok
Which brings up the issue that Standard C++ does not have a header named <iostream.h>, although many compilers support it for backwards compatibility with pre-Standard implementations, where namespaces didn't exist. So, as an extension, the original example given in this section may work on some implementations (with iostream.h or iostream).
Similarly, the Standard C headers, such as <stdio.h>, are supported in C++, but they are deprecated. Because the .h forms of C headers are deprecated, the so-called Cname headers are often said to be preferred: for instance, cstdio instead of stdio.h, or cctype instead of ctype.h. The names declared by a .h header are in the global namespace and so do not need to be qualified with std::, whereas the Cname headers declare them in namespace std. The .h form is sometimes used as a transition model for backwards compatibility, or for a "co-ed" source file able to be compiled by either a C or a C++ compiler. This said, there is some controversy about whether these headers should ever have been deprecated, so, IMO, the jury is still out on whether you must prefer Cname's to name.h's in all cases, or for that matter, in any cases.
// x.cpp
static bool flag = false; // AAA
void foo() { if (flag) { /* ... */ } }
void bar() { /* ... */ flag = true; /* ... */ }
should instead often be composed this way in C++:
// x.cpp
namespace /* NOTHING HERE!! */ { // BBB
bool flag = false; // no need for static here
}
The use of static in AAA indicates that flag has internal linkage. This means that flag is local to its translation unit (that is, effectively it is only known by its name in one source file, in this case x.cpp). This means that flag can't be used by another translation unit (by its name, at least). The goal is to have less global/cross-file name pollution in your programs while at the same time achieving some level of encapsulation. Such a goal is usually considered desirable (note that the goal, not the code, is being discussed in this sentence).
Contrast this to BBB. In the case of using the unnamed namespace above, flag has external linkage, yet it is effectively local to the translation unit. It is effectively still local because although we did not give the namespace a name, the compiler generated a unique name for it. In effect, the compiler changes BBB into this:
// Just get UNIQUE established
namespace UNIQUE { } // CCC
// Bring UNIQUE into this translation unit
using namespace UNIQUE;
// Now define UNIQUEs members
namespace UNIQUE {
bool flag = false; // As Before
}
For each translation unit, a uniquely generated identifier name for UNIQUE somehow gets synthesized by the compiler, with the effect that no other translation unit can see names from an unnamed namespace, hence making it local even though the name may have external linkage.
Therefore, although flag in CCC has external linkage, its real name is UNIQUE::flag, and since UNIQUE is only known to x.cpp, flag is effectively local to x.cpp and not known to any other translation unit.
Ok, so far, most of the discussion has been about how the two provide local names, but what are the differences? And why was static deprecated and the unnamed namespace considered superior?
First, if nothing else, static means many different things in C++, and reducing one such use is considered a step in the right direction by some people. Second, names in unnamed namespaces may have external linkage, whereas with static a name must have internal linkage. In other words, although a syntactic transformation is shown above between AAA and BBB, the two are not exactly equal (the one between BBB and CCC is). Note that this discussion reflects C++98/C++03: C++11 gave names in unnamed namespaces internal linkage, withdrew the deprecation of namespace-scope static, and relaxed some of the template-argument restrictions discussed below.
Most books and usenet posts leave you right about here. No problem with that per se, as the above info is not to be tossed out the window. However, you can't help but keep wondering what the BIG deal some people make about unnamed namespaces is. Some folks might even argue that they make your code less readable.
What's significant, though, is that some template arguments cannot be names with internal linkage; instead, some require names with external linkage. Remember, the types of the arguments to templates become part of the instantiation type, but names with internal linkage aren't available to other translation units. A good rule of thumb to consider (said rather loosely) is that external names shouldn't depend upon names with less linkage (definitely not on those with no linkage, and often not even on names with internal linkage). And so it follows from the above that instantiating such a template with a static such as the one from AAA just isn't going to work. This is all similar to why these won't work:
template <const int& T> struct xyz { };
int c = 1;
xyz<c> y; // ok
static int sc = 1; // This is the kicker-out'er above
xyz<sc> y2; // not ok
template <char *p> struct abc { };
char comeau[] = "Comeau C++";
abc<comeau> co; // ok
abc<"Comeau C++"> co2; // not ok
template <typename T> struct qaz { };
void foo()
{
char buf[] = "local";
abc<buf> lb; // not ok
static char buf2[] = "local";
abc<buf2> lb2; // not ok
struct qwerty {};
qaz<qwerty> dq; // not ok
}
Last but not least, static and unnamed namespaces are not the same because static is deficient as a name constrainer. Sure, a C programmer might use it for flag above, but what do you do when you want to layer or just encapsulate say a class, template, enum, or even another namespace? ...for that you need namespaces. It might even be argued that you should wrap all your files in an unnamed namespace (all the file's functions, classes, whatever) and then only pull out the parts other files should know about.
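As a sketch of that approach, assume a hypothetical file whose only public entry points are setBusy, isBusy and modeChanges. The flag, an enum and a class can all be tucked inside one unnamed namespace, which static alone cannot do for the class and the enum.

```cpp
// Everything in the unnamed namespace is private to this translation unit.
namespace {
    bool flag = false;                 // file-local, like "static bool flag"

    enum Mode { Idle, Busy };          // an enum can be kept file-local...

    class ModeTracker {                // ...and so can a class
    public:
        ModeTracker() : mode_(Idle), changes_(0) {}
        void set(Mode m) { mode_ = m; ++changes_; }
        Mode mode() const { return mode_; }
        int changes() const { return changes_; }
    private:
        Mode mode_;
        int changes_;
    };

    ModeTracker tracker;
}

// The only names intended for use by other files.
void setBusy() { flag = true; tracker.set(Busy); }
bool isBusy() { return flag && tracker.mode() == Busy; }
int modeChanges() { return tracker.changes(); }
```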
Note that none of the above is equivalent to this:
// x.cpp
namespace { // DDD
static bool flag = false;
}
(The point of showing DDD is that you really wouldn't want to write it. One could say that it is redundant, and it really just brings all the above issues right back, since in this form flag is not external. So it's only shown to make sure you see that none of the previous versions look like this :).
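Note that extern "C" declarations inside a namespace behave in some ways as if they were not declared in that namespace (for instance, two extern "C" functions with the same name declared in different namespaces refer to the same function). Since an unnamed namespace is a namespace, this holds true for an unnamed namespace as well.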
Note also that the above discussion does not apply to the other uses of static: static lifetime, static members, or static locals in a function.
void foo(int /*no name here*/)
{
// code for foo
}
Take note that although an int argument would be passed, the parameter is not named. Why would you do that? One reason is to "stub out" a routine. For instance, let's say that some functionality was removed from an already existing program. Instead of finding all calls of foo and removing them, they can be left in. The effect is to no-op the code. If the function is defined inline, it may not even generate any code.
Of course, this isn't limited to functions with just one parameter. For instance, you might have:
void bar(int arg1, int arg2, int arg3)
{
// code for bar, using arg1, arg2 and arg3
}
Now let's say that the functionality of the program changes and that arg2 is no longer needed. Well, obviously you'll remove the code that uses arg2. But now the problem is that you'll probably get an "unused parameter" warning from your compiler. To get rid of the warning you could give it some dummy use within the function, but that'll just confuse the issue. Instead, you can just remove the parameter name too:
void bar(int arg1, int /* Now unnamed */, int arg3)
{
// code for bar, using arg1 and arg3
}
Sometimes, though, the above approach is used not just to support legacy code, but also to make sure a particular overloaded function gets picked, perhaps a constructor. In other words, an additional argument is passed just to make sure a certain function gets picked.
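The best-known case of a parameter that exists only to steer overload resolution is the postfix increment operator: C++ distinguishes x++ from ++x by giving the postfix form a dummy int parameter, which is conventionally left unnamed. A minimal sketch, using a hypothetical Counter class:

```cpp
// Hypothetical counter type illustrating a parameter that exists only
// to select an overload.
class Counter {
public:
    explicit Counter(int start = 0) : value_(start) {}

    // Prefix ++: increment, then return the updated object.
    Counter &operator++()
    {
        ++value_;
        return *this;
    }

    // Postfix ++: the unnamed int is never used; it only lets the compiler
    // tell x++ apart from ++x. Returns the value from before the increment.
    Counter operator++(int)
    {
        Counter old(*this);
        ++value_;
        return old;
    }

    int value() const { return value_; }

private:
    int value_;
};
```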
As well, during code development it might help to use an unnamed argument if, for instance, you write stubs for some routines.
When possible, though, it can be argued that unused parameters should be removed completely, both from the function and from all the call points, unless you're specifically trying to overload operator new or something like that.
Also, note that the above discussion has no relation to code such as:
void function(int arg, ...);
which uses the ellipsis notation (the dot dot dot) to specify that the remaining arguments to a function are unspecified. Those arguments are unnamed too, but the ellipsis establishes variable arguments to a function (which C supports too), and that's something else altogether.
As you may be aware, Standard C and Standard C++ each support a rich myriad of implicit conversions. Generally, they allow us to "manipulate" common values and put them into an object of similar, but not exactly the same, type. And they happen by default, hence why they are implicit. IOWs, as objects of some types more naturally convert to objects of other types, the language provides rules allowing some of these conversions w/o needing to specify extra code or directives (it also has rules prohibiting other conversions). Therefore, this allows compilers to generate code to do these conversions automatically.
A classic example of this is stdio's getchar(). That is, although most code will be using the return value of getchar() as a char of some sort, it actually returns an int. That means that this code may have a problem:
#include <stdio.h>
int main()
{
char c;
/* ... */
c = getchar();
}
because normally you'll probably be reading in characters in a loop and you would also want to be checking to see if there was a problem with input, for instance, an end of file condition:
while ((c = getchar()) != EOF)
    putchar(c);
But that would require:
signed int c;
Why? Well, Standard C says that EOF is a macro which "expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream." Therefore, if getchar() is to be able to return all valid character input, then there must be some way to capture the negative value "error" condition represented by EOF as well. This means getchar() must return a type able to hold more than the character type can hold. (Some may question the design approach used with getchar(), but that's a topic for a different discussion.) In this case, that means an int, and a signed one (plain int is signed by default). With a plain char instead, the loop breaks in one of two ways: if char is unsigned, c can never compare equal to EOF, so the loop never ends; if char is signed, a legitimate input byte such as 0xFF can compare equal to EOF and end the loop early.
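Here is the corrected loop packaged as a function over any FILE*, so it can be exercised without a console. The name copyStream is hypothetical; it relies only on the standard getc, putc and EOF from <cstdio>.

```cpp
#include <cstdio>

// Hypothetical helper: copy every byte from in to out and return how many
// bytes were copied. c must be an int: getc returns either an unsigned char
// value converted to int, or EOF (a negative int), and a char can't hold both.
long copyStream(std::FILE *in, std::FILE *out)
{
    long count = 0;
    int c;
    while ((c = std::getc(in)) != EOF) {
        std::putc(c, out);
        ++count;
    }
    return count;
}
```

A 0xFF byte in the input is copied like any other byte here, because getc hands it back as 255, which never equals EOF.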
To connect all the dots here, the long story short is that internally getchar() is assigning a char value to an int, and then returning that. Inside getchar() (or some function it calls), that code is something along the lines of the following "Canglais" (C pseudo-code):
/* ... */
unsigned char buffer[SomeBufferSize];
unsigned char *bufferp;
readBuffer(buffer);
bufferp = buffer;
/* ... */
int charToReturn;
/* ... */
if (bufferNotEmpty)
    charToReturn = *bufferp++; /* int = unsigned char */
else if (DidHitEndOfFile)
    charToReturn = EOF;
/* ... */
return charToReturn;
Notice that the conversion from unsigned char to int "just happens". The rules of Standard C and Standard C++ each spell out what conversions can happen in contexts like this, and so compiler writers are able to generate the correct required executable code. A piece of code like this is ok too:
int main()
{
double d;
long l = 1234;
d = l; /* long implicitly converted to double */
}
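The "Canglais" sketch of getchar()'s internals can also be turned into something runnable. Below, nextChar is a hypothetical stand-in, with the buffer handling simplified to a pair of pointers; it shows the unsigned char to int conversion happening implicitly, and why a byte such as 0xFF comes back as 255 rather than colliding with EOF.

```cpp
#include <cstdio>   // for EOF

// Hypothetical stand-in for getchar()'s internals: hand back the next
// buffered byte as an int, or EOF when the buffer is exhausted.
int nextChar(const unsigned char *&cursor, const unsigned char *end)
{
    if (cursor != end)
        return *cursor++;   // unsigned char implicitly converted to int
    return EOF;             // negative int: no more input
}
```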
These same rules however will let compilers diagnose a problem such as:
int i = 99;
char *p;
// ...
p = i; // error, types are too different for implicit conversion
As you may also be aware, C and C++ also support explicit conversions, aka casts. They allow one to specify all the implicit conversions (consider those "castless conversions" if you want), and also other ones that are not implicit. Note that this does not mean you can convert anything to anything. Explicit conversions, or casts, are expressions which can take the form of a so-called "C-style cast":
(T)E
In other words, a cast is a syntactic (and hence purposeful, explicit and non-automatic) mechanism/notation to accomplish a conversion. Normally it's for conversions that the compiler would not be doing by default, but you can also cast the default ones if for some reason you want to make them explicit.
The type, T above, in a C style cast can be a simple type like int, a qualified (const or volatile) pointer, etc. and it is parenthesized.
The expression, E above, can be most normal expressions: additions, function calls, constants, etc. Therefore, to change the last example, I might have:
p = (int *)i; // compiles
Note that I do not say "OK" but "compiles", because we are converting an int to a pointer, and the language does not guarantee that this conversion can actually take place correctly on a given implementation.
Certainly, the 99 in my toy example is a garbage memory location; however, an address such as 0xFFFF0F, used by a device driver to reach video card memory, may not be. This is one reason casts should be approached cautiously: they may or may not even make sense. Even if one does, it has to actually work on a specific compiler and platform. And of course, portability of such constructs is often just thrown completely out the window.
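When a pointer value genuinely has to travel through an integer (for logging, hashing, or a device-driver table), one hedged option is std::uintptr_t from <cstdint>. It is an optional type, but where it exists it is wide enough that converting an object pointer to it and back yields the original pointer. A minimal sketch; round_trip and read_back are hypothetical names. Note that the guarantee covers only integers obtained from real pointers, not arbitrary values such as 99.

```cpp
#include <cassert>
#include <cstdint>

// Converts an object pointer to an integer and back. Where std::uintptr_t
// exists, it is wide enough that the round trip yields the original pointer.
int *round_trip(int *p)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    // ... bits could be stored, hashed, or logged here ...
    return reinterpret_cast<int *>(bits);
}

int read_back(int value)
{
    int *restored = round_trip(&value);
    assert(restored == &value);
    return *restored;
}
```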
NOTE: A cast is effectively a statement to the compiler that you know what you are doing and that it should shut up about any possible violations you may be making. So do make sure you know what you are doing, and why. This is important to consider in shops that do not permit warnings, because it is often too easy to insert a cast to satisfy that requirement and inadvertently render the code non-portable or incorrect (on other platforms, but quite possibly on the same one too). Furthermore, to their detriment, newbies often add casts to get around problems they don't understand and/or to silence compiler errors, usually with no good basis for doing so. Bad reasons, whether from a newbie or even from an expert, include being frustrated or believing you've got the code working satisfactorily. These are very easy bugs to add but slippery once there, and painful to detect and fix.
Note that in the earlier examples we could have added casts:
charToReturn = (int)*bufferp++;
/* ... */
d = (double)l;
but in the context of those examples, the casts are, strictly speaking, not necessary since, as mentioned, those conversions already happen implicitly. In cases where the cast is exactly the same as not having one, the compiler will accept the cast, but it will essentially have no effect, since the conversion would happen anyway.
There is an argument that specifying casts in such a manner makes the line of code more self-documenting. That may be so, but gratuitous casts can get burdensome. Furthermore, gratuitous casts become a code maintenance nightmare, and a trap, one which will most assuredly render many programs not only incorrect, but silently incorrect! So, be wise. Even casts which are not gratuitous should be used judiciously.
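As a hypothetical illustration of that trap: suppose unit_price started life as an int, and someone added a redundant (int) "for documentation". Later the type changed to double, and the leftover cast now silently truncates. The function names are made up for this example.

```cpp
#include <cassert>

// The (int) was redundant when unit_price was an int. After unit_price
// became a double it quietly truncates, and no compiler error is issued.
double order_total_with_cast(double unit_price, int quantity)
{
    return (int)unit_price * quantity; // gratuitous cast, now a bug
}

// Without the cast, the arithmetic follows the current type.
double order_total(double unit_price, int quantity)
{
    return unit_price * quantity;
}

void check_totals()
{
    assert(order_total_with_cast(19.99, 3) == 57.0); // 19 * 3, not 59.97
    double t = order_total(19.99, 3);
    assert(t > 59.96 && t < 59.98);
}
```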
BTW, note that C allows types to be defined in casts, but C++ does not:
int main()
{
void *p;
// code that has not yet declared xyz
p = (struct xyz { int i; } *)0; // C++ error: type xyz can't be defined here
return 0;
}
C-style casts are more formally known as "explicit conversions" because you code them explicitly. Note, C99 also now supports "compound literals" which seem to have a cast'y look to them, however, they are not casts. That is, compound literals bring forth lvalues and are not conversion requests per se.
Speaking of which, note that a cast in Standard C cannot yield an lvalue (some compilers have non-Standard extensions that allow it though). However, in C++ you can cast to an lvalue as a reference:
...(some_base_class&)E...
C++ also allows classes:
class xyz {
public:
// ...
xyz(int); // constructor taking an int
};
xyz anXyz(99); // init anXyz with 99
where we can bring forth instances of user-defined types through constructors. This same kind of possibility has been allowed for built-in types too:
int i(99);
This has led to the so-called constructor style initializer in C++. Hand in hand with this, but in contrast to the C style cast form (T)E, are constructor style casts of the form:
T(E)
for instance:
// ...
char somechar;
typedef int int_typedef_example;
// ...
x = a - b + (int)somechar; // C style form
x = a - b + int(somechar); // C++ ctor style form
x = a - b + (int)(somechar); // C style form with parens
x = a - b + int_typedef_example(somechar); // ctor style using typedef as type
Note that in C++, casts (all casts) may result in class specific "operator functions" (conversion functions) or constructors being called. Note also that the type in a constructor style cast must be a simple type name, essentially a single keyword or name such as int or a typedef name, so this is not allowed:
q = int *(p);
You would need a typedef for that.
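A short sketch of that typedef workaround. The names IntPtr, uint_t and functional_casts are illustrative only. A functional cast with a single operand means the same as the corresponding C style cast, so IntPtr(vp) behaves like (int *)vp.

```cpp
#include <cassert>

typedef int *IntPtr;         // one name for the two-token type int *
typedef unsigned int uint_t; // likewise for unsigned int

int functional_casts(char somechar, void *vp)
{
    int a = int(somechar);   // ok: int is a simple type name
    // int *q = int *(vp);   // would not compile: int * is not a simple type name
    IntPtr q = IntPtr(vp);   // ok: same as (int *)vp
    uint_t u = uint_t(a);    // unsigned int(a) would not compile either
    return *q + int(u);
}

void check_functional_casts()
{
    int x = 5;
    assert(functional_casts(char(1), &x) == 6); // 5 + 1
}
```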
As mentioned earlier, casts are a brute force conversion mechanism. As such, they are probably too powerful, and therein lies a problem: looking at a cast, one cannot always grok exactly what the intention of the conversion is. Consider:
const T1 *p1;
//...
p2 = (T2)p1; // converting T1* to T2? const to non-const? typo error?
Therefore, in order to make such code (at least the casts deemed necessary to remain in the code) more self-documenting, Standard C++ supports additional forms of casts, often referred to as "new style casts": static_cast, const_cast, reinterpret_cast and dynamic_cast. They are split into these distinct categories so that each cast states the intent of its conversion and the compiler can check it. For instance, static_cast:
// ...
struct base {};
struct derived : base {};
base b;
derived d;
// Normal implicit "upcast" derived to base conversion
base *bp = &d; // similar to = static_cast<base *>(&d)
// here we do what is normally considered a "conversion in the wrong direction":
derived *dp1 = bp; // error: base * can't init a derived *
derived *dp2 = static_cast<derived *>(bp); // ok as far as cast goes (static class navigation)
// IOWs, bp may not really point to a derived, but see dynamic_cast
// below to "properly" navigate & check hierarchies
// Can use references as well as ptrs
derived &dr = static_cast<derived &>(*bp); // ok as far as cast goes
// As with C style casts, some of these are redundant (implicit),
// may narrow, widen, etc. and are shown for exposition purposes only
charToReturn = static_cast<int>(*bufferp++); // unsigned char to int
d = static_cast<double>(l); // long to double
l = static_cast<long>(d); // and back
x = a - b + static_cast<int>(somechar); // char to int
void *SomeFunc();
void *p = SomeFunc(); // perhaps malloc()
MyType *myp = static_cast<MyType *>(p); // void * to MyType *
int SomeInt = 99;
int *pi = &SomeInt;
// ...
void *vp = pi; // castless conversion
// ... say pi = 0;
pi = vp; // error in C++
pi = static_cast<int *>(vp); // put it back
enum colors { red, white, blue };
int rwb = static_cast<int>(white); // enum to int
colors bwr = static_cast<colors>(rwb); // and back
// see some other static_cast's below
void foo(const int *cip)
{
int *ip = const_cast<int *>(cip); // de-const
// ... modifying *ip is undefined behavior if the object *cip refers to was defined const
}
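The classic legitimate use of const_cast is calling an older, non-const-correct function that takes a char * yet is known not to modify what it points to. A minimal sketch; legacy_length stands in for such a hypothetical legacy API, and length is an illustrative wrapper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical legacy function: not const-correct, but it only reads *s.
std::size_t legacy_length(char *s)
{
    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

// Const-correct wrapper. The const_cast is acceptable only because
// legacy_length never writes through its argument; writing to the
// characters of a string literal would be undefined behavior.
std::size_t length(const char *s)
{
    return legacy_length(const_cast<char *>(s));
}

void check_length()
{
    assert(length("hello") == 5);
}
```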
// from previous example using C style cast
p = reinterpret_cast<int *>(i); // int may not look the same as int *
A piece of code like this:
int main()
{
int *ip;
float *fp;
ip = fp;
fp = ip;
ip = static_cast<int *>(fp);
fp = static_cast<float *>(ip);
}
will produce errors, because int * and float * are unrelated pointer types that neither an implicit conversion nor a static_cast will convert between:
Comeau C/C++ for MS_WINDOWS_x86
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
"pcast.cpp", line 6: error: a value of type "float *" cannot be assigned to an
entity of type "int *"
ip = fp;
^
"pcast.cpp", line 7: error: a value of type "int *" cannot be assigned to an
entity of type "float *"
fp = ip;
^
"pcast.cpp", line 9: error: invalid type conversion
ip = static_cast<int *>(fp);
^
"pcast.cpp", line 10: error: invalid type conversion
fp = static_cast<float *>(ip);
^
4 errors detected in the compilation of "pcast.cpp".
Assuming you really want to do such a cast, a resolution using new style casts would be:
ip = reinterpret_cast<int *>(fp);
fp = reinterpret_cast<float *>(ip); // fp may or may not have its original value
Even assuming it makes sense to say one cast is more portable than another, don't count on portability with reinterpret_cast, as you are really asking for type checking to be thrown out the window here.
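If the real goal behind something like reinterpret_cast<int *>(fp) is to examine an object's bit pattern, there are better-defined routes: any object may be viewed through unsigned char *, and copying its representation with std::memcpy sidesteps the aliasing problems of reading a float through an int *. A minimal sketch; it assumes a 32-bit IEEE 754 float (the static_asserts check this at compile time), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");

// Copies the bit pattern instead of reading a float through a uint32_t *,
// which would violate the aliasing rules.
std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Viewing any object through unsigned char * is permitted.
unsigned byte_sum(const float &f)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&f);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof f; ++i)
        sum += p[i];
    return sum;
}

void check_bits()
{
    assert(float_bits(1.0f) == 0x3F800000u);
    assert(byte_sum(1.0f) == 0x3Fu + 0x80u); // same on either byte order
}
```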
Be careful: unlike static_cast in the inheritance example earlier, reinterpret_cast does not require the class types involved to be complete:
struct B;
struct D; // should be inherited from B!
int main()
{
B *pb;
D *pd;
// ...
pb = pd; // error: can't assign D* to B* implicitly
pb = static_cast<B*>(pd); // error: still can't assign D* to B*,
// they are incomplete so no relationship established
pb = reinterpret_cast<B*>(pd); // compiles: but still no known relationship: Yikes!
return 0;
}
Here's another example with reinterpret_cast, which may have come from somebody trying to, say, write a debugger and "normalize" some pointers for some reason. Naturally this kind of code is platform specific, and hence may have portability issues:
void foo() { }
int bar(int) { return 99; }
struct wacko {
void mfoo() { }
int mbar(int) { return 99; }
};
int main()
{
void (*vpv)() = foo;
int (*ipi)(int) = bar;
vpv = bar; // error: void (*)() can't hold int(*)(int)
vpv = (void (*)())bar; // force it
vpv = static_cast<void (*)()>(ipi); // force this too? No: error: not related types
vpv = reinterpret_cast<void (*)()>(ipi); // ok: unrelated types, but force anyway
ipi = reinterpret_cast<int (*)(int)>(vpv); // usually copies back same
void (wacko::*mvpv)() = &wacko::mfoo;
int (wacko::*mipi)(int) = &wacko::mbar;
mvpv = reinterpret_cast<void (wacko::*)()>(mipi); // compiles but does it do what you want?
return 0;
}
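To see why the reinterpret_cast<B*>(pd) in the incomplete-type example deserves its "Yikes!", consider a sketch with complete types and multiple inheritance (A, B and D are illustrative names). static_cast knows D derives from B and adjusts the address to the B subobject; reinterpret_cast knows nothing of the relationship and leaves the address alone, so its result need not point at a B at all.

```cpp
#include <cassert>

struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { };   // complete types; B is the second base

// static_cast knows D derives from B, so it adjusts the address
// to point at the B subobject inside the D object.
B *via_static(D *pd) { return static_cast<B *>(pd); }

// reinterpret_cast ignores the relationship and keeps the address as is,
// so the result need not point at any B. Do not dereference it.
B *via_reinterpret(D *pd) { return reinterpret_cast<B *>(pd); }

void check_casts()
{
    D d;
    assert(via_static(&d)->b == 2);
    // The A and B subobjects cannot overlap, so their addresses differ,
    // and at least one of them is not at the address of d itself.
    assert(static_cast<void *>(static_cast<A *>(&d)) !=
           static_cast<void *>(static_cast<B *>(&d)));
    // reinterpret_cast did no adjustment at all.
    assert(static_cast<void *>(via_reinterpret(&d)) == static_cast<void *>(&d));
}
```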
The static_cast examples given earlier pertaining to inheritance usually apply to derived pointers or references being converted to base pointers or references respectively. Here's a classic OO example:
// shapes.h
class Shape { // ABC: abstract base class
// ...
public:
virtual void draw() = 0; // pure virtual function
virtual ~Shape() { }
};
class Circle : public Shape {
public:
// ... ctor with args, etc.
void draw() { /* ... */ } // virtuals inherit as virtual
};
class Square : public Shape {
public:
// ... ctor with args, etc.
void draw() { /* ... */ }
};
// Your Shapes Here....
Visual representations of such hierarchies are normally drawn with the base class on top and the derived classes below it as leaves, like fingers pointing downward (another way to look at it is that the base class is the root class, and roots normally grow downward). So going from a derived class to a base class is often spoken of as upcasting, since you are casting up such a visual representation of the inheritance diagram.
Upcasting is normally done implicitly, as that is normally the direction you convert your pointers and references when an inheritance hierarchy is involved:
#include "shapes.h"
int main()
{
Shape *sp;
sp = new Circle; // implicit upcast from Circle * to Shape *
// ...
sp = static_cast<Shape *>(new Circle); // same but unnecessarily explicit
sp->draw(); // Draw a Circle ala virtual draw()
// ...
sp = new Square; // implicit Square * to Shape *
sp->draw(); // Draw a Square ala virtual draw()
// ...
Square *sqp = static_cast<Square *>(sp); // Shape * to Square *? Ok, get Square back
Circle *cp = static_cast<Circle *>(sp); // Silent Shape * to Circle * from underlying Square *?
return 0;
}
An issue with this static_cast is that it did not consider the dynamic type of the object that was cast; it only considered the declared static type, and so only "viewed" an object "slice", so to speak. In particular, what can be made of cp? Not much (a Circle * pointing at a Square? Ugh.). So what's often needed in some cases is to be able to "go deeper." This is significant because it points to an underlying issue when we have subsystems which were not designed with each other in mind, are not extensible enough, etc., and so are not normally able "to speak" with each other directly or purposely or optimally in some way.
As conversions go, this subsystem stuff can be ugly. It is still desirable because you want to hook into the services of a ("3rd party") library that you have and use, whether it be for windows, graphics, databases, games, file systems, geometry, networking, whatever. However, often you have not written the library, so usually you don't want to modify it, and often you can't, because you don't even have access to the source code, among other reasons. This means that the library author does not know whether you have created a derived class using one of "its" classes as a base class, let alone about objects of that class. So the library will often only "produce", use, pass around, etc. what it knows about: its own classes and objects (which will be your base classes and base class objects) and its own services. However, if you derived from them, then you may have added specialized functionality in your derived classes for your derived class objects that the library does not know about. And so, obviously, you'll want to perform some of your own operations and services and have them work with both the closed base library and your "extensions".
Such conditions, where, say, you can't modify the design of the library, often lead to situations that involve casting. What is crucially desired here is the answer to "Is it safe to use the derived class object? Does it even exist?" In particular, dynamic_cast provides direct language support for this by accepting a pointer or reference to a base class object (the one in the "closed library") and, at runtime, rendering (converting) it as a pointer or reference, respectively, to a particular derived class (yours). Note that the cast sought is in the opposite direction from earlier.
This base to derived conversion is downcasting, as it is casting down the inheritance diagram. Downcasting does not happen implicitly. With upcasting you are normally zeroing in on a specific ancestral base class, usually quite clearly, even considering multiple inheritance. However, with downcasting, since the hierarchy fans out, the breadth of the choices expands without limit, and worse, the classes become less general and more specific, since that is how derived classes for different niches work and what they are often for. For instance, we can keep adding derived Shapes to the classic example given above. This should "just work" and normally should "just interface" with the subsystem (of course, one must override virtuals, etc.).
And yet there is a problem with the cp pointer in the shape example earlier: you cannot always use static_cast to traverse a hierarchy back and forth as it does not always work this way safely. This is similar to the classic problem of:
// ...
int *pi;
float *pf;
// ...
void *pv = pi; // send to pointer to void.
// IOWs, toss type baggage out the window.
pi = (int *)pv; // and back to pointer to int
// ...
pf = (float *)pv; // but back to pointer to float? Say what?
but with its own issues: static_cast uses the static type of an object, not its dynamic type, whereas dynamic_cast "looks into" the object. There is a difference between a "plain ol'" pointer conversion and object interrogation. This difference, which is reflected in the difference between static_cast and dynamic_cast, provides the safety that is sought in this problem domain:
Square *sqp = dynamic_cast<Square *>(sp); // Shape * to Square *? Ok, get Square back
Circle *cp = dynamic_cast<Circle *>(sp); // Shape * to Circle * from underlying Square *? NOPE
Important here is that dynamic_cast will render the request only if the object actually referred to by the base class pointer or reference really is an object of the requested derived class (or of a class derived from it). Unlike upcasting, this downcasting requires a polymorphic inheritance relationship (one involving virtual functions, at least a virtual destructor, in a base class); it's important that the dynamic type of the object can be picked up and used properly (that's why static_casting in the cp example is not useful). In other words, there is a check that occurs. And since a runtime dynamic_cast only applies to objects of polymorphic classes, a request whose operand is not of polymorphic class type will simply and naturally elicit a compiler diagnostic:
$ cat ccbase.c
class CCbase { }; // NO VIRTUALS
class CCderived : public CCbase { };
class SomeOtherClass { };
int main()
{
CCbase *b = new CCderived;
CCderived *d = dynamic_cast<CCderived *>(b); // related but not polymorphically
SomeOtherClass *p = new SomeOtherClass;
d = dynamic_cast<CCderived *>(p); // not same inheritance relationship
return 0;
}
$ como ccbase.c
Comeau C/C++ 4.3.4.1 (Oct 30 2005 22:29:44) for MAC_OS_X
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
"ccbase.c", line 9: error: the operand of a runtime dynamic_cast must have a
polymorphic class type
CCderived *d = dynamic_cast<CCderived *>(b); // related but not polymorphically
^
"ccbase.c", line 12: error: the operand of a runtime dynamic_cast must have a
polymorphic class type
d = dynamic_cast<CCderived *>(p); // not same inheritance relationship
^
2 errors detected in the compilation of "ccbase.c".
So we can even get compiler detection in addition to the runtime checks.
If, at runtime, the downcast pointer conversion request finds that the specific base to derived relationship does not hold, it returns a null pointer. In the non-failure case, the correct polymorphic inheritance relationship has been "verified", hence the cast correctly returns a pointer or reference, respectively, to the derived class object, and you can perform the necessary derived class operations upon that result.
In other words, in the examples given, the host parenthesized expression pointer sp is not just converted to the bracketed target pointer Square * or Circle *. Instead the object being pointed to (*sp, not sp the pointer itself) "is queried" (exactly how is up to the implementation) as to whether or not it is a Square or Circle (or an object of a class derived from Square or Circle), and only if that is the case is it successfully converted. Let's play some:
$ cat shapes2.c
#include "shapes.h"
#include <iostream>
int main()
{
Shape *sp;
sp = new Circle; // implicit upcast from Circle * to Shape *
std::cout << "sp=" << sp << std::endl;
Square *sqp = dynamic_cast<Square *>(sp); // Attempt downcast
Circle *cp = dynamic_cast<Circle *>(sp); // Attempt downcast
std::cout << "sqp=" << sqp << std::endl;
std::cout << "cp=" << cp << std::endl;
return 0;
}
$ como shapes2.c
Comeau C/C++ 4.3.4.1 (Oct 30 2005 22:29:44) for MAC_OS_X
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
$ a.out
sp=4198976 Address of Circle object
sqp=0 FAILED: sp points to a Circle, not a Square, hence null pointer
cp=4198976 Okey dokey: Got back original address
The discussion thus far has focused on pointers to bases and pointers to deriveds, but a similar rendering request applies to references. Here though, a failure to convert does not result in the null pointer. Instead, as a reference should not be null, then a failure to convert a reference to a base into a reference to a derived in the same polymorphic inheritance relationship hierarchy will result in the std::bad_cast exception being thrown at runtime. So the following program will abort:
$ cat shapes3.c
#include "shapes.h"
#include <iostream>
void foo(Shape &s)
{
Circle &c = dynamic_cast<Circle &>(s); // Attempt downcast
}
int main()
{
Circle c;
Square sq;
std::cout << "foo()ing Circle" << std::endl;
foo(c);
std::cout << "foo()ing Square" << std::endl;
foo(sq);
std::cout << "Done" << std::endl;
return 0;
}
$ como shapes3.c
Comeau C/C++ 4.3.4.1 (Oct 30 2005 22:29:44) for MAC_OS_X
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
$ a.out
foo()ing Circle
foo()ing Square
C++ runtime abort: terminate() called by the exception handling mechanism
Abort trap
This program never emits "Done", because passing a Square to foo() caused dynamic_cast to throw a bad_cast, since Squares and Circles are siblings. Since we don't catch it, terminate() gets called, which by default calls abort(), as prescribed by the Standard C++ exception handling mechanism. If you did want to catch it, then just add try/catch:
$ cat shapes4.c
#include "shapes.h"
#include <iostream>
#include <typeinfo>
void foo(Shape &s)
{
Circle &c = dynamic_cast<Circle &>(s); // Attempt downcast
}
int main()
{
Circle c;
Square sq;
try {
std::cout << "foo()ing Circle" << std::endl;
foo(c);
std::cout << "foo()ing Square" << std::endl;
foo(sq);
std::cout << "Done" << std::endl;
}
catch (const std::bad_cast &) {
std::cerr << "Caught bad_cast" << std::endl;
// Do something appropriate here
}
return 0;
}
$ como shapes4.c
Comeau C/C++ 4.3.4.1 (Oct 30 2005 22:29:44) for MAC_OS_X
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
$ a.out
foo()ing Circle
foo()ing Square
Caught bad_cast
Note the #include of <typeinfo> (not <exception>) to obtain bad_cast.
It is tempting to use this infrastructure to orchestrate your program like some giant selection process using switch or if this-or-that-or-that. That is not necessarily the intent. Instead, when dynamic_cast was accepted for adoption into Standard C++, another feature was accepted at the same time: condition declarations. You may already be familiar with the idea from, say, a for statement:
int i; // line A
for (i = 0; i < 99; i++) blah(i);
j = i; // ok, use i from line A, it's still in scope
which in some cases may preferably be written as the following, though with different behavior:
for (int i = 0; i < 99; i++) blah(i);
j = i; // error, unless some other i is in scope
Note the declaration is inside the first clause of the for. The behavior difference is that now i only has scope within the for statement, and if you really need it afterwards (and sometimes you do) then it needs to be declared outside the loop, the way the first example did. Well, this can also now be done with whiles, switches and ifs. For instance:
if (int *p = Somelist.next()) {
// If Somelist is empty we don't need to be here!
// This p is only in scope here, or in a corresponding else
}
// No p from the if statement here, whether by id name or address
Note that this declares, defines, and initializes p. Then, p is tested to see whether it is a null pointer or not.
This is often just what you want. It provides a basis of locality: it builds on C++'s capability of allowing declarations to be nearer to their use and initialization, especially when linked to the test. This creates a scoped identifier, but only where and when needed. In this example, we only need p if there is another node still left in the list; otherwise we probably don't care. So something like this, with no initializer, doesn't help in this or any other circumstance (in fact, since a condition declaration must have an initializer, it is not even allowed):
if (int *p2) { ... }
This line of thinking, and behavior, is usually exactly what's needed in the dynamic_cast situation. Putting "2 and 2" together provides us a purposely limited and succinct scope:
if (Circle *cp = dynamic_cast<Circle *>(sp)) {
// downcast was successful, we are where we need to be
// Now can use services not supported by the core library
}
If it's not obvious yet, a prime aspect of doing a dynamic_cast is to be sure you end up with the right object, and so for that you need to be sure it worked.
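Putting it together, here is a self-contained sketch. The tiny hierarchy and radius_or_sentinel are hypothetical stand-ins for shapes.h and for a service the core library lacks. The condition declaration keeps cp confined to the branch where the downcast is known to have succeeded.

```cpp
#include <cassert>

// Tiny stand-ins for shapes.h; radius() plays the role of a service
// that only the derived class offers.
struct Shape { virtual ~Shape() { } };
struct Circle : Shape {
    double r = 2.0;
    double radius() const { return r; }
};
struct Square : Shape { };

// Returns the radius if sp really points at a Circle, otherwise -1.
double radius_or_sentinel(const Shape *sp)
{
    if (const Circle *cp = dynamic_cast<const Circle *>(sp))
        return cp->radius();   // downcast succeeded; cp is usable here
    return -1.0;               // cp is out of scope here
}

void check_radius()
{
    Circle c;
    Square s;
    assert(radius_or_sentinel(&c) == 2.0);
    assert(radius_or_sentinel(&s) == -1.0);
}
```

A null sp is handled too, since dynamic_cast of a null pointer yields a null pointer.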
Some may argue that a static_cast is more efficient than a dynamic_cast but when supporting code is added, any improvement, if there even was one, is probably a wash. So unless there is some super-duper compelling reason to not use the most appropriate feature, then, well, use it! Also, to counter the argument, it can also be claimed there are cases where dynamic_cast and other RTTI features can actually improve performance by allowing you to directly handle some otherwise inefficient case. As with all efficiency concerns, don't hallucinate them, and be sure they actually exist and are a problem to be solved.
Of course, on the other hand, library use is taxing in various ways. But the combination of these behaviors seems to achieve a reasonable balance between type safety and "getting along" in one's work, hence alleviating some aspects of the concern when using closed library systems. But also of course: when it is possible to use a cleaner design, by all means, do consider it. That is to say, if you have access to all the source code, then you may want to consider whether there is a better way to integrate your libraries, with different base class behavior, virtuals, etc., so as to avoid dynamic_cast altogether. Such alternatives should usually be considered before RTTI. Newbies please note this! It's easier to pound out thoughtless, unmaintainable if statements than to properly think through a design. But do think it through! This is only a feature to use when control of the class design and integrated automatic polymorphic behavior is out of reach.
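For comparison, when you do control the classes, a virtual function often makes the dynamic_cast question disappear. A minimal sketch, assuming you are free to add a name() member to the base class (name() and describe are hypothetical):

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() { }
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

struct Square : Shape {
    std::string name() const override { return "square"; }
};

// No casts and no if-chains: each class answers for itself, and adding a
// new Shape requires no change here.
std::string describe(const Shape &s)
{
    return "a " + s.name();
}

void check_describe()
{
    Circle c;
    Square q;
    assert(describe(c) == "a circle");
    assert(describe(q) == "a square");
}
```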
Lastly, I've been speaking of things such as proper class design, or limited closed systems, but in fact, in many cases the proper design is the functionality already provided. That is, extending subsystems for corner or specialized cases is not clean either. So be very careful here: design is not always black and white. Therefore, don't force-fit hierarchies "just because" or to always avoid dynamic_cast and then end up with almost dead code for the "but-we-must-handle-this-case-code" (sic) . What will you do when the next twist arises?
There are other parts to C++'s RTTI, notably the typeid operator and the std::type_info class.
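One of those parts is the typeid operator (declared via <typeinfo>), which yields a std::type_info describing a type; applied to an object of polymorphic class type, it reports the dynamic type. A hedged sketch with illustrative class names shows the key difference from dynamic_cast: typeid asks "exactly this type?", while dynamic_cast asks "this type or something derived from it?"

```cpp
#include <cassert>
#include <typeinfo>

struct Shape { virtual ~Shape() { } };
struct Circle : Shape { };
struct Square : Shape { };
struct BigCircle : Circle { };

// typeid on a polymorphic object reports its dynamic type, exactly.
bool is_exactly_circle(const Shape &s)
{
    return typeid(s) == typeid(Circle);
}

// dynamic_cast also succeeds for classes derived from Circle.
bool is_a_circle(const Shape &s)
{
    return dynamic_cast<const Circle *>(&s) != nullptr;
}

void check_rtti()
{
    Circle c;
    Square q;
    BigCircle b;
    assert(is_exactly_circle(c) && !is_exactly_circle(q));
    assert(!is_exactly_circle(b) && is_a_circle(b));
}
```

Comparing type_info objects with == is portable; the string returned by name() is implementation-defined, so it is better not to rely on its exact text.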
NOTES:
short *sp = reinterpret_cast<short *>(cip); // error, const would be implicitly tossed out
short *sp = reinterpret_cast<short *>(const_cast<int *>(cip)); // ok, explicitly deconst
$ cat shapes5.c
#include "shapes.h"
#include <iostream>
int main()
{
Shape *sp;
sp = new Circle; // implicit upcast from Circle * to Shape *
std::cout << "sp=" << sp << std::endl;
Circle *cp = dynamic_cast<Circle *>(sp); // Attempt downcast
std::cout << "cp=" << cp << std::endl;
sp = dynamic_cast<Shape *>(cp); // And go back???
std::cout << "sp=" << sp << std::endl;
return 0;
}
$ como shapes5.c
Comeau C/C++ 4.3.4.1 (Oct 30 2005 22:29:44) for MAC_OS_X
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
$ a.out
sp=4198976
cp=4198976
sp=4198976
Note that you can sometimes go back the other direction. In this example, dynamic_casting from cp to sp is really not what you wanted to do. Be sure that is the conversion request that you really do want to make and that it is in the right direction (note it models assignment with the target type on the left of the host type)!
// a.h:
// ...
#include "xyz.h"
// ...
// b.h:
// ...
#include "xyz.h"
// ...
// xyz.h:
class xyz { };
Now, if you were to use some of these headers:
// main.c
#include "a.h"
#include "b.h"
// ...
then xyz.h will inadvertently be brought in twice, once by a.h and once by b.h. This is a problem because then xyz will end up being defined twice in the same translation unit, and it is an error to define the body of a class twice like that. This situation is by no means unique to just classes though, and is as true for duplicate definitions of structs, enums, inline functions, etc.
To get around this, in both C++ and in C, a common code technique is to sandwich the header file's source code with an #ifndef preprocessor directive. Consider a revised xyz.h:
// xyz.h:
#ifndef XYZ_H
#define XYZ_H
class xyz { };
#endif
With this revision, if xyz.h is #included, and the XYZ_H macro is not yet defined, then it will #define it, and then it will inject the rest of the header file's code into the translation unit, in this case, the code is the definition of class xyz. However, if XYZ_H has already been macro defined, then the preprocessor will skip to the #endif, and in effect will not inject, or re-inject, class xyz into the translation unit. This technique establishes the notion of include guards.
Look back at main.c. We see that xyz.h will be included through a.h, and since XYZ_H is not yet defined, the preprocessor will define it and also let the compiler see the class xyz definition. Now, when it moves on to b.h, it will open xyz.h again, but since xyz.h was already brought in by a.h, XYZ_H is already macro defined. Therefore, it will skip right down to the #endif, close xyz.h, and continue with the rest of b.h.
In effect, the second #include of xyz.h was skipped. Do note though that it did need to open it, process it only to see that it didn't need to do anything with it, and then close it.
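Seen from the compiler's side, the translation unit for main.c then looks roughly like the following single-file sketch, where the two guarded copies stand in for the two #includes of xyz.h (the value() member is added only so there is something to check). Without the guard, the second class definition would be a redefinition error.

```cpp
#include <cassert>

// --- first inclusion of xyz.h (through a.h) ---
#ifndef XYZ_H
#define XYZ_H
class xyz { public: int value() const { return 42; } };
#endif

// --- second inclusion of xyz.h (through b.h) ---
#ifndef XYZ_H
#define XYZ_H
class xyz { public: int value() const { return 42; } }; // skipped: XYZ_H is defined
#endif

int use_xyz()
{
    xyz x;
    return x.value();
}

void check_guard()
{
    assert(use_xyz() == 42);
}
```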
As shown above, the traditional naming of the macro is just to follow suit by using the file name with _H as a suffix. Do try to make the name unique, or maybe even long'ish in some cases, in order to avoid name clashes with it though. As well, avoid a name that begins with an underscore, as names beginning with an underscore followed by an uppercase letter (or containing a double underscore) are reserved for compiler and library vendors.
Given all this, it's tempting to think you can just start including headers all over the place without any concerns. Clearly though, this can impact compile time, and so you want to remain judicious in the headers you include. As well, some compilers support precompiled headers, but this should not be an excuse to get carried away.
It should be clear by now that you would add include guards for a.h and b.h too. Lastly, there are times even when you want to do this:
// abc.h
#ifndef ABC_H
#define ABC_H
#ifndef XYZ_H
#include "xyz.h"
#endif
#endif
Here, abc.h actually tests the macro from xyz.h before even trying to include it. This keeps xyz.h from even being opened, but only if it's already been processed.
There's an implication here then that you can "fool a header", by #defineing the respective macro for it, so that perhaps it may not end up getting processed even once. Similarly, you can #undef the respective macro once the header has been processed. As with many of the above remarks, make sure that's really what you want to do.
Using inline involves a space/time tradeoff.
A main premise of inline is that it should be used when the overhead of calling a function is higher than the overhead of the function itself. For instance here:
foo(a, b, c);
The overhead of passing the 3 arguments, setting up returning to the call location, etc. might actually be higher than the cost of what the function does with the arguments. This normally means an inline function would be a small function, for some definition of small.
Note that, unlike macros, inline functions are type-safe and maintain function semantics (for instance, their arguments are evaluated only once).
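A small sketch of that difference, using a hypothetical MAX_MACRO and max_inline. The macro pastes its argument text into two places, so an argument with a side effect can be evaluated twice; the inline function evaluates it once.

```cpp
#include <cassert>

// A function-like macro substitutes its argument text wherever it appears...
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// ...while an inline function evaluates each argument exactly once.
inline int max_inline(int a, int b) { return a > b ? a : b; }

void check_side_effects()
{
    int i = 5, j = 3;
    int m = MAX_MACRO(i++, j);   // expands to ((i++) > (j) ? (i++) : (j))
    assert(m == 6 && i == 7);    // i was incremented twice

    i = 5;
    m = max_inline(i++, j);
    assert(m == 5 && i == 6);    // i was incremented once, as written
}
```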
Note that whether the inline'ing is honored is up to the implementation.
The question arises: since inline is good, should I inline everything? No. For instance, it might be suspect to inline a function with a loop in it, even if it is small.
Also, as inline can increase the code size of your application's "program image" (because, if the inline request is honored, the function body must be expanded wherever it is called), it may lead to programs that run slower when issues such as virtual memory, paging, and thrashing come into play. This is system dependent, but it can mean your program runs slower in some cases because you "inline'd everything."
Of course, the compiler is allowed to ignore your inline request, whether implicitly inline (say, a function defined within a class definition) or explicitly inline by using the inline keyword. As well, the compiler is allowed to inline functions that have not been declared inline.
Also, don't just assume your application needs to be blazingly fast. Many applications are I/O bound. For instance, to exaggerate the point, it doesn't matter how fast your app is if it is sitting there waiting for keyboard input. You can't make it wait faster. :) Instead, give consideration to profiling your application to try and understand its performance. Do this rather than guessing. And if the time is shown to go into a particular function, be sure that the actual problem is the function and not something else. Also, don't forget that a different algorithm may be the real resolution.
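Also, remember that even for a small function, something like inline'ing a virtual function may have no effect, because a virtual function is usually called through a pointer or reference, so the call is dispatched at run time and the compiler usually doesn't know which body to substitute.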
Note that normally inline does not disturb normal function semantics. There is a case where it matters, though. Often, functions are declared in header files but defined in source files, and the linker resolves calls across the object files. However, if a function is defined as inline in a source file, it will usually be known as inline only in that source file. Furthermore, it may not get a physical footprint in the object file corresponding to that source file. For instance, given:
// blah.h
void foo();
// blahmain.c
#include "blah.h"
int main()
{
foo();
}
// blah.c
#include "blah.h"
inline void foo() { }
Will result in a linker error:
c:\tmp>como blah.c blahmain.c
C++'ing blah.c...
Comeau C/C++ 4.3.8 (Sep 25 2006 11:02:23) for MS_WINDOWS_x86
Copyright 1988-2006 Comeau Computing. All rights reserved.
MODE:strict errors C++
C++'ing blahmain.c...
Comeau C/C++ 4.3.8 (Sep 25 2006 11:02:23) for MS_WINDOWS_x86
Copyright 1988-2006 Comeau Computing. All rights reserved.
MODE:strict errors C++
blahmain.obj : error: unresolved external symbol foo() referenced in function main
This happens because the compiler is not required to emit an out-of-line copy of foo in blah.obj: an inline function must be defined in every translation unit that uses it, and blahmain.c never sees that definition. This means inline functions are usually defined in header files.
Earlier I mentioned that inline functions should be small, for some definition of small. That was a cop-out answer. The problem is that there is no concrete answer, since it depends upon a number of things that may be beyond your control. Does that mean you should not care? In many cases, yes. Also, as compilers get smarter, many inlining decisions will be made automatically, as has already happened in many cases involving the register keyword. That said, the technology is not there yet, and it's doubtful it will ever be perfect. Some compilers even support special force-inline keywords and pragmas for this and other reasons.
So the question remains: how do you decide whether to make something inline or not? I will answer with some considerations that need to be decided upon and/or calculated, and which may be platform dependent:
Note that a function may be inline substituted in some places and not in others. Also, you may let it be inline'd but also take its address. This may mean there is both an inline-substituted version and an out-of-line copy of the function.
Note that inline functions must still obey the "one definition rule". So, although it may work in a given implementation, you should not be providing different function bodies that do different things in different files for the same inline function for the same program.
Be aware of functions that get called implicitly. In particular, be aware of constructors and destructors, as there are many contexts in which they may be invoked: as arguments to functions, as return values, while new'ing, during initializations, during conversions, for creating temporaries, etc. Also of particular concern is that if ctors/dtors are inline up and down a class hierarchy, there can be a cascade of inlining in order to accommodate every base class subobject.
Lastly, I think there are some counter issues to be discussed so that you don't have solutions looking for problems:
struct xyz {
struct abc Abc; // AA
};
struct abc {
struct xyz Xyz; // BB
};
Unfortunately, for this to work, struct abc needs to be moved before xyz, or else, how could line AA work? But wait! That would mean xyz needs to be moved before abc making this circular. One way around this is:
struct abc; // CC
struct xyz {
struct abc* Abc; // DD
};
struct abc {
struct xyz* Xyz; // EE
};
Here, we've changed Abc and Xyz into pointers. As well, we've forward declared abc in line CC. Therefore, even though abc has still only been declared, not defined, that is enough to satisfy the pointers, because there is not yet any code that dereferences them, and by the time there is, both structs will have been defined.
Of course, this new design often implies dynamically allocating the memory these pointers will point to... unless, of course, the self references are through C++ references, which are also possible. Here's a toy example:
struct abc; // CC
struct xyz {
struct abc& Abc; // DD
xyz();
};
struct abc {
struct xyz& Xyz; // EE
abc();
};
xyz::xyz() : Abc(*new abc) { }
abc::abc() : Xyz(*new xyz) { }
Of course, in this last example, the way the constructors are set up would establish an infinite loop, so you need to be careful when using self-referential classes. In other words, you would probably normally not do it like the example above shows, and depending upon your specifics may have to change your design.
#include <iostream>
int main()
{
const int maxsides = 99;
for (int numsides = 0; numsides < maxsides; numsides++) { // A
if (...SomeCondition...)
break;
// blah blah
}
if (numsides != maxsides) // B
std::cout << "Broke out of loop\n";
return 0;
}
However, toward the end of the standardization process, in connection with some other rules (about conditional declarations in general), the committee decided that the scope of the identifier should be limited to the for statement; therefore, line B is an error according to Standard C++. Note that some compilers have a switch to allow old-style code to continue to work. For instance, the transition-model switch under Comeau C++ is --old_for_init. In order for line B to work under Standard C++, line A needs to be split into two lines:
int numsides;
for (numsides = 0; numsides < maxsides; numsides++) { // revised line A
Here numsides is in scope till the end of its block, having nothing to do with the for as far as the compiler is concerned, which is "as usual". And of course, you can always introduce another brace enclosed block if you want to constrain the scope for some reason:
// ...code previous...
{
int numsides;
for (numsides = 0; numsides < maxsides; numsides++) {
// ...
} // end of for loop
} // scope of numsides ends here
// .. rest of code...
Finally, consider this code:
#include <iostream>
int numsides = 999; // C
int main()
{
const int maxsides = 99;
for (int numsides = 0; numsides < maxsides; numsides++) {
if (...SomeCondition...)
break;
// blah blah
}
if (numsides == maxsides) // B: refers to ::numsides
std::cout << "Broke out of loop\n";
return 0;
}
Here, line B is ok because although the local numsides from the for loop is out of scope, it turns out that the global numsides from line C is still available for use. As a QoI (Quality of Implementation) issue, a compiler might emit a warning in such a case. For instance, Comeau C++ generates the following diagnostic when --for_init_diff_warning is set:
warning: under the old for-init scoping rules variable "numsides" (the one declared at line 2) -- would have been variable "numsides" (declared at line 6)
Also, try to name your identifiers better; then they will have less chance of clashing like this anyway. In some cases, you may have code something like this:
for (int numsides = 0; numsides < maxsides; numsides++) // D
;
// numsides should be out of scope here
// create another, different, numsides
for (int numsides = 0; numsides < maxsides; numsides++) // E
;
The numsides from line D must not be in scope for line E to work. That's fine with Standard C++, since it won't be. However, you may be using a compiler that doesn't support the proper scoping rules or that, for some reason, has them disabled. As a short-term solution, you may want to consider the following hackery (think about how it limits the scope):
#define for if(0);else for
Depending upon your compiler, these forms may or may not be better:
#define for if(0) { } else for
#define for if(false) { } else for
Of course, technically, as per Standard C++, redefining a keyword is illegal, but few compilers (if any) will actually diagnose this as a problem.
As well, for a totally different take on the subject (shown in a slightly different form above), you can add an extra set of braces around each loop:
{ for (int numsides = 0; numsides < maxsides; numsides++) // F
;
}
{ for (int numsides = 0; numsides < maxsides; numsides++) // G
;
}
You can also, of course, use distinct identifiers:
for (int numsides = 0; numsides < maxsides; numsides++) // H
;
for (int numsides2 = 0; numsides2 < maxsides; numsides2++) // I
;
None of these are optimal for normal code and perhaps not even good short term solutions, but if you have an old compiler, you may have no choice.
// ...
int counter;
// ...
// counter set by some other code somehow
// ...
for (/* nothing here */; counter < SomeMaxValue; counter++) {
// ...
}
Also, perhaps the last expression is complex, and therefore you may want to put its computation into the body of the loop, etc. So, perhaps you might have:
// ...
int counter;
// ...
for (counter = 0; counter < SomeMaxValue; /* nothing here */) {
// ...
// elaborate formula to calculate the next value of counter
// ...
}
The middle condition can also be missing, in which case it is taken as true.
So unless something in the loop body breaks out, this loops forever:
for (int counter = 0; /* nothing here */; counter++) {
// ...
// Perhaps this loop has a test here, and executes a "break;"
// ...
}
In the case where all are missing, as in for(;;), it also establishes an infinite loop. Here too, some other code would be necessary to break out of it, unless of course the app is intended to run continuously:
int main()
{
for ( ; ; ) {
// ...
// Perhaps this loop has a test, and eventually hits a "break;"
// or Perhaps it really does loop forever
// ...
}
}
As an aside, it's worth considering that this:
for (xxx; yyy; zzz) {
aaa;
}
can be trivially thought of as this:
{
xxx;
while(yyy) {
aaa;
zzz;
}
}
We say "trivially" because not every for loop can be transformed into such a while loop quite so directly. For instance, if the code in aaa contains a continue statement, then zzz will be executed in the for loop but not in the while loop.
Also, it's worth pointing out that the while loop was placed into a block. See #forinit in this FAQ for why.
Note the following infinite loops:
for(;;) .....
for(; 1; ) ....
for(; 1 == 1; ) ....
for(; true; ) ...    // in C++
while(1) ....
while(1 == 1) ....
while(true) ...      // in C++
do .... while(1);
do .... while(true); // in C++
label: ... goto label;
The first is probably the most common form seen, though now that C++ and C99 have booleans (in C99, via <stdbool.h>), that may be changing.
That being so, many compilers, or operating systems, do support an extension in order to be able to clear the screen, or erase the current line. For instance, with Borland, you might do this:
#include <conio.h>
// ...
clrscr();
For some versions of Microsoft, you might do this:
#include <conio.h>
// ...
_clearscreen(_GCLEARSCREEN);
One of these will also often work on some systems:
#include <stdlib.h>
// ...
system("cls"); // For DOS/Windows et al. "console apps"
system("clear"); // For some UNIX systems, Linux, etc.
system("tput clear"); // For some UNIX systems
// ...
Each of these effectively runs the quoted command as a separate process. Tied in with this, you could also create a shell script named cls which internally does the clear or tput clear, and so on. In C++, you would normally include <cstdlib> instead of <stdlib.h> and call std::system().
You can also just output a known number of newlines in some cases, if you know the number:
// ...
const int SCREENSIZE = ???; // the number of lines on your display
for (int i = 0; i < SCREENSIZE; i++)
    std::cout << '\n'; // output a newline
std::cout.flush();     // flush the output
You can also just hard-code an escape sequence, though this too may not be portable:
std::cout << "\033[2J" << std::flush;
This of course means that something like this could be done in VC++ as well:
#include <conio.h>
// ...
_cputs("\033[2J");
There is also the "curses" (or in some cases ncurses) system, which, although not Standard, was popular on UNIX before the X Window System became popular:
#include <curses.h>
int main()
{
initscr(); // set up a curses window
clear(); // clear the curses window
refresh(); // copy the logical window to the physical screen
// stuff using the window
endwin();
return 0;
}
and/or various different graphics and windowing systems, etc.
If it's not obvious to you yet, whether you're clearing the screen, erasing the current line, changing the color of something, or whatever, you'll need to concede that no one way is portable, and that to accomplish what you need you'll have to find a platform specific solution in most cases. Check with your compiler or operating system vendor's documentation for more details. "Google" for it (websites and newsgroups) to see how somebody else may have solved this platform specific problem.
And never discard ingenuity when it's necessary. For instance, on some displays you may get away with emitting a form feed, or a special escape sequence. Or, if you have a primitive device you may have to do something like figure out how many newlines/spaces/tabs/whatever, to emit in order to scroll the screen away, though the cursor may not be where you prefer it. And so on.
Always construct your code sensibly. If you really do need, say, the ability to clear the screen, don't lace system-dependent calls, or even #ifdefs, through your code. Instead, write a function to encapsulate the call, so that if you do need to adapt it in some way, at least it's just in one place.
There are many different phrases which involve the use of "null" in Standard C and Standard C++, and they are often misunderstood, confused, and misspoken. Here I will cover some of them:
if (blah) ; /* null statement "just before" the ; */ else a = b;
That's more of a convenience, and perhaps not the best strategy. However, there are some other cases where it ends up being a requirement. For instance, a common idiom is in a loop:
while (getAnotherLine()) /* eat up current input */ ;
But there is a more serious problem:
void foo()
{
// ...
goto leave_foo;
// ...
leave_foo:
}
The problem here is that you can't label a closing brace because a label must apply to a statement, so the null statement comes in handy here if there turns out to be nothing else to do:
leave_foo: ;
Of course, you could use return; instead. Also, the conditions should be reviewed to see whether a goto is warranted at all. You may want to use null statements in some switch statements too.
I suspect this rule originally came about because of accidental typings of a colon instead of a semicolon, and so the requirement allows some of those typos to get detected.
#
That is, the only thing on the line is the hash mark. At least once in your programming career you might hear the # called an octothorp. If not, this is the one time. :) Anyway, once upon a time, in a land far, far away, this was used for spacing the input file, and so it continues to be supported by Standard C (and Standard C++). Actually, I think also that the original C compiler used it as a clue that preprocessing directives were going to be used in the file (somebody email me whether I'm nuts or not). These days it seems a useless directive, unless you are trying to guess the final answer in a game of "C++ Jeopardy".
char array[] = { 'C', 'o', 'm', 'e', 'a', 'u', ' ', 'C', '+', '+', '\0' };
char c;
// ...
c = 'x'; /* bits representing x */
c = '\0'; /* all bits cleared */
As it has the value zero, often plain zero is used instead:
signed char c1 = 0;
unsigned char c2 = 0;
as there is an implicit conversion from int to char.
But remember that in C, a character constant such as 'x' has type int, while in C++ a character constant containing a single character has type char.
G:\tmp>type socc.c
#include <stdio.h>
int main()
{
printf("%d\n", sizeof(char));
printf("%d\n", sizeof(int));
printf("%d\n", sizeof('x'));
printf("%d\n", sizeof('xx'));
return 0;
}
G:\tmp>como --c99 socc.c
Comeau C/C++ 4.3.4.1 (Mar 30 2005 22:54:12) for MS_WINDOWS_x86
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C99
G:\tmp>aout
1
4
4
4
G:\tmp>como --c++ socc.c
Comeau C/C++ 4.3.4.1 (Mar 30 2005 22:54:12) for MS_WINDOWS_x86
Copyright 1988-2005 Comeau Computing. All rights reserved.
MODE:strict errors C++
G:\tmp>aout
1
4
1
4
This is so even when the character is written as an escape sequence; '\0', for instance, is an octal escape sequence (not a hex one) whose value is zero.
So be careful when using 0 as a char (instead of an int) in C++; if you pass it to a function, you may end up picking the wrong overload:
void foo(int);
void foo(char);
// ...
foo(0); // calls foo(int)
foo('\0'); // calls foo(char)
A null character comes in handy for making a "C string": a sequence of characters terminated by a null character. Keep in mind the difference between sizeof, which for a string literal or char array counts every element (including embedded and terminating null characters), and strlen(), which counts the characters before the first null character (assuming valid input).
Remember that not all character arrays are C strings:
char a[2] = { 'a', 'b' }; /* No null byte */
You would not pass this to a function such as strlen(), and that limitation might be fine assuming that is the intent of this kind of array.
Remember that array elements without an explicit initializer are zero-initialized, and hence also become null characters in a context such as:
char x[99] = { '\0' };
Here, x[0] is explicitly initialized to the null character, and the other 98 elements are implicitly initialized to the null character.
A null pointer constant is often used in initializers, assignments, and comparisons.
Note that a null pointer constant is "syntactic sugar". It is only a way of representing a concept in code. Repeat after me 1024 times, or at least write a for loop to generate such output: it does not mean that the resulting null pointer has all zero bits. So using a union pun, calloc(), memset(), etc. is not a portable strategy for obtaining null pointers (nor, while we're at it, for setting floating-point values to zero). However, once assigned, a null pointer can be compared to a null pointer constant with no problem, because again, that happens at the syntax level in your code, and the generated code "does the right thing".
Note that although a null pointer is a valid pointer, it is not valid to dereference one:
int *p = 0;
*p = 99; // Kablooie
Note that there is no requirement for programmers to make null pointers out of all pointers that do not point to valid memory. Often this makes sense, but the flow and structure of the program can also help determine whether it does. For instance, the pointer may be about to go out of scope, in which case = 0'ing it may not be worth even its minor overhead, nor the possibly false sense of security of thinking you've done so. My saying this does not endorse leaving around invalid pointers that will be used later. Doing that is a huge source of bugs.
Note that testing an invalid pointer to see if it is a null pointer against a known null pointer or a null pointer constant is undefined behavior, so it is not usually considered wise to try it.
NULL is an implementation-defined null pointer constant. In C it is often:
#define NULL ((void*)0)
However, because of overloading in C++, it is often defined in C++ as:
#define NULL 0 // or 0L
Note that I say often, because the C or C++ implementation is allowed to choose how to define it, though it cannot be (void*)0 in C++.
In either language, be careful passing it to a variadic function (one taking a dot-dot-dot argument, aka an ellipsis), since if your code expects one type and it gets passed as another, perhaps with a different size, alignment and representation, then, as usual, all hell breaks loose, and speaking of nulls, you'll get a big fat null check from your boss. Oh, BTW, avoid variadic functions when possible.
Oh, oh, and in C++, a different overload can end up getting chosen depending upon which definition of NULL is used.
Oh, oh, and in C, don't forget to use function prototypes lest you run into similar problems when passing NULL or when returning NULL even when not a variadic function.
Also, newbies seem to like to do this:
char c;
// ...
c = NULL; // Don't do this
It may compile, or it may not. Either way, it's a misuse of NULL, since NULL is meant to be used with pointers. It follows that you should not use it to do math either.
#include <stdio.h>
// ...
getchar(); // Wait for any character to be hit <