
Semi advanced C question - new types (proper new types, not just typedefs)




Cromulent
Mar 21, 2009, 10:02 AM
Well I was thinking about this and was wondering what the standard means of creating a new base type in C was?

For instance, if I wanted to implement a string type in much the same way as the C++ string type, I could typedef string to char*, but obviously I would want memory to be allocated automatically for whatever is assigned to the string variable, either when it is set or at a later time, so that either of the following bits of code would work:

string myString = "Test";
string myStringEmpty;

Obviously string would need to point to a function that handles all the memory allocation, resizing and destroying if the variable drops out of scope.

Does anyone have any experience with this?



Catfish_Man
Mar 21, 2009, 10:28 AM
Basically, you can't do that. C has no provisions for operator overloading, so you can't do things like custom 'new' or custom assignment. You'll probably need to make a typedef'd struct and a set of functions for manipulating it.
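
A minimal sketch of the "typedef'd struct and a set of functions" approach, for illustration only. The names string_init, string_assign and string_destroy are hypothetical, not a standard API, and the caller has to call them explicitly because C will not do it at declaration, assignment or end of scope.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: an owned, NUL-terminated buffer plus its length. */
typedef struct {
    char  *data;  /* NULL after a failed init or after string_destroy */
    size_t len;   /* length, not counting the terminator */
} string;

/* Initialise s with a copy of text. Returns 0 on success, -1 on allocation failure. */
int string_init(string *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t n = strlen(text);
    s->data = malloc(n + 1);
    if (s->data == NULL) {
        s->len = 0;
        return -1;
    }
    memcpy(s->data, text, n + 1);
    s->len = n;
    return 0;
}

/* Replace the contents of s with a copy of text, resizing as needed.
 * text must not point into s's own buffer. On failure s is left unchanged. */
int string_assign(string *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t n = strlen(text);
    char *p = realloc(s->data, n + 1);
    if (p == NULL)
        return -1;
    memcpy(p, text, n + 1);
    s->data = p;
    s->len = n;
    return 0;
}

/* Release the storage. Nothing calls this automatically when s goes out of scope. */
void string_destroy(string *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

With this, the declaration from the first post becomes string myString; string_init(&myString, "Test"); followed by a matching string_destroy(&myString); before the variable leaves scope.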

Cromulent
Mar 21, 2009, 10:31 AM
Basically, you can't do that. C has no provisions for operator overloading, so you can't do things like custom 'new' or custom assignment.

While you are of course correct, I don't see why some provision for operator overloading couldn't be hacked in. It would be a bit messy, but you can certainly turn C into an object orientated language (for instance) if you spend enough time writing a library for it.

kainjow
Mar 21, 2009, 10:36 AM
I don't think it's possible with C libraries. You'd have to write some type of preprocessor, which iirc was what Objective-C was early on.

lazydog
Mar 21, 2009, 11:33 AM
Simple, just rename your files .cpp and #include <string> :D

The way I see it, C++ is really 3 languages - plain C, C with classes and C with classes and templates. Now a long time ago I messed around with oo in plain C and it was a pain. You end up having to write functions with a 'this' parameter, explicitly calling destructors and assignment functions etc etc. There's not much point to it as 'C with classes' does it all for you in an elegant way. Unless of course you have a different oo programming paradigm in mind in which case you're better off writing a preprocessor that handles your syntax and dumps out plain C... aha .. sounds like Objective-C... actually I think the early C++ compilers were implemented as preprocessors too.

b e n

Cromulent
Mar 21, 2009, 11:39 AM
Simple, just rename your files .cpp and #include <string> :D

But that takes away the fun of learning how to do it myself :).

ChrisA
Mar 21, 2009, 11:40 AM
Well I was thinking about this and was wondering what the standard means of creating a new base type in C was?

You can't do that in C.

gnasher729
Mar 21, 2009, 12:10 PM
But that takes away the fun of learning how to do it myself :).

Look up the source code for the gcc std::string implementation. Learning how they did this _is_ fun (well, for some weird kind of people :D ). Wait until you run into the "zero-size base class" problem.

Cromulent
Mar 21, 2009, 12:34 PM
You can't do that in C.

Fairly sure you can, otherwise other languages that were originally just built on top of C would be almost impossible to implement. Unless I have a fundamental misunderstanding of how they achieved it.

Look up the source code for the gcc std::string implementation. Learning how they did this _is_ fun (well, for some weird kind of people :D ). Wait until you run into the "zero-size base class" problem.

Just getting the SVN checkout of GCC now. I forgot how big it was :).

eddietr
Mar 21, 2009, 12:40 PM
Fairly sure you can, otherwise other languages that were originally just built on top of C would be almost impossible to implement. Unless I have a fundamental misunderstanding of how they achieved it.

Well, those languages often started as custom preprocessors coupled with support libraries which are passed to a c compiler.

Cromulent
Mar 21, 2009, 12:59 PM
Well, those languages often started as custom preprocessors coupled with support libraries which are passed to a c compiler.

Okay let me see if I have got this straight (bear with me as I am likely to have misunderstood).

You start off by defining a syntax for your new language enhancement and then use a tool such as GNU Flex to preprocess it. This then outputs a format which can then be combined with the original C code and compiled in the normal fashion. Is that the gist of it?

Or is a tool more along the lines of GNU Bison more suitable?

eddietr
Mar 21, 2009, 01:20 PM
Okay let me see if I have got this straight (bear with me as I am likely to have misunderstood).

You start off by defining a syntax for your new language enhancement and then use a tool such as GNU Flex to preprocess it. This then outputs a format which can then be combined with the original C code and compiled in the normal fashion. Is that the gist of it?

Or is a tool more along the lines of GNU Bison more suitable?

Most likely you'll need both to do what you want. But to be honest, I haven't worked on a compiler in about 20 years (using lex and yacc which are similar to flex and bison, respectively).

Cromulent
Mar 21, 2009, 01:45 PM
Most likely you'll need both to do what you want. But to be honest, I haven't worked on a compiler in about 20 years (using lex and yacc which are similar to flex and bison, respectively).

Okay, thanks for putting me on the right track.

Catfish_Man
Mar 21, 2009, 03:52 PM
A better starting point for modifying C would probably be clang. Much more modular and easy to understand than GCC, and (for C code) almost as complete.

lee1210
Mar 21, 2009, 04:07 PM
I don't dispute the guidance given relating to technically implementing such a thing, but I think there was a step (or two) missed. A lexical analyzer and parser are really of little use if you don't have a grammar to analyze and parse. You can probably find a transliteration of the K&R grammar to BNF or some other "more usable" form around the internet as a starting point. You may also wish to look at some publicly available grammars such as Python's to get a feel for things:
http://docs.python.org/3.0/reference/grammar.html

There is a possible way around changing/extending a compiler, which some people have mentioned, which is to make a preprocessor that generates C (or C++ or Objective-C) that can be passed to an existing compiler. A lot of languages that eventually mature to specific compilers start life as preprocessors that generate a language that already has a compiler. Technically, such a preprocessor is a compiler, compiling from your new Extended-C to C.
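
To make the "preprocessor that generates C" idea concrete, here is a deliberately tiny sketch that translates one line at a time. It uses hand-written string matching rather than a real grammar, recognises only the hypothetical form string NAME = "LITERAL"; and copies everything else through. The function translate_line and the emitted string_init call are assumptions made for illustration; a real tool would be built on a proper lexer and parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IDENT_CHARS \
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"

/* Skip spaces and tabs. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Translate one line of a hypothetical "Extended-C" into plain C.
 * Rewrites   string NAME = "LITERAL";   into
 *            string NAME; string_init(&NAME, "LITERAL");
 * Returns 1 if the line was rewritten, 0 if it was copied unchanged.
 * Limitations of this toy: leading indentation is dropped on rewritten
 * lines, and literals containing escaped quotes are not recognised. */
int translate_line(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    const char *p = skip_ws(in);
    if (strncmp(p, "string", 6) == 0 && (p[6] == ' ' || p[6] == '\t')) {
        const char *name = skip_ws(p + 6);
        size_t nlen = strspn(name, IDENT_CHARS);
        const char *eq = skip_ws(name + nlen);
        if (nlen > 0 && *eq == '=') {
            const char *lit = skip_ws(eq + 1);
            const char *end = (*lit == '"') ? strchr(lit + 1, '"') : NULL;
            if (end != NULL && *skip_ws(end + 1) == ';') {
                const char *rest = skip_ws(end + 1) + 1; /* text after ';' */
                snprintf(out, outsz,
                         "string %.*s; string_init(&%.*s, %.*s);%s",
                         (int)nlen, name, (int)nlen, name,
                         (int)(end - lit + 1), lit, rest);
                return 1;
            }
        }
    }
    snprintf(out, outsz, "%s", in); /* not ours: pass through untouched */
    return 0;
}
```

Run over a whole file, the output is ordinary C that an existing compiler accepts, provided a support library supplies string_init.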

Back to my original point: even though you don't need a formalized grammar to feed to yacc, etc., you would be well served to have one anyway. Without it, you're just guessing at your syntax and structure rules, and you might do things slightly differently in different places, leading to disaster. If you have something formalized to refer to, you don't have to ask if widget x matches the way widget y does something; you just have to make sure x and y match the grammar specification.

This is a big project. Have realistic goals, and have some side projects you can work on when you hit a stopping point. Rarely is banging one's head against a tough problem the way through it, so you'll need distractions.

-Lee

jw2002
Mar 22, 2009, 01:00 AM
But that takes away the fun of learning how to do it myself :).

I knew someone who did OO programming in C. It was a pain because you have to do a lot of stuff that the compiler takes care of behind the scenes. The object model for a C++ class in C is a struct with regular variables for the data members and function pointers for the methods. As a result, you end up having to make a lot of explicit calls to make sure things like methods are set up correctly:

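typedef struct string string;
struct string {
    int _len;
    char *_string;
    void (*Assign)(string *, const char *); // explicit 'this' parameter
};

string x;
const char *y = "blort";
Construct(&x); // Initialize values AND function pointers
x.Assign(&x, y); // the object has to be passed in by hand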
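
For reference, a sketch of what such a Construct function might look like. Everything here is an assumption made for illustration: the field names, the Destruct member and the two static helper functions are hypothetical, and each "method" takes the object as an explicit first argument because C has no implicit this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct string string;

struct string {
    size_t len;
    char  *text;
    /* "Methods": each takes the object explicitly. */
    int  (*Assign)(string *self, const char *src);
    void (*Destruct)(string *self);
};

/* Copy src into the object, replacing any previous contents.
 * Returns 0 on success, -1 if allocation fails (object unchanged). */
static int string_assign_impl(string *self, const char *src)
{
    size_t n = strlen(src);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, n + 1);
    free(self->text);      /* free(NULL) is a no-op, so this is safe initially */
    self->text = copy;
    self->len = n;
    return 0;
}

/* Release the buffer and reset the data members. */
static void string_destruct_impl(string *self)
{
    free(self->text);
    self->text = NULL;
    self->len = 0;
}

/* The "constructor": sets the data members AND wires up the function pointers. */
void Construct(string *self)
{
    assert(self != NULL);
    self->len = 0;
    self->text = NULL;
    self->Assign = string_assign_impl;
    self->Destruct = string_destruct_impl;
}
```

Usage is then Construct(&x); x.Assign(&x, "blort"); and finally x.Destruct(&x);, all written out by hand. Storing the pointers in every object costs memory per instance; pointing each object at one shared table of functions is a common alternative.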

Cromulent
Mar 22, 2009, 01:47 AM
I knew someone who did OO programming in C. It was a pain because you have to do a lot of stuff that the compiler takes care of behind the scenes. The object model for a C++ class in C is a struct with regular variables for the data members and function pointers for the methods. As a result, you end up having to make a lot of explicit calls to make sure things like methods are set up correctly:

Certainly not the cleanest way of doing things, but as a learning experience I think it offers more than using an already existing OO language.

I don't dispute the guidance given relating to technically implementing such a thing, but I think there was a step (or two) missed. A lexical analyzer and parser are really of little use if you don't have a grammar to analyze and parse. You can probably find a transliteration of the K&R grammar to BNF or some other "more usable" form around the internet as a starting point. You may also wish to look at some publicly available grammars such as Python's to get a feel for things:
http://docs.python.org/3.0/reference/grammar.html

There is a possible way around changing/extending a compiler, which some people have mentioned, which is to make a preprocessor that generates C (or C++ or Objective-C) that can be passed to an existing compiler. A lot of languages that eventually mature to specific compilers start life as preprocessors that generate a language that already has a compiler. Technically, such a preprocessor is a compiler, compiling from your new Extended-C to C.

Back to my original point: even though you don't need a formalized grammar to feed to yacc, etc., you would be well served to have one anyway. Without it, you're just guessing at your syntax and structure rules, and you might do things slightly differently in different places, leading to disaster. If you have something formalized to refer to, you don't have to ask if widget x matches the way widget y does something; you just have to make sure x and y match the grammar specification.

This is a big project. Have realistic goals, and have some side projects you can work on when you hit a stopping point. Rarely is banging one's head against a tough problem the way through it, so you'll need distractions.

-Lee

Thank you Lee. That was a very useful post as always.

A better starting point for modifying C would probably be clang. Much more modular and easy to understand than GCC, and (for C code) almost as complete.

Thanks, I got the latest SVN of both clang and LLVM and am looking through them. You're certainly right that it is simpler than GCC; it is just a shame that it is in C++, as I am less comfortable with that. I guess it is time to get out my copy of "The C++ Programming Language".

Sander
Mar 22, 2009, 04:07 AM
But that takes away the fun of learning how to do it myself :).

You may actually be onto something very profound here. In my experience, running into a problem "in real life" and first trying to figure out a solution yourself leads to much deeper understanding.

Personally, I always find it strange that a first introduction to computer programming would pick OO as a paradigm. In my opinion, OO only adds value once your projects get relatively large and/or complex. For example, when explaining what a program is, it's quite common to bring out the bearded "recipe" analogy. But making a cake is quite procedural. You don't line up all the ingredients on the kitchen table and yell "Construct yourself!" to them.

Since you've decided you'd like to add new "first class" types to C, you'll find out a way to group data and its related functionality together in structs, and you'll probably start wondering whether there isn't a better way to handle object construction and destruction, without having to remember calling "Construct" and "Destruct" yourself. You're even prepared to add some kind of preprocessor magic and/or extend C for these goals.

After a while of fiddling with this stuff, go pick up "The Design and Evolution of C++" (http://www.research.att.com/~bs/dne.html) and you'll have plenty of "Yeah! That was my problem too!" moments. For all the critique C++ gets, you'll see why things are the way they are.

Have fun on this interesting journey!

Cromulent
Mar 23, 2009, 01:17 PM
Well I've managed to work out Bison enough to create a very, very simple parser but now I am wondering what the best method to go about this is. Should I just create a whole C preprocessor based on the language grammar specified in Appendix A of the C99 standard document or should I just create a simple parser for my enhancements that is then designed to feed into GCC or something?

Sander
Mar 23, 2009, 02:21 PM
I think you may have to implement the whole C99 parsing, too. It could output the "plain" C99 stuff unchanged, and insert your own (C) code when it encounters your enhancements. You _could_ take a shortcut and look for your enhancements specifically, if you're not afraid to break "existing code".

Perhaps you can find ready-made C parsers on the net somewhere..?

lazydog
Mar 23, 2009, 03:10 PM
Well I've managed to work out Bison enough to create a very, very simple parser but now I am wondering what the best method to go about this is. Should I just create a whole C preprocessor based on the language grammar specified in Appendix A of the C99 standard document or should I just create a simple parser for my enhancements that is then designed to feed into GCC or something?

Have you decided what oo 'features' you're going to have and what the syntax is? Depending on what you want to do you might be able to get away with using C preprocessor macros or M4, and some run time libraries. The end result will be ugly but it would get you to the implementation and testing phase quickly. At that point if you thought it was worth it you could then go back and create a cleaner and more efficient syntax implemented as a proper preprocessor or whatever. The important thing is to get your ideas down and experimented with!
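
As a rough illustration of how far plain C preprocessor macros plus a small run time library can go, the sketch below uses a C99 for-loop declaration to give a string a scope and destroy it when the block ends. The macro WITH_STRING, the helper functions and the live_strings counter are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; size_t len; } string;

int live_strings = 0;  /* demo-only counter of initialised, undestroyed strings */

/* Copy text into s; data stays NULL and len 0 if allocation fails. */
void string_init(string *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t n = strlen(text);
    s->data = malloc(n + 1);
    s->len = 0;
    if (s->data != NULL) {
        memcpy(s->data, text, n + 1);
        s->len = n;
    }
    live_strings++;
}

void string_destroy(string *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    live_strings--;
}

/* Declares `name`, initialises it, runs the following block exactly once,
 * then destroys it. Caveat: leaving the block with break, return or goto
 * skips the destroy step, so this is much weaker than a C++ destructor. */
#define WITH_STRING(name, init)                                            \
    for (string name, *name##_live = (string_init(&name, (init)), &name);  \
         name##_live != NULL;                                              \
         string_destroy(&name), name##_live = NULL)

/* Example use: returns the length seen inside the scoped block. */
size_t scoped_length_demo(void)
{
    size_t n = 0;
    WITH_STRING(greeting, "Test") {
        n = greeting.len;
    }   /* greeting has been destroyed by this point */
    return n;
}
```

It is ugly, as predicted, but it is enough to experiment with scoped construction and destruction before committing to a real preprocessor.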

b e n

Cromulent
Mar 24, 2009, 04:44 AM
Hmm M4 looks interesting.

Yes as this was just something that I wanted to do for a bit of fun I'm not too bothered if it ends up being a bit ugly. Getting things working is a pretty big step, including writing what is in essence a new standard library for the new base types.

One thing I was wondering is that I decided that you could concatenate two strings by using the addition sign, so for instance:

string stringOne = "Hello ";
string stringTwo = "World!";

stringOne = stringOne + stringTwo;

printf("%es", stringOne); // %es would be the new printf instruction for the string type

// would print "Hello World!"

The divide sign would tell you how many times one string appears in another, but using the multiplication sign to do the obvious thing seems rather redundant. Can anyone think of a better use for it? Also, what about the modulo sign?
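
Since C itself cannot overload + or /, a preprocessor would most likely have to lower these operators to ordinary function calls. The sketch below shows one possible pair of targets. The names string_from, string_concat, string_count and string_free are hypothetical, and counting non-overlapping occurrences is just one reading of "how many instances".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; size_t len; } string;

/* Build a string from a C literal; data is NULL if allocation fails. */
string string_from(const char *text)
{
    string s = { NULL, 0 };
    size_t n = strlen(text);
    s.data = malloc(n + 1);
    if (s.data != NULL) {
        memcpy(s.data, text, n + 1);
        s.len = n;
    }
    return s;
}

/* Possible target for `a + b`: a freshly allocated concatenation.
 * The operands are left untouched; data is NULL if allocation fails. */
string string_concat(const string *a, const string *b)
{
    string r = { NULL, 0 };
    assert(a->data != NULL && b->data != NULL);
    r.data = malloc(a->len + b->len + 1);
    if (r.data != NULL) {
        memcpy(r.data, a->data, a->len);
        memcpy(r.data + a->len, b->data, b->len + 1); /* includes the terminator */
        r.len = a->len + b->len;
    }
    return r;
}

/* Possible target for `hay / needle`: number of non-overlapping occurrences.
 * An empty needle counts as zero occurrences. */
size_t string_count(const string *hay, const string *needle)
{
    size_t count = 0;
    if (needle->len == 0 || hay->data == NULL)
        return 0;
    for (const char *p = hay->data;
         (p = strstr(p, needle->data)) != NULL;
         p += needle->len)
        count++;
    return count;
}

void string_free(string *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Under these assumptions, stringOne = stringOne + stringTwo; would have to expand to three steps: call string_concat into a temporary, string_free the old stringOne, then copy the temporary into stringOne. Otherwise the old buffer leaks.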

lazydog
Mar 24, 2009, 05:38 AM
I think + for strings would be great but I'm not sure overloading more operators would be as useful.

b e n

lee1210
Mar 24, 2009, 07:20 AM
As long as string is a different sort of thing, this is fine. If it was just a typedef'd char *... there's already pointer addition, and unary * on a char * is to dereference the pointer, etc.

I guess I would say, at this point... think bigger. You can certainly do these things, but C++ already does them. At a certain point, if string is an Object, you should generalize to allowing user-defined (overloaded) operations with the operators (+,-,*,/,[],%,etc.), not just a fixed list. If string is just a built-in type, and not a "regular" object, that's a bit different, but might be limiting.

-Lee

Cromulent
Mar 26, 2009, 10:02 AM
I guess I would say, at this point... think bigger.

I took your advice and did some research and found some interesting articles on implementing generic types in C using void pointers. Maybe that would be a better solution rather than trying to implement a new base type like string?
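
For anyone following along, a minimal sketch of the void-pointer technique: a growable array that stores elements of any one type by remembering the element size. The vec type and its functions are hypothetical names chosen for this example, not taken from the articles mentioned, and the size arithmetic is not checked for overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A growable array of elements whose type is known only by its size. */
typedef struct {
    void  *items;      /* raw storage, count * elem_size bytes in use */
    size_t elem_size;  /* size of one element in bytes */
    size_t count;      /* number of elements stored */
    size_t capacity;   /* number of elements the storage can hold */
} vec;

void vec_init(vec *v, size_t elem_size)
{
    assert(v != NULL && elem_size > 0);
    v->items = NULL;
    v->elem_size = elem_size;
    v->count = 0;
    v->capacity = 0;
}

/* Append a copy of *elem. Returns 0 on success, -1 if allocation fails. */
int vec_push(vec *v, const void *elem)
{
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        void *p = realloc(v->items, new_cap * v->elem_size);
        if (p == NULL)
            return -1;
        v->items = p;
        v->capacity = new_cap;
    }
    /* Arithmetic on void * is not standard C, so step through char *. */
    memcpy((char *)v->items + v->count * v->elem_size, elem, v->elem_size);
    v->count++;
    return 0;
}

/* Pointer to element i, or NULL if i is out of range.
 * The caller casts the result back to the real element type. */
void *vec_at(const vec *v, size_t i)
{
    if (i >= v->count)
        return NULL;
    return (char *)v->items + i * v->elem_size;
}

void vec_free(vec *v)
{
    free(v->items);
    v->items = NULL;
    v->count = 0;
    v->capacity = 0;
}
```

The trade-off is that the compiler can no longer check element types: *(int *)vec_at(&v, 0) on a vector of doubles compiles without complaint.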