Dynamic string handling (C)




Cromulent
Feb 10, 2008, 03:45 PM
Okay, I've been trying to figure this out, but without much luck. I'm trying to write a simple program that takes input and manipulates it. That part is easy and already done (I'm using C).

The problem is I don't want any hard-coded limits on the length of the string in the program, and all of the string functions seem to require an array of chars, which is also fine. But I can't find a function which will just count the number of characters in a string and put the result into an int which I can then use to specify the size of the array of chars.

The only other way I can think of doing it would be to use getch() to get the characters one by one and then just have a simple counter while I am running a while loop or something. The problem is I still need to declare an array for getch to work which then adds a hardcoded limit to the length of the string my program will handle. Likewise strnlen() counts the length of a given string but it will only count it from an already existing buffer.

Should I be looking at malloc style functions for this?



Eraserhead
Feb 10, 2008, 03:48 PM
Look at strlen() to find the length of the string; see man strlen for more information.

If you need variable-length strings you'll need malloc(). Allocate (n+1)*sizeof(char) bytes, as you need an extra one for the null terminator ('\0').

Cromulent
Feb 10, 2008, 04:43 PM
Look at strlen() to find the length of the string; see man strlen for more information.

Yep, I already mentioned strlen in my original post, but it is not dynamic.


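int x = 0;
char buffer[x];          /* x has to be known before the buffer can exist... */

x = strnlen(buffer, x);  /* ...but x can only be measured from an existing buffer */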


You can see the problem. I can't set the value of x without having a character buffer. If I declare a character buffer to then set the size of another character buffer, the program is limited to the size of the original character buffer.

If you need variable-length strings you'll need malloc(). Allocate (n+1)*sizeof(char) bytes, as you need an extra one for the null terminator ('\0').

Not sure I follow you here. Would I use something like that instead of an array of chars, or would I use malloc to set the size of the array? I have not looked into malloc and memory management in general much.

Thanks for the help.

aross99
Feb 10, 2008, 05:14 PM
You use malloc() to allocate memory dynamically (i.e., at run time), and not at compile time.

Instead of doing something like this:

char buffer[100];

do this:

char *buffer;

buffer=malloc(100);

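This makes buffer point to a 100-byte block of memory. The end result is the same, but it was assigned dynamically rather than at compile time, which also means the size passed to malloc() can be a variable whose value is only known at run time.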

Malloc is usually paired with free(), which gives back the memory obtained by malloc when you are done with it. Just make sure you are really done with it before you call it, like this:

free(buffer);

Once you do this, you can't use the memory referenced by buffer until you do another malloc.

As you can imagine, managing malloc/free is a bit tricky and a common source of errors. If you have ever heard of a "memory leak", that comes from calling malloc() without a matching free() to give the memory back.

I hope this helps...

Eraserhead
Feb 10, 2008, 05:15 PM
Set the original buffer size to LINE_MAX from limits.h. It may be better to malloc the array, of course, but I seem to have got away with declaring it as a fixed-size [] array.

Remember, if you are using malloc (EDIT: as described by aross99), to check that it succeeded, i.e. that it did not return NULL.

toddburch
Feb 10, 2008, 05:17 PM
...But I can't find a function which will just count the number of characters in a string and put the result into an int which I can then use to specify the size of the array of chars.

strlen() does that, as Eraserhead points out.

I think there's a terminology thing going on. A "string" in C is defined as a sequence of characters that ends in a binary zero. If you are indeed accepting a string into your program, you can indeed use strlen().

If, however, you are reading a character at a time from stdin (which your mention of getch() implies), and you don't want to constrain yourself with any limits on length of the string, then that's fine - you can still do that.

You have a few choices, but the first, and probably easiest, is to declare a hardcoded array of some really big and reasonable length - maybe 1,001 characters. Then, start your getch()'ing. When you get to 1000, and you're not done, then do a malloc() for 2001 bytes. Copy your data over into the malloc'ed area, and keep getch()ing. If you fill up, do it again, free()'ing the first malloc()'ed area as well.

(Look up realloc() as well - might save you a few steps if your malloc'ed area becomes too small)

When you get what you determine is the last character, put a binary zero on the end of the array and you've created a "string" for yourself.

Todd

Cromulent
Feb 10, 2008, 05:27 PM
strlen() does that, as Eraserhead points out.

I think there's a terminology thing going on. A "string" in C is defined as a sequence of characters that ends in a binary zero. If you are indeed accepting a string into your program, you can indeed use strlen().

You misunderstand me. I am well aware of strlen() and its uses. I could already implement the following code to deal with this:


int x;
char buffer[100];

gets(buffer);
x = strlen(buffer);


BUT, the problem with that code is that the size of the string is limited to the size of the array buffer. I do not want any hard coded limit on the size of my string.

Therefore strlen() is not an option, as it requires a buffer to be declared before you can use it. That means setting an arbitrary size for the buffer in advance, which is already a hard-coded limit on the size of the string my program can accept. Setting the buffer to a stupidly large size is not an option either.

If, however, you are reading a character at a time from stdin (which your mention of getch() implies), and you don't want to constrain yourself with any limits on length of the string, then that's fine - you can still do that.

You have a few choices, but the first, and probably easiest, is to declare a hardcoded array of some really big and reasonable length - maybe 1,001 characters. Then, start your getch()'ing. When you get to 1000, and you're not done, then do a malloc() for 2001 bytes. Copy your data over into the malloc'ed area, and keep getch()ing. If you fill up, do it again, free()'ing the first malloc()'ed area as well.

(Look up realloc() as well - might save you a few steps if your malloc'ed area becomes too small)

When you get what you determine is the last character, put a binary zero on the end of the array and you've created a "string" for yourself.

Todd

Fantastic, thanks. That is exactly what I was after.

GreatDrok
Feb 10, 2008, 05:36 PM
I usually declare strings as some sensible size using malloc(), and then as I approach that size I realloc() the string to something bigger, say +1000. Once I have all the data in, I realloc() it one more time to set the string to exactly the length I want. Allocating a fairly big string saves you a lot of realloc()ing, because each realloc() may have to copy the contents of the original block to a new location; once strings get large this can hurt performance, so try to do it as few times as possible within a loop. Also, rather than calling strlen() each time to keep an eye on the size, store the length in an int. Just use strlen() for the final realloc() to set the size to the exact length needed. OK, so maybe that is teaching you to suck eggs, but it is good practice so worth mentioning.

Oh, and always remember to free() the memory when you don't need it anymore. C doesn't do garbage collection like Java so your memory management has to be meticulous, especially when you are doing dynamic allocation on a large scale. Many of my programs do biological sequence comparison so I allocate and free a lot of string arrays.

Cromulent
Feb 11, 2008, 05:13 AM
Okay I have read through some documentation. Does this code look good?

#include <stdio.h>
#include <stdlib.h>

int main (int argc, const char * argv[])
{
char *ptr;
size_t length;

ptr = (char *) malloc(length +1);

gets(ptr);
printf("%s", ptr);

free(ptr);

return 0;
}

robbieduncan
Feb 11, 2008, 05:33 AM
You have to set length to something before using it. Its value will either be 0 or whatever was in that memory before (I forget whether C guarantees new variables are zeroed). Either way this is not a good idea.

Cromulent
Feb 11, 2008, 06:11 AM
You have to set length to something before using it. Its value will either be 0 or whatever was in that memory before (I forget whether C guarantees new variables are zeroed). Either way this is not a good idea.
Ah I see. Thanks for that, I always forget to initialise my variables.

When you say this is not a good idea, do you mean not initialising variables or do you mean the entire method?

robbieduncan
Feb 11, 2008, 06:48 AM
Ah I see. Thanks for that, I always forget to initialise my variables.

When you say this is not a good idea, do you mean not initialising variables or do you mean the entire method?

Not initialising variables. Although the whole method looks like it creates a fixed length buffer and does not dynamically increase this as you wanted.

toddburch
Feb 11, 2008, 06:57 AM
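Only variables with static storage duration (file-scope globals and variables declared static) are set to zero when the program is loaded. Ordinary local variables start out with whatever happens to be in that memory.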

I believe Robbie meant it's not a good idea to not initialize variables.

Another thing that's not a good idea is to use gets(). gets() will read a line of any length, with no way to tell it how big your buffer is. It's the perfect candidate for enabling buffer overruns and such. If you ran your program and the user copy/pasted in their input, it could easily overrun your malloc()ed storage, causing a crash or other undesired effect. http://www.cppreference.com/stdio/gets.html

Therefore, it's highly suggested to use fgets() instead of gets(), specifying stdin and the maximum number of characters you allow (which would be the size of your buffer). http://www.cppreference.com/stdio/fgets.html

If fgets() reaches the end of the line, it stores the newline character in the buffer, followed by the null terminator. Otherwise (the buffer filled up first), the data ends with just the null terminator and no newline, so you can tell that the end of the line has not been reached yet and do your whole realloc() thing.

Finally, one last comment on your progress so far. It's generally accepted today to concern yourself with Unicode, so your malloc() should take this into account: multiply the number of characters you are requesting by the size of the data type, like this:

malloc((length +1)*sizeof(char));

In this particular case, sizeof(char) is always 1 by definition, so the multiplication only starts to matter with a wider character type such as wchar_t.

Todd

Cromulent
Feb 12, 2008, 11:38 AM
First off, I'd like to say thank you to everyone who has helped so far. I have learnt a lot doing this (which I originally thought would be simple).

But I'm stuck again. I've tried loads of different methods but I think I'm missing something really simple or I'm trying to do something really stupid. Either one is likely :).

The code below is a work in progress and does not actually work. I'm just looking for some tips where I am going wrong.


#include <stdio.h>
#include <stdlib.h>

int * memResize(int *, size_t);

int main (int argc, const char * argv[])
{
int *buf_one, *buf_two;
size_t memSize = 1; /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated, thus 1 is guaranteed to need expanding */

buf_one = (int *) malloc((memSize)*sizeof(char));
if(buf_one == NULL)
{
printf("Memory Error.\n");
}

while((int)buf_one = getchar() != '\n')
{
/* Reallocation code - stuck :( */
}

buf_two = memResize(buf_one, memSize); /* I'm not sure about this function at all */

printf("%s", &buf_one);

free(buf_one);
free(buf_two);

exit(EXIT_SUCCESS);
}

int * memResize(int *buf_one, size_t memFromMain)
{
int *rtnPtr;

rtnPtr = realloc(buf_one, memFromMain);
if(rtnPtr == NULL)
{
printf("Memory reallocation error.\n");
}

return rtnPtr;
}

I'm stuck on the reallocation code and what to check against to see if the buffer is full. Is there a function that will say how full a certain chunk of memory is?

I think I've lost myself. My earlier attempts were better; I think this version is a mess and I'm not sure what's wrong (it does not work anyway).

toddburch
Feb 12, 2008, 12:28 PM
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  /* memcpy(); <memory.h> is not a standard header */

char * memResize(char *, size_t newSize, size_t oldSize);

int main (int argc, const char * argv[])
{
    char *buf_one, *current_pointer;
    int c;               /* int, not char, so that EOF can be detected */
    size_t memSize = 1;  /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated, thus 1 is guaranteed to need expanding */
    size_t memUsed = 0;

    buf_one = (char *) malloc((memSize)*sizeof(char));

    if (buf_one == NULL)
    {
        printf("Memory Error.\n");
        return EXIT_FAILURE;
    }

    current_pointer = buf_one;

    while ((c = getchar()) != EOF && c != '\n')
    {
        *current_pointer++ = c;
        memUsed++;
        if (memUsed == memSize) {
            memSize *= 2; // Double the size
            buf_one = memResize(buf_one, memSize, memUsed);
            if (buf_one == NULL) return EXIT_FAILURE;
            current_pointer = buf_one + memUsed; // the data now lives in a new block
        }
    }

    *current_pointer = '\0'; // terminate the string; memUsed < memSize, so it fits

    printf("%s\n", buf_one);

    free(buf_one);

    exit(EXIT_SUCCESS);
}

char * memResize(char * oldPtr, size_t newSize, size_t oldSize)
{
    char *newPtr;
    printf("Getting %zu bytes...\n", newSize);
    newPtr = malloc(newSize);
    if (newPtr == NULL)
    {
        printf("Memory reallocation error.\n");
        free(oldPtr); // don't leak the old block
        return NULL;
    }

    memcpy(newPtr, oldPtr, oldSize); // copy data over

    free(oldPtr);

    return newPtr;
}

toddburch
Feb 12, 2008, 12:42 PM
Here's the code with realloc().

Todd


#include <stdio.h>
#include <stdlib.h>

char * memResize(char *, size_t newSize, size_t oldSize);

int main (int argc, const char * argv[])
{
    char *buf_one, *current_pointer;
    int c;               /* int, not char, so that EOF can be detected */
    size_t memSize = 1;  /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated, thus 1 is guaranteed to need expanding */
    size_t memUsed = 0;

    buf_one = (char *) malloc((memSize)*sizeof(char));

    if (buf_one == NULL)
    {
        printf("Memory Error.\n");
        return EXIT_FAILURE;
    }

    current_pointer = buf_one;

    while ((c = getchar()) != EOF && c != '\n')
    {
        *current_pointer++ = c;
        memUsed++;
        if (memUsed == memSize) {
            memSize *= 2; // Double the size
            buf_one = memResize(buf_one, memSize, memUsed);
            if (buf_one == NULL) return EXIT_FAILURE;
            current_pointer = buf_one + memUsed; // realloc() may have moved the block
        }
    }

    *current_pointer = '\0'; // terminate the string; memUsed < memSize, so it fits

    printf("%s\n", buf_one);

    free(buf_one);

    exit(EXIT_SUCCESS);
}

char * memResize(char * oldPtr, size_t newSize, size_t oldSize)
{
    char *newPtr;
    (void)oldSize; // not needed: realloc() copies the data itself
    printf("Getting %zu bytes...\n", newSize);
    newPtr = realloc(oldPtr, newSize);
    if (newPtr == NULL)
    {
        printf("Memory reallocation error.\n");
        free(oldPtr); // realloc() failed, so the old block is still ours to free
        return NULL;
    }

    return newPtr;
}

fimac
Feb 12, 2008, 12:43 PM
I'm stuck on the reallocation code and what to check against to see if the buffer is full. Is there a function that will say how full a certain chunk of memory is?

Some libraries provide such a function, but more commonly you have to keep track of this yourself.

Hopefully the following example is useful :)


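#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define GROWTH_STEP 8 /* really small, used to demonstrate algorithm */

size_t bufsize;  /* globals are zero-initialized, so buf starts out NULL */
size_t bufused;
char* buf;

static void addch(int c)
{
    if (bufused >= bufsize)
    {
        /* realloc(NULL, n) behaves like malloc(n) */
        char *tmp = realloc(buf, bufsize + GROWTH_STEP);
        if (tmp == NULL)
        {
            free(buf);
            fprintf(stderr, "Memory reallocation error.\n");
            exit(EXIT_FAILURE);
        }
        buf = tmp;
        bufsize += GROWTH_STEP;
    }
    buf[bufused++] = c;
}

int main(void)
{
    int c;

    while ((c = getchar()) != EOF && c != '\n')
        addch(c);

    addch('\0');
    printf("%u/%u=(%s)\n", (unsigned)strlen(buf), (unsigned)bufsize, buf);
    free(buf);

    return EXIT_SUCCESS;
}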

Cromulent
Feb 12, 2008, 01:06 PM
Wow, thanks for the help guys. I'll have to sit down and go through that when I have a little more free time :).

I'm getting there, slowly but surely, hopefully I'll have enough knowledge of the fundamentals of C to do something useful soon.

ChrisA
Feb 12, 2008, 03:37 PM
Try this


char *buffer;
buffer = malloc(CHUNK);
while(...) {
    buffer[i] = getchar();
    i++;
    if ( i >= CHUNK)     /* grow before writing past the end */
        buffer = realloc( ...);
}
buffer[i] = '\0';        /* terminate before calling strlen() */

count = strlen(buffer);


You fill in the details, but do be sure to check that malloc and realloc return non-NULL.

yeroen
Feb 12, 2008, 03:53 PM
As a general rule, be very careful with realloc. In particular don't do this:

p = realloc(p,nbytes)

What if realloc returns NULL? Then you wind up overwriting your original pointer p: the old block is still allocated, but you no longer have any way to reach it or free it. Even if it doesn't return NULL, you also have to be mindful of updating any other references to the original malloc'd block, because realloc may move it to a new location.

fimac
Feb 13, 2008, 03:03 PM
As a general rule, be very careful with realloc.

That's good advice! In any serious project I wrap malloc, realloc etc. in the following style:


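#include <stdlib.h>

/* Like realloc(), but on failure it releases the old block and aborts,
   so callers never have to handle a NULL result themselves. */
void* erealloc(void *aptr, size_t nbytes)
{
    void *p = realloc(aptr, nbytes);
    if (NULL == p)
    {
        free(aptr);
        abort();
    }
    return p;
}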


Another technique would be to use reallocf() (available on BSD and Mac OS X), which frees the original block when the reallocation fails.