Dynamic string handling (C)

Discussion in 'Mac Programming' started by Cromulent, Feb 10, 2008.

  1. macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #1
    Okay I've been trying to figure this out but without much luck. I'm trying to write a simple program that takes input and manipulates it. Easy and that is already done (I'm using C).

    The problem is I don't want any hard coded limits on the length of the string in the program and all of the string functions seem to require an array of chars which is also fine. But I can't find a function which will just count the number of characters in a string and put the result into an int which I can then use to specify the size of the array of chars.

    The only other way I can think of doing it would be to use getch() to get the characters one by one and then just have a simple counter while I am running a while loop or something. The problem is I still need to declare an array for getch to work which then adds a hardcoded limit to the length of the string my program will handle. Likewise strnlen() counts the length of a given string but it will only count it from an already existing buffer.

    Should I be looking at malloc style functions for this?
     
  2. macrumors G4

    Eraserhead

    Joined:
    Nov 3, 2005
    Location:
    UK
    #2
    Look at strlen to find the length of the string; see man strlen for more information.

    If you need variable-length strings you'll need malloc. Malloc (n+1)*sizeof(char) bytes, as you need an extra one for the null terminator.
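
    To make the n+1 rule concrete, here is a minimal sketch that copies an existing string into a buffer sized exactly for it. The helper name copy_string is made up for illustration; it is not a standard function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
   The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t n = strlen(src);                      /* length without the terminator */
    char *dst = malloc((n + 1) * sizeof(char));  /* +1 for the '\0' */

    if (dst == NULL)
        return NULL;

    memcpy(dst, src, n + 1);                     /* copies the '\0' as well */
    return dst;
}
```

    Note that this only works once the string already exists somewhere, which is exactly the limitation being asked about; reading input of unknown length needs a buffer that grows.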
     
  3. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #3
    Yep, I already mentioned strlen in my original post, but it is not dynamic.

    Code:
    int x = 0;
    char buffer[x];
     
    x = strlen(buffer);
    
    You can see the problem. I can't set the value of x without having a character buffer. If I declare a character buffer to then set the size of another character buffer, the program is limited to the size of the original character buffer.

    Not sure I follow you here. Would I use something like that instead of an array of chars? Or would I use malloc to set the size of the array. I have not looked into malloc and memory management in general much.

    Thanks for the help.
     
  4. macrumors 68000

    aross99

    Joined:
    Dec 17, 2006
    Location:
    East Lansing, MI
    #4
    You use "malloc()" to allocate memory dynamically (ie at run time), and not at compile time.

    Instead of doing something like this:

    char buffer[100];

    do this:

    char *buffer;

    buffer=malloc(100);

    This makes buffer point to a 100 byte memory segment - the end result is the same, but it was assigned dynamically, and not at compile time.

    Malloc is usually used with free(), which returns the memory obtained from malloc when you are done with it. Just make sure you are really done with it before you call it, like this:

    free(buffer);

    Once you do this, you can't use the memory referenced by buffer until you do another malloc.

    As you can imagine, managing malloc/free is a bit tricky and a common source of errors. If you have ever heard of a "memory leak", that comes from doing malloc() without free() to give the memory back.

    I hope this helps...
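
    As a rough sketch of the whole malloc / check / use / free cycle described above (the function name make_greeting and the greeting text are invented for the example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a greeting in a heap buffer.
   Returns NULL if the allocation fails; the caller owns the result. */
char *make_greeting(const char *name)
{
    size_t size = strlen("Hello, ") + strlen(name) + 1;  /* +1 for '\0' */
    char *buffer = malloc(size);

    if (buffer == NULL)          /* malloc can fail; always check */
        return NULL;

    snprintf(buffer, size, "Hello, %s", name);
    return buffer;
}
```

    A caller would do something like char *g = make_greeting("world"); and, after checking g is not NULL and using it, call free(g) exactly once. Setting g = NULL afterwards is a common habit that makes accidental reuse easier to spot.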
     
  5. macrumors G4

    Eraserhead

    Joined:
    Nov 3, 2005
    Location:
    UK
    #5
    Set the original buffer size to LINE_MAX from limits.h. It may be better to malloc the array, of course, but I seem to have got away with []'ing it.

    Remember if you are using malloc (EDIT: as described by aross99) to check for success when doing it.
     
  6. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #6
    strlen() does that, as Eraserhead points out.

    I think there's a terminology thing going on. A "string" in C is defined as a sequence of characters that ends in a binary zero. If you are indeed accepting a string into your program, you can indeed use strlen().

    If, however, you are reading a character at a time from stdin (which your mention of getch() implies), and you don't want to constrain yourself with any limits on length of the string, then that's fine - you can still do that.

    You have a few choices, but the first, and probably easiest, is to declare a hardcoded array of some really big and reasonable length - maybe 1,001 characters. Then, start your getch()'ing. When you get to 1000, and you're not done, then do a malloc() for 2001 bytes. Copy your data over into the malloc'ed area, and keep getch()ing. If you fill up, do it again, free()'ing the first malloc()'ed area as well.

    (Look up realloc() as well - might save you a few steps if your malloc'ed area becomes too small)

    When you get what you determine is the last character, put a binary zero on the end of the array and you've created a "string" for yourself.

    Todd
     
  7. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #7
    You misunderstand me. I am well aware of strlen() and its uses. I could already implement the following code to deal with this :

    Code:
     
    int x;
    char buffer[100];
     
    gets(buffer);
    x = strlen(buffer);
    
    BUT, the problem with that code is that the size of the string is limited to the size of the array buffer. I do not want any hard coded limit on the size of my string.

    Therefore strlen() is not an option as it requires a buffer to be declared before you can use it. That means you need to set an arbitrary size for the buffer in advance, so you already have a hard coded limit on the size of the string that your program can accept. Setting the size of the buffer to a stupidly large number is not an option.

    Fantastic, thanks. That is exactly what I was after.
     
  8. macrumors 6502a

    GreatDrok

    Joined:
    May 1, 2006
    Location:
    New Zealand
    #8
    I usually allocate strings at some sensible size using malloc(), and as I close in on that size I realloc() the string to something bigger, say +1000. Once I have all the data in, I realloc() it one more time to set the string to the length I want.

    Allocating a string to be quite big saves you realloc'ing a lot. Each realloc() may copy the contents of the original array to a new location, so once strings get large this can hurt performance. Try to do it as few times as possible within a loop.

    Also, rather than calling strlen() each time to keep an eye on the size, store the length in an int. Just use strlen() for the final realloc() to set the size to the exact length needed (plus one for the terminator). OK, so maybe that is teaching you to suck eggs, but it is good practice so worth mentioning.

    Oh, and always remember to free() the memory when you don't need it anymore. C doesn't do garbage collection like Java so your memory management has to be meticulous, especially when you are doing dynamic allocation on a large scale. Many of my programs do biological sequence comparison so I allocate and free a lot of string arrays.
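
    One way the grow-in-steps-then-trim approach might look is sketched below. The sizes INITIAL_SIZE and GROW_BY and the name read_line_trimmed are assumptions for the example, and it reads from a FILE * so it can be pointed at stdin or a file.

```c
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_SIZE 1000   /* assumed starting size */
#define GROW_BY      1000   /* assumed growth step, the "+1000" above */

/* Hypothetical helper: reads one line from in (without the newline).
   Returns a heap string trimmed to its exact length, or NULL on failure. */
char *read_line_trimmed(FILE *in)
{
    size_t size = INITIAL_SIZE;
    size_t len = 0;                 /* tracked by hand, no strlen() in the loop */
    char *buf = malloc(size);
    char *tmp;
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(in)) != '\n' && c != EOF) {
        if (len + 1 >= size) {      /* keep one byte free for the '\0' */
            tmp = realloc(buf, size + GROW_BY);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            size += GROW_BY;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    tmp = realloc(buf, len + 1);    /* final realloc: shrink to the exact size */
    return tmp != NULL ? tmp : buf; /* if shrinking fails, the old block is still valid */
}
```

    Because the length is tracked in len, the final size is known without calling strlen() at all. The caller frees the returned string when finished with it.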
     
  9. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #9
    Okay I have read through some documentation. Does this code look good?

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (int argc, const char * argv[])
    {
        char *ptr;
        size_t length;
        
        ptr = (char *) malloc(length +1);
        
        gets(ptr);
        printf("%s", ptr);
        
        free(ptr);
        
        return 0;
    }
     
  10. Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #10
    You have to set length to something before using it. Its value will either be 0 or whatever was in that memory before (I forget whether C guarantees new variables are zeroed). Either way this is not a good idea.
     
  11. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #11
    Ah I see. Thanks for that, I always forget to initialise my variables.

    When you say this is not a good idea, do you mean not initialising variables or do you mean the entire method?
     
  12. Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #12
    Not initialising variables. Although the whole method looks like it creates a fixed length buffer and does not dynamically increase this as you wanted.
     
  13. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #13
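    Only variables with static storage duration (globals, and locals declared as static) are set to zero at program start. Ordinary local variables start out with an indeterminate value.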

    I believe Robbie meant it's not a good idea to not initialize variables.

    Another thing that's not a good idea is to use gets(). gets() will get a string of any length. It's the perfect candidate for enabling buffer overruns and such. If you ran your program, and the user copy/pasted in their input, it could easily cause your malloc()ed storage to be overrun, causing a crash or other undesired effect. http://www.cppreference.com/stdio/gets.html

    Therefore, it's highly suggested to use fgets() instead of gets(), specifying stdin and the length of characters you allow (which would be the size of your buffer). http://www.cppreference.com/stdio/fgets.html

    If fgets() reads a newline before the buffer fills up, it stores the newline character in the buffer followed by the null-term character. Otherwise, it stores only the null-term character, so a missing trailing newline tells you that the end of the line has not been reached and you can do your whole realloc() thing.

    Finally, one last comment on your progress so far. It's generally accepted today to concern yourself with unicode. Therefore, your malloc() should take this into account. So, multiply the length you are requesting by the size of the data type, like this:
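
    A sketch of how fgets() and realloc() could be combined along those lines follows. CHUNK and the function name read_line_fgets are made up for the example, and it assumes the input contains no embedded zero bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64   /* assumed chunk size */

/* Hypothetical helper: reads one full line from in using fgets().
   Returns a heap string without the trailing newline, or NULL on
   allocation failure or if nothing could be read. */
char *read_line_fgets(FILE *in)
{
    size_t size = CHUNK;
    size_t len = 0;
    char *buf = malloc(size);

    if (buf == NULL)
        return NULL;

    /* Each call fills the unused tail of the buffer. */
    while (fgets(buf + len, (int)(size - len), in) != NULL) {
        len += strlen(buf + len);

        if (len > 0 && buf[len - 1] == '\n') {   /* newline stored: line is complete */
            buf[len - 1] = '\0';
            return buf;
        }

        /* No newline yet: the line was longer than the space left, so grow. */
        char *tmp = realloc(buf, size + CHUNK);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        size += CHUNK;
    }

    if (len == 0) {          /* EOF or error before any data */
        free(buf);
        return NULL;
    }
    return buf;              /* last line had no trailing newline */
}
```

    Called as read_line_fgets(stdin), this never writes past the buffer no matter how much the user pastes in, which is the point of preferring fgets() over gets().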
    Code:
    malloc((length +1)*sizeof(char));
    
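    In this particular case sizeof(char) is 1 (it always is, by definition), so the multiplication only starts to matter with a wider character type such as wchar_t.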

    Todd
     
  14. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #14
    First off I'd like to say thank you to everyone who has helped so far. I have learnt a lot doing this (what I originally thought would be simple).

    But I'm stuck again. I've tried loads of different methods but I think I'm missing something really simple or I'm trying to do something really stupid. Either one is likely :).

    The code below is a work in progress and does not actually work. I'm just looking for some tips where I am going wrong.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int * memResize(int *, size_t);
    
    int main (int argc, const char * argv[])
    {
        int *buf_one, *buf_two;
        size_t memSize = 1; /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated thus 1 is guaranteed to need expanding */
        
        buf_one = (int *) malloc((memSize)*sizeof(char));
            if(buf_one == NULL)
            {
                printf("Memory Error.\n");
            }
        
            while((int)buf_one = getchar() != '\n')
            {
                /* Reallocation code - stuck :( */
            }
            
        buf_two = memResize(buf_one, memSize); /* I'm not sure about this function at all */ 
        
        printf("%s", &buf_one);
        
        free(buf_one);
        free(buf_two);
        
        exit(EXIT_SUCCESS);
    }
    
    int * memResize(int *buf_one, size_t memFromMain)
    {
        int *rtnPtr;
    
        rtnPtr = realloc(buf_one, memFromMain);
            if(rtnPtr == NULL)
            {
                printf("Memory reallocation error.\n");
            }
            
        return rtnPtr;
    }
    I'm stuck on the reallocation code and what to check against to see if the buffer is full. Is there a function that will say how full a certain chunk of memory is?

    I think I've lost myself. My earlier attempts were better; I think this version is a mess and I'm not sure what's wrong (it does not work anyway).
     
  15. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #15
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* memcpy */
    
    char * memResize(char *, size_t newSize, size_t oldSize);
    
    int main (int argc, const char * argv[])
    {
        char *buf_one, *current_pointer;
        int c;              /* int, not char, so EOF can be told apart from real characters */
        size_t memSize = 1; /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated thus 1 is guaranteed to need expanding */
        size_t memUsed = 0;
        
        buf_one = (char *) malloc((memSize)*sizeof(char));
        
        if (buf_one == NULL)
        {
            printf("Memory Error.\n");
            return -1;
        }
        
        current_pointer = buf_one;
        
        while ((c = getchar()) != '\n' && c != EOF)
        {
            *current_pointer++ = (char) c;
            memUsed++;
            if (memUsed == memSize) {
                memSize *= 2;         // Double the size
                buf_one = memResize(buf_one, memSize, memUsed);
                if (buf_one == NULL) return -1;
                current_pointer = buf_one + memUsed;
            }
        }
        
        *current_pointer = '\0';   /* memUsed < memSize here, so there is room for the terminator */
        
        printf("%s\n", buf_one);
        
        free(buf_one);
        
        exit(EXIT_SUCCESS);
    }
    
    char * memResize(char * oldPtr, size_t newSize, size_t oldSize)
    {
        char *newPtr;
        printf("Growing buffer to %zu bytes...\n", newSize);
        newPtr = malloc(newSize);
        if (newPtr == NULL)
        {
            printf("Memory reallocation error.\n");
            free(oldPtr);    /* the caller gives up on failure, so release the old block */
            return NULL;
        }
        
        memcpy(newPtr, oldPtr, oldSize);  // copy data over
        
        free(oldPtr);
        
        return newPtr;
    }
     
  16. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #16
    Here's the code with realloc().

    Todd

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    char * memResize(char *, size_t newSize, size_t oldSize);
    
    int main (int argc, const char * argv[])
    {
        char *buf_one, *current_pointer;
        int c;              /* int, not char, so EOF can be told apart from real characters */
        size_t memSize = 1; /* Setting to 1 as an arbitrary amount - I'm interested in expanding the amount allocated thus 1 is guaranteed to need expanding */
        size_t memUsed = 0;
        
        buf_one = (char *) malloc((memSize)*sizeof(char));
        
        if (buf_one == NULL)
        {
            printf("Memory Error.\n");
            return -1;
        }
        
        current_pointer = buf_one;
        
        while ((c = getchar()) != '\n' && c != EOF)
        {
            *current_pointer++ = (char) c;
            memUsed++;
            if (memUsed == memSize) {
                memSize *= 2;         // Double the size
                buf_one = memResize(buf_one, memSize, memUsed);
                if (buf_one == NULL) return -1;
                current_pointer = buf_one + memUsed;   /* realloc may have moved the block */
            }
        }
        
        *current_pointer = '\0';   /* memUsed < memSize here, so there is room for the terminator */
        
        printf("%s\n", buf_one);
        
        free(buf_one);
        
        exit(EXIT_SUCCESS);
    }
    
    char * memResize(char * oldPtr, size_t newSize, size_t oldSize)
    {
        char *newPtr;
        printf("Growing buffer to %zu bytes...\n", newSize);
        newPtr = realloc(oldPtr, newSize);   /* oldSize is unused here: realloc copies the data itself */
        if (newPtr == NULL)
        {
            printf("Memory reallocation error.\n");
            free(oldPtr);    /* on failure realloc leaves the old block allocated */
            return NULL;
        }
        
        return newPtr;
    }
    
     
  17. macrumors member

    Joined:
    Jan 18, 2006
    Location:
    Finland
    #17
    Some libraries provide such a function, but more commonly you have to keep track of this yourself.

    Hopefully the following example is useful :)

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define GROWTH_STEP  8  /* really small, used to demonstrate algorithm */
    
    size_t bufsize;
    size_t bufused;
    char*  buf;
    
    static void addch(int c)
    {
      if (bufused >= bufsize)
        {
          bufsize += GROWTH_STEP;
          buf = realloc(buf, bufsize);
        }
      buf[bufused++] = c;
    }
    
    int main(void)
    {
      int c;
    
      while ((c = getchar()) != '\n' && c != EOF)
        addch(c);
    
      addch('\0');
      printf("%u/%u=(%s)\n", (unsigned)strlen(buf), (unsigned)bufsize, buf);
      free(buf);
    
      return EXIT_SUCCESS;
    }
    
     
  18. thread starter macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #18
    Wow, thanks for the help guys. I'll have to sit down and go through that when I have a little more free time :).

    I'm getting there, slowly but surely, hopefully I'll have enough knowledge of the fundamentals of C to do something useful soon.
     
  19. macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #19
    Try this

    Code:
    char *buffer;
    buffer = malloc(CHUNK);
    while(...) {
      buffer[i] = getchar();
      i++;
      if (i >= CHUNK)   /* buffer is full once i reaches CHUNK */
        buffer = realloc( ...);
    }
    buffer[i] = '\0';   /* strlen needs a terminated string */
    count = strlen(buffer);
    
    You fill in the details, but do be sure to check that malloc and realloc return non-null.
     
  20. macrumors 6502a

    yeroen

    Joined:
    Mar 8, 2007
    Location:
    Cambridge, MA
    #20
    As a general rule, be very careful with realloc. In particular don't do this:

    p = realloc(p,nbytes)

    What if realloc returns null? Then you wind up wiping out your original pointer p, and the block it pointed to is leaked because nothing can free it any more. Even if it doesn't return null, you also have to be mindful of updating any other references to the original malloc'd block, since realloc may move the data to a new location.
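
    The usual way around this is to take realloc's result in a temporary pointer first. A minimal sketch, with a made-up helper name resize_buffer and assuming nbytes is greater than zero:

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *p to nbytes.
   Returns 0 on success, -1 on failure. On failure *p is untouched
   and still points to the original, still-valid block. */
int resize_buffer(char **p, size_t nbytes)
{
    char *tmp = realloc(*p, nbytes);   /* result goes into a temporary first */

    if (tmp == NULL)
        return -1;                     /* original block intact; caller decides what to do */

    *p = tmp;                          /* the block may have moved: use the new address */
    return 0;
}
```

    Any other pointers into the old block (a "current position" pointer, for instance) have to be recomputed from the new base address after a successful call.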
     
  21. macrumors member

    Joined:
    Jan 18, 2006
    Location:
    Finland
    #21
    That's good advice! In any serious project I wrap malloc, realloc etc. in the following style:

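    Code:
    #include <stdlib.h>
    
    void* erealloc(void *aptr, size_t nbytes)
    {
      void *p = realloc(aptr, nbytes);
      if (NULL == p)
        {
          free(aptr);
          abort();
        }
      return p;
    }
    
    Another technique would be to use reallocf (available on BSD and Mac OS X), which frees the original block itself if the reallocation fails.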
     
