Matlab interpolation

Discussion in 'Mac Programming' started by dukebound85, Jun 18, 2013.

  1. dukebound85, Jun 18, 2013
    Last edited by a moderator: Jun 18, 2013

    macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #1
    Having a hard time with this seemingly easy concept

    I have a data array,X, of 1958x36x72
    1958 is the time dimension
    36 is lat dimension
    72 is long dimension

    There are NaNs in both the temporal and spatial sense

    What I have done is set a 50% criterion on the data to keep. In other words, if over the course of the time domain, there is a NaN less than 50% of the time in the lat/long index, I keep the data. If not, I re-write it as a NaN.

    As such, I have a new data array,Y, where values are identical to X if NaNs in the temporal sense are less than 50%. However, the grid spaces where NaNs exist over 50%, they are now all NaNs at that gridpoint.

    Here is what I am trying to achieve:
    For the grid points that meet my threashold, I want to interpolate the data to fill in those missing points that currently have NaNs (only in the arrays that have NaNs less than 50% of the time)

    However, I am not having much luck in terms of coding this up. I am using Matlab

    Here are two recent cases where I have been playing around with this

    1)
    Code:
    time = 1:1:1958;
    temp_anom_crit_interp = interp2(time,temp_anomlay_criterion,time,'spline')
    -------------------------------
    2)
    Code:
    for t= 1:1958
        for i = 1:36
            for j = 1:72
           temp_anom_crit_interp(t,i,j) = interp(temp_anomlay_criterion(t,i,j),r); %have NO idea what to use for r in my case
    end
        end
    end
    
    I feel I am having a hard time grasping understanding which function to use. If anyone has any insight or experience along these lines, it would be greatly helpful:)
     
  2. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #2
    So this is something like a GPS time track?

    Lat/Long pairs vs. time?

    I've got some related code around here somewhere I think.

    EDIT: Hm. No, I don't get why there are more longs vs. lats.

    Can you explain the meaning of a 1x36x72 sample?

    B
     
  3. thread starter macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #3
    In a sense, yea

    For instance, in my arrays that meet my threshold, I may have a Nan at time 1 but values in all 1957 other instances and I want to interpolate a value for time 1

    Or I could have Nans at 49% of my array (wrt time) where I would like to interpolate a fit from the 51% I have to "fill" those missing 49% of grid points as a function of time


    so a 1x36x72 sample would be observations for all lat and longs at a specified time.
    the 1 in this array may result to a NaN whereas the pt (1958,36,72) may have a value. I want to take the arrays where I have values of more than 50% of the time (I have this formulated already) to backfill the times where a NaN is present if that makes sense
     
  4. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #4
    See my edit.

    I don't get why each time instance is 36x72?!?

    Does each element in the array represent a value at a specific latitude and longitude?

    Working out loud.

    Oh I think I get it.

    Is each element a 5x5 degree element on the globe?

    B
     
  5. thread starter macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #5
    Yup!

    so at each grid point of my 36x72 part of my array, there is a value

    the time dimension of 1958 represents the temporal change of these grid point values

    However, not all grid points have sufficient amount of values by my threshold and are given with a NaN. I have filtered the data so I keep all grid points where I have values for at least 979 time steps

    With these grid point values, I then want to fill in the missing values of these grid points with respect to time
     
  6. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #6
    IMO, in the end your code should end up looking something like:

    Code:
    for t= 1:1958
           ZI = interp2(X,Y,Z,XI,YI,'cubic');
    end
    X,Y,Z should contain only the non-NaN data, and XI, YI should contain the whole array of 2592 coordinate pairs (unless you want to preserve the measured values for the non-NaNs).

    I won't get a chance to look at this again for a while, but I can certainly take a stab at it if you can share a representative sample of the data e.g. 5x36x72.

    B
     
  7. dukebound85, Jun 18, 2013
    Last edited: Jun 18, 2013

    thread starter macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #7
    Hmm not sure how to isolate my non-nan parts to get X,Y,Z as I think the index is important when I interpolate. As far as interpolation, I would prefer to keep the values if there and only interpolate the missing ones.

    Thanks for the help and for offering to take a look. I have attached a representative sample. cc is 5 successive time steps late in the period and ccs is early. ccs will have nans at grid points in the time aspect that cc will not. I am trying to backfill ccs time steps at the gridpoints based on the gridpoint values in cc moreorless
     

    Attached Files:

  8. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #8
    I've had a quick look and scratch my previous thought.

    That was for spatial interpolation where you want to have the values of one pixel be interpolated from the adjacent pixels.

    What I am understanding now is temporal interpolation.

    Here's a concrete example from your data.

    Code:
    >> Z=cc(:,15,16)'
    
    Z =
    
       -4.1588       NaN   -0.2496   -0.6125   -0.0643
    
    >> Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)
    
    Zi =
    
       -4.1588   [B][COLOR="Red"]-2.2042[/COLOR][/B]   -0.2496   -0.6125   -0.0643
    
    >> Zis=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5,'spline')
    
    Zis =
    
       -4.1588   [COLOR="red"][B]-0.8177 [/B][/COLOR]  -0.2496   -0.6125   -0.0643
    
    >
    
    Does this capture what you want to do? replace the NaN with a value derived from the adjacent points?

    Do you want to be able to extrapolate beyond where data exists in time or not?

    B
     
  9. thread starter macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #9
    In essence, yes. Though I am confused how you achieved different values for Z and Zis. Is that from linear vs spline?

    Just so I understand

    Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)

    What this is doing is applying the default interpolation method since no modifier after 1:5. Using the interp1 function, this is finding the index where no NaNs are (find(~isnan(Z))), then taking that index and applying it to the data matrix Z (Z(~isnan(Z))), and interpolating in the time domain (1:5). Is this a correct interpretation?

    -----------------------
    As far as capturing what I want to do...it seems to I think
    If I have a value of a grid point from say time steps 800 to 1958, and NaNs in time steps at that same grid point 1:799, I want to essentially find that function of best fit/interpolation derived from the values at 800 on and apply that function/interpolation to the time steps of 1:799 so I then have values.

    Or another scenarios is say I have NaNs every third time step in my data (as it depends on the grid point essentially), I want to use that same method to fill them in based on recorded values.


    As far as your second question, do you mean filling in the spatial grid that has NaNs based on gridded values that are in place with results? If so, yes, I plan on looking at that aspect as well, in which case I would take a gridded time step, then spatially fill in the NaNs and then do it temporally. Trying to tackle one scenario at a time.

    I assume the approach is similar?

    Thanks for your insight, it is much appreciated:)
     
  10. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #10
    Exactly. Because the values in 3:5 are so much higher than the value in 1, the interpolated value from the spline will be closer to the plateau values than the initial minimum with a spline.

    Yep. I'm using the indices and values for Z where Z is non NaN to compute the interpolated values for the entire time series. It does look like interp1 preserves the input values, so this is safe.

    So that is extrapolation, and according to a friend of mine it is the worst of perversions we inherited from the Greek mathematicians. The spline will do exactly that, but especially if the case is as extreme as you just said the results you get may not be anywhere near accurate.

    Or another scenarios is say I have NaNs every third time step in my data (as it depends on the grid point essentially), I want to use that same method to fill them in based on recorded values.

    For example, if I remove the first value from the series:

    Code:
    >> Z=[NaN NaN -0.2496   -0.6125   -0.0643];
    >> Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)
    
    Zi =
    
                           NaN                       NaN                   -0.2496                   -0.6125                   -0.0643
    
    >> Zs=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5,'spline')
    
    Zs =
    
                        [COLOR="Red"][B]3.2095                    1.0244 [/B][/COLOR]                  -0.2496                   -0.6125                   -0.0643
    
    The linear interpolation won't go beyond the bounds of existing data, while the spline will. As a spline it can have cubic terms that will grow quite fast if the index changes too much.

    It really depends on what the data is and how it is trending to figure out what the extrapolated values should be.

    Put another way, your error bars for the data at index 1 extrapolated from 800:1958 are huge. Don't read much into those values.

    I was meaning adjacent here in terms of time not space, sorry for not being clear.

    The spatial interpolation could be done with interp2 as I had outlined above, but again you need to be careful about what you want to do when you go beyond the data you have.

    FWIW Have you considered storing this as a sparse matrix? You seem to be right on the edge of where that may make sense, but of course that is zero filled instead of NaN filled.

    B
     
  11. thread starter macrumors P6

    dukebound85

    Joined:
    Jul 17, 2005
    Location:
    5045 feet above sea level
    #11
    Thanks for the insight:) I plan on playing with the time scales to see how realistic the data extrapolates out to be and may nly focus on the latter half of the time series or whatever if need be.

    What do you mean by a sparse matrix? One that grows in size? I'll have to research it some but it may be something I should look into as well.
    -------------------------
    In regards to the temporal extrapolation, I have been running into issues doing it for all points. Any issue why this code wouldn't work? I get this error. tac_interp is a data array I want filled with all the filled in values
    Code:
    for i = 1:36
        for j = 1:72
            z = tac(:,i,j)'; %tac is my criterion data array
            tac_interp(:,i,j)=interp1(find(~isnan(z)),z(~isnan(z)),1:1958,'spline');
        end
    end
    Thanks again
     
  12. Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #12
    It's a different way for storing matrices that deal with matrices that are full of zeroes/empty elements.
    http://www.mathworks.com/help/matlab/ref/sparse.html

    Your samples were < 50% full.

    My guess is this will fix it. You probably have some combinations of i,j that have 0 or 1 elements for 1:1958.

    Code:
    for i = 1:36
        for j = 1:72
            z = tac(:,i,j)'; %tac is my criterion data array
            if numel(~isnan(z)) >= 2
               tac_interp(:,i,j)=interp1(find(~isnan(z)),z(~isnan(z)),1:1958,'spline');
            else
               tac_interp(:,i,j) = z;
            end
        end
    end
     

Share This Page