PDA

View Full Version : Matlab interpolation




dukebound85
Jun 18, 2013, 04:26 PM
Having a hard time with this seemingly easy concept

I have a data array,X, of 1958x36x72
1958 is the time dimension
36 is lat dimension
72 is long dimension

There are NaNs in both the temporal and spatial sense

What I have done is set a 50% criterion on the data to keep. In other words, if over the course of the time domain, there is a NaN less than 50% of the time in the lat/long index, I keep the data. If not, I re-write it as a NaN.

As such, I have a new data array,Y, where values are identical to X if NaNs in the temporal sense are less than 50%. However, the grid spaces where NaNs exist over 50%, they are now all NaNs at that gridpoint.

Here is what I am trying to achieve:
For the grid points that meet my threashold, I want to interpolate the data to fill in those missing points that currently have NaNs (only in the arrays that have NaNs less than 50% of the time)

However, I am not having much luck in terms of coding this up. I am using Matlab

Here are two recent cases where I have been playing around with this

1)
time = 1:1:1958;
temp_anom_crit_interp = interp2(time,temp_anomlay_criterion,time,'spline')
-------------------------------
2)
for t= 1:1958
for i = 1:36
for j = 1:72
temp_anom_crit_interp(t,i,j) = interp(temp_anomlay_criterion(t,i,j),r); %have NO idea what to use for r in my case
end
end
end


I feel I am having a hard time grasping understanding which function to use. If anyone has any insight or experience along these lines, it would be greatly helpful:)



balamw
Jun 18, 2013, 04:33 PM
So this is something like a GPS time track?

Lat/Long pairs vs. time?

I've got some related code around here somewhere I think.

EDIT: Hm. No, I don't get why there are more longs vs. lats.

Can you explain the meaning of a 1x36x72 sample?

B

dukebound85
Jun 18, 2013, 04:37 PM
So this is something like a GPS time track?

Lat/Long pairs vs. time?

I've got some related code around here somewhere I think.

B

In a sense, yea

For instance, in my arrays that meet my threshold, I may have a Nan at time 1 but values in all 1957 other instances and I want to interpolate a value for time 1

Or I could have Nans at 49% of my array (wrt time) where I would like to interpolate a fit from the 51% I have to "fill" those missing 49% of grid points as a function of time


so a 1x36x72 sample would be observations for all lat and longs at a specified time.
the 1 in this array may result to a NaN whereas the pt (1958,36,72) may have a value. I want to take the arrays where I have values of more than 50% of the time (I have this formulated already) to backfill the times where a NaN is present if that makes sense

balamw
Jun 18, 2013, 04:44 PM
See my edit.

I don't get why each time instance is 36x72?!?

Does each element in the array represent a value at a specific latitude and longitude?

Working out loud.

Oh I think I get it.

Is each element a 5x5 degree element on the globe?

B

dukebound85
Jun 18, 2013, 04:47 PM
See my edit.

I don't get why each time instance is 36x72?!?

Does each element in the array represent a value at a specific latitude and longitude?

Working out loud.

Oh I think I get it.

Is each element a 5x5 degree element on the globe?

B

Yup!

so at each grid point of my 36x72 part of my array, there is a value

the time dimension of 1958 represents the temporal change of these grid point values

However, not all grid points have sufficient amount of values by my threshold and are given with a NaN. I have filtered the data so I keep all grid points where I have values for at least 979 time steps

With these grid point values, I then want to fill in the missing values of these grid points with respect to time

balamw
Jun 18, 2013, 04:55 PM
IMO, in the end your code should end up looking something like:

for t= 1:1958
ZI = interp2(X,Y,Z,XI,YI,'cubic');
end

X,Y,Z should contain only the non-NaN data, and XI, YI should contain the whole array of 2592 coordinate pairs (unless you want to preserve the measured values for the non-NaNs).

I won't get a chance to look at this again for a while, but I can certainly take a stab at it if you can share a representative sample of the data e.g. 5x36x72.

B

dukebound85
Jun 18, 2013, 05:03 PM
IMO, in the end your code should end up looking something like:

for t= 1:1958
ZI = interp2(X,Y,Z,XI,YI,'cubic');
end

X,Y,Z should contain only the non-NaN data, and XI, YI should contain the whole array of 2592 coordinate pairs (unless you want to preserve the measured values for the non-NaNs).

I won't get a chance to look at this again for a while, but I can certainly take a stab at it if you can share a representative sample of the data e.g. 5x36x72.

B
Hmm not sure how to isolate my non-nan parts to get X,Y,Z as I think the index is important when I interpolate. As far as interpolation, I would prefer to keep the values if there and only interpolate the missing ones.

Thanks for the help and for offering to take a look. I have attached a representative sample. cc is 5 successive time steps late in the period and ccs is early. ccs will have nans at grid points in the time aspect that cc will not. I am trying to backfill ccs time steps at the gridpoints based on the gridpoint values in cc moreorless

balamw
Jun 18, 2013, 05:59 PM
I've had a quick look and scratch my previous thought.

That was for spatial interpolation where you want to have the values of one pixel be interpolated from the adjacent pixels.

What I am understanding now is temporal interpolation.

Here's a concrete example from your data.


>> Z=cc(:,15,16)'

Z =

-4.1588 NaN -0.2496 -0.6125 -0.0643

>> Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)

Zi =

-4.1588 -2.2042 -0.2496 -0.6125 -0.0643

>> Zis=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5,'spline')

Zis =

-4.1588 -0.8177 -0.2496 -0.6125 -0.0643

>


Does this capture what you want to do? replace the NaN with a value derived from the adjacent points?

Do you want to be able to extrapolate beyond where data exists in time or not?

B

dukebound85
Jun 19, 2013, 01:31 AM
I've had a quick look and scratch my previous thought.

That was for spatial interpolation where you want to have the values of one pixel be interpolated from the adjacent pixels.

What I am understanding now is temporal interpolation.

Here's a concrete example from your data.


>> Z=cc(:,15,16)'

Z =

-4.1588 NaN -0.2496 -0.6125 -0.0643

>> Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)

Zi =

-4.1588 -2.2042 -0.2496 -0.6125 -0.0643

>> Zis=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5,'spline')

Zis =

-4.1588 -0.8177 -0.2496 -0.6125 -0.0643

>


Does this capture what you want to do? replace the NaN with a value derived from the adjacent points?

Do you want to be able to extrapolate beyond where data exists in time or not?

B
In essence, yes. Though I am confused how you achieved different values for Z and Zis. Is that from linear vs spline?

Just so I understand

Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)

What this is doing is applying the default interpolation method since no modifier after 1:5. Using the interp1 function, this is finding the index where no NaNs are (find(~isnan(Z))), then taking that index and applying it to the data matrix Z (Z(~isnan(Z))), and interpolating in the time domain (1:5). Is this a correct interpretation?

-----------------------
As far as capturing what I want to do...it seems to I think
If I have a value of a grid point from say time steps 800 to 1958, and NaNs in time steps at that same grid point 1:799, I want to essentially find that function of best fit/interpolation derived from the values at 800 on and apply that function/interpolation to the time steps of 1:799 so I then have values.

Or another scenarios is say I have NaNs every third time step in my data (as it depends on the grid point essentially), I want to use that same method to fill them in based on recorded values.


As far as your second question, do you mean filling in the spatial grid that has NaNs based on gridded values that are in place with results? If so, yes, I plan on looking at that aspect as well, in which case I would take a gridded time step, then spatially fill in the NaNs and then do it temporally. Trying to tackle one scenario at a time.

I assume the approach is similar?

Thanks for your insight, it is much appreciated:)

balamw
Jun 19, 2013, 07:05 AM
In essence, yes. Though I am confused how you achieved different values for Z and Zis. Is that from linear vs spline?


Exactly. Because the values in 3:5 are so much higher than the value in 1, the interpolated value from the spline will be closer to the plateau values than the initial minimum with a spline.


Is this a correct interpretation?


Yep. I'm using the indices and values for Z where Z is non NaN to compute the interpolated values for the entire time series. It does look like interp1 preserves the input values, so this is safe.


As far as capturing what I want to do...it seems to I think
If I have a value of a grid point from say time steps 800 to 1958, and NaNs in time steps at that same grid point 1:799, I want to essentially find that function of best fit/interpolation derived from the values at 800 on and apply that function/interpolation to the time steps of 1:799 so I then have values.


So that is extrapolation, and according to a friend of mine it is the worst of perversions we inherited from the Greek mathematicians. The spline will do exactly that, but especially if the case is as extreme as you just said the results you get may not be anywhere near accurate.

Or another scenarios is say I have NaNs every third time step in my data (as it depends on the grid point essentially), I want to use that same method to fill them in based on recorded values.

For example, if I remove the first value from the series:

>> Z=[NaN NaN -0.2496 -0.6125 -0.0643];
>> Zi=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5)

Zi =

NaN NaN -0.2496 -0.6125 -0.0643

>> Zs=interp1(find(~isnan(Z)),Z(~isnan(Z)),1:5,'spline')

Zs =

3.2095 1.0244 -0.2496 -0.6125 -0.0643


The linear interpolation won't go beyond the bounds of existing data, while the spline will. As a spline it can have cubic terms that will grow quite fast if the index changes too much.

It really depends on what the data is and how it is trending to figure out what the extrapolated values should be.

Put another way, your error bars for the data at index 1 extrapolated from 800:1958 are huge. Don't read much into those values.


As far as your second question, do you mean filling in the spatial grid that has NaNs based on gridded values that are in place with results? If so, yes, I plan on looking at that aspect as well, in which case I would take a gridded time step, then spatially fill in the NaNs and then do it temporally.


I was meaning adjacent here in terms of time not space, sorry for not being clear.

The spatial interpolation could be done with interp2 as I had outlined above, but again you need to be careful about what you want to do when you go beyond the data you have.

FWIW Have you considered storing this as a sparse matrix? You seem to be right on the edge of where that may make sense, but of course that is zero filled instead of NaN filled.

B

dukebound85
Jun 20, 2013, 01:58 PM
Thanks for the insight:) I plan on playing with the time scales to see how realistic the data extrapolates out to be and may nly focus on the latter half of the time series or whatever if need be.

What do you mean by a sparse matrix? One that grows in size? I'll have to research it some but it may be something I should look into as well.
-------------------------
In regards to the temporal extrapolation, I have been running into issues doing it for all points. Any issue why this code wouldn't work? I get this error. tac_interp is a data array I want filled with all the filled in values
Error using ==> interp1 at 185
There should be at least two data points.

for i = 1:36
for j = 1:72
z = tac(:,i,j)'; %tac is my criterion data array
tac_interp(:,i,j)=interp1(find(~isnan(z)),z(~isnan(z)),1:1958,'spline');
end
end

Thanks again

balamw
Jun 20, 2013, 07:14 PM
What do you mean by a sparse matrix? One that grows in size? I'll have to research it some but it may be something I should look into as well.

It's a different way for storing matrices that deal with matrices that are full of zeroes/empty elements.
http://www.mathworks.com/help/matlab/ref/sparse.html

Your samples were < 50% full.


Any issue why this code wouldn't work? I get this error. tac_interp is a data array I want filled with all the filled in values


My guess is this will fix it. You probably have some combinations of i,j that have 0 or 1 elements for 1:1958.

for i = 1:36
for j = 1:72
z = tac(:,i,j)'; %tac is my criterion data array
if numel(~isnan(z)) >= 2
tac_interp(:,i,j)=interp1(find(~isnan(z)),z(~isnan(z)),1:1958,'spline');
else
tac_interp(:,i,j) = z;
end
end
end