Examples
I'll try to answer all the questions put to me as best I can.
If you can post a sample of the kind of stream data you expect, we may be able to help more.
Here is an example of the data (only the numeric parts are of interest). It comes from a laser scanning survey instrument, so the measurements / point positions are definitely of importance, hence the need to retain the precision.
This is the header to the file.
9001
(This is the number of columns of points)
2458
(This is the number of rows of points)
7.960000 -13.215000 -0.011000
(Transformation Parameter)
0.989940 -0.141451 0.003366
(First line of 3x3 matrix)
0.141451 0.989945 0.000144
(second line of 3x3 matrix)
-0.003352 0.000333 0.999994
(third line of 3x3 matrix)
0.989940 -0.141451 0.003366 0
(First line of 4x4 matrix)
0.141451 0.989945 0.000144 0
(second line of 4x4 matrix)
-0.003352 0.000333 0.999994 0
(third line of 4x4 matrix)
7.960000 -13.215000 -0.011000 1
(fourth line of 4x4 matrix)
These are the point strings (There will be 9001 x 2458 points):
-9.2170000 -2.1182499 -0.0927500 0.0460975 177 189 175
(X, Y, Z, Intensity (0-1), R, G, B (0-255))
-9.2165003 -2.1055000 -0.0882500 0.0445106 179 191 177
-9.2170000 -2.0977499 -0.0802500 0.0510414 180 193 176
-9.2165003 -2.0977499 -0.0900000 0.0482490 177 191 174
-9.2172499 -2.0952499 -0.0795000 0.0500038 179 192 175
-9.2180004 -2.0955000 -0.0827500 0.0497139 179 192 175
-9.2172499 -2.0952499 -0.0860000 0.0493172 178 191 174......
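For anyone who wants to experiment with the sample above, here is a minimal sketch of reading one point line in plain C. The names ScanPoint and parse_point_line are made up for illustration, and it assumes every point line holds exactly the seven space-separated fields shown (X, Y, Z, intensity, R, G, B). Doubles are used so that none of the printed digits are lost at this stage.

```c
#include <assert.h>
#include <stdio.h>

/* One scanned point: X, Y, Z, intensity (0-1), R, G, B (0-255).
   ScanPoint is a hypothetical name, not part of any library. */
typedef struct {
    double x, y, z;
    double intensity;
    int r, g, b;
} ScanPoint;

/* Parse one text line into a ScanPoint.
   Returns 1 if all seven fields were read, 0 otherwise
   (header lines and annotations therefore return 0). */
int parse_point_line(const char *line, ScanPoint *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%lf %lf %lf %lf %d %d %d",
                   &out->x, &out->y, &out->z, &out->intensity,
                   &out->r, &out->g, &out->b);
    return n == 7;
}
```

Since Objective-C is a superset of C, a function like this can be called directly from an Xcode project.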
And what were the results when using zip files?
The result of using a .zip on a 198.9MB file was to compress it to 43.8MB. This isn't bad, but I would like to achieve better.
you still haven't provided any real-world numbers for the amount of compression you hope to achieve
I am looking to achieve between 5% and 10% of the original size. If there are ways to get below 5% then great, but I suspect there aren't.
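To put those figures side by side: 43.8MB out of 198.9MB is about 22% of the original, while 5% to 10% of 198.9MB is roughly 9.9MB to 19.9MB, so the target is about 2.2 to 4.4 times smaller than what .zip managed. The arithmetic as a small sketch (the function names are only illustrative):

```c
#include <assert.h>

/* Compressed size expressed as a percentage of the original size. */
double percent_of_original(double original_mb, double compressed_mb)
{
    assert(original_mb > 0.0);
    return 100.0 * compressed_mb / original_mb;
}

/* Size in MB that corresponds to a target percentage of the original. */
double target_size_mb(double original_mb, double target_percent)
{
    return original_mb * target_percent / 100.0;
}
```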
If it takes 1 millisecond too long to send, is it a hard real-time failure? Is it just annoying? What are the actual consequences? Be specific, even if you have to give a specific range.
A longer transmission period will not affect any engineering process (your 1 millisecond concept), but it will affect a human decision-making process and the next available time slot for sending further data, which has a knock-on effect. So whilst it does not have to be transmitted to the millisecond, there are still significant benefits in transmitting it as quickly as possible, although the world will not explode if it is a second late.
How many bits or decimal digits is "decent" precision?
Well this all depends on the type of points being sent. The example given above is a good indication of the decimal size. However, if the points are on a different co-ordinate system the X & Y numbers can be larger, something like 500000.000, 180000.000.
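The larger coordinates matter for the choice of number type. A 32-bit float has a 24-bit significand, which is roughly 7 significant decimal digits, so near 500000 it can only step in increments of 1/32 (about 0.03) and cannot hold three decimals. One alternative, sketched below under the assumption that three decimals is the precision needed for such coordinates, is to store the value as a whole number of thousandths in a 32-bit integer, which covers about +/-2,147,483.647. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute error introduced by storing a value in a 32-bit float. */
double float_storage_error(double value)
{
    float stored = (float)value;
    double diff = (double)stored - value;
    return diff < 0.0 ? -diff : diff;
}

/* Store a coordinate as a count of thousandths (fixed point, 3 decimals).
   Assumption: the value fits in the range of a 32-bit signed integer. */
int32_t to_thousandths(double value)
{
    assert(value > -2147483.0 && value < 2147483.0);
    double scaled = value * 1000.0;
    /* Round half away from zero without needing the math library. */
    return (int32_t)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}

/* Recover the coordinate from its count of thousandths. */
double from_thousandths(int32_t count)
{
    return count / 1000.0;
}
```

For a value such as 500000.123, the float version is off by about 0.002, while the integer version stores 500000123 and gives the three decimals back unchanged. Both take 4 bytes.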
I'm not sure if you're being vague because you haven't thought about these things, or because you're not sure what values to use. But if you're trying to solve a problem using computers, then accuracy and precision are important almost everywhere. Not just in specifying the accuracy and precision of numeric data, but accurately and precisely defining the problem that needs to be solved.
Point well made. I am perhaps not fully aware of the options to produce precision. I would ideally like a double to represent each value, but that would be quite costly size wise I imagine. Floats are the next best option. I also need to reconstruct the file correctly on decompression, with all the spaces and return values correctly placed. I'm not really sure how to achieve this yet. Any suggestions?
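One possible way to get the text back exactly, sketched under some assumptions: every point line uses single spaces between fields, the first four fields always have exactly 7 digits after the decimal point (as in the sample), and the line ending is the same throughout the file. If that holds, the spaces and decimal places never need to be stored, because they can be regenerated from a fixed format. The sketch keeps each decimal as an integer count of 0.0000001 units so that no floating-point rounding is involved. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int64_t f[4];   /* X, Y, Z, intensity in units of 1e-7 */
    int rgb[3];     /* R, G, B */
} FixedPoint;

/* Parse one decimal field with exactly 7 fractional digits and advance *s.
   Limitation: a field written as "-0.0000000" would lose its minus sign. */
int parse_fixed7(const char **s, int64_t *out)
{
    const char *p = *s;
    int negative = 0, digits = 0;
    int64_t v = 0;
    if (*p == '-') { negative = 1; p++; }
    while (*p >= '0' && *p <= '9') { v = v * 10 + (*p - '0'); p++; digits++; }
    if (digits == 0 || *p != '.') return 0;
    p++;
    for (int i = 0; i < 7; i++) {
        if (*p < '0' || *p > '9') return 0;
        v = v * 10 + (*p - '0');
        p++;
    }
    *out = negative ? -v : v;
    *s = p;
    return 1;
}

/* Parse a whole point line (without its line ending). Returns 1 on success. */
int parse_line(const char *line, FixedPoint *pt)
{
    const char *p = line;
    for (int i = 0; i < 4; i++) {
        if (!parse_fixed7(&p, &pt->f[i]) || *p != ' ') return 0;
        p++;
    }
    return sscanf(p, "%d %d %d", &pt->rgb[0], &pt->rgb[1], &pt->rgb[2]) == 3;
}

/* Regenerate the text of a point line. Returns 1 if it fitted in buf. */
int format_line(char *buf, size_t size, const FixedPoint *pt)
{
    size_t used = 0;
    for (int i = 0; i < 4; i++) {
        int64_t a = pt->f[i] < 0 ? -pt->f[i] : pt->f[i];
        int n = snprintf(buf + used, size - used, "%s%lld.%07lld ",
                         pt->f[i] < 0 ? "-" : "",
                         (long long)(a / 10000000), (long long)(a % 10000000));
        if (n < 0 || (size_t)n >= size - used) return 0;
        used += (size_t)n;
    }
    int n = snprintf(buf + used, size - used, "%d %d %d",
                     pt->rgb[0], pt->rgb[1], pt->rgb[2]);
    return n >= 0 && (size_t)n < size - used;
}

/* Check that a line survives parse + format unchanged. */
int round_trips(const char *line)
{
    FixedPoint pt;
    char buf[128];
    assert(line != NULL);
    return parse_line(line, &pt) && format_line(buf, sizeof buf, &pt)
           && strcmp(buf, line) == 0;
}
```

Running round_trips over every point line of a real file before committing to the format would show whether the layout assumptions actually hold; any line that fails would need to be stored some other way.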
The simplest thing is to start with basic C programming, using basic functions like scanf(). That's more than enough to parse numbers from text into binary, at which point you can count the number of bytes, or fwrite() them to a file, or whatever.
This is great. Gives me somewhere to start out and learn a bit more about the binary angle. And it is compatible with the Cocoa, XCode route I want to take. Much appreciated.
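As a starting point for that, here is a rough sketch of the parse-and-fwrite idea. The 19-byte record layout (four 4-byte floats followed by three colour bytes) and the names RECORD_BYTES, pack_point_line and convert_points are my own assumptions for illustration. Floats only keep about 7 significant digits, so this layout is only suitable where that precision is acceptable. The floats are written in the machine's native byte order, so the reader must be on a compatible machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECORD_BYTES 19u   /* 4 floats (16 bytes) + R, G, B (3 bytes) */

/* Parse one text point line and pack it into a binary record.
   Returns RECORD_BYTES on success, 0 if the line is not a point line. */
size_t pack_point_line(const char *line, unsigned char out[RECORD_BYTES])
{
    float v[4];
    unsigned int r, g, b;
    assert(sizeof(float) == 4);
    if (sscanf(line, "%f %f %f %f %u %u %u",
               &v[0], &v[1], &v[2], &v[3], &r, &g, &b) != 7) {
        return 0;
    }
    if (r > 255 || g > 255 || b > 255) return 0;
    memcpy(out, v, sizeof v);          /* X, Y, Z, intensity */
    out[16] = (unsigned char)r;
    out[17] = (unsigned char)g;
    out[18] = (unsigned char)b;
    return RECORD_BYTES;
}

/* Read text lines from `in` and fwrite() one record per point line to `out`.
   Lines that do not parse as points (the header, for example) are skipped
   and would need to be stored separately. Returns the number of records. */
long convert_points(FILE *in, FILE *out)
{
    char line[256];
    unsigned char rec[RECORD_BYTES];
    long written = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (pack_point_line(line, rec) == RECORD_BYTES &&
            fwrite(rec, 1, RECORD_BYTES, out) == RECORD_BYTES) {
            written++;
        }
    }
    return written;
}
```

A sample line such as the first one above is 54 characters plus a line ending, so 19 bytes per point is roughly a third of the text size before any general-purpose compression is applied on top.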
I'm unclear about the necessity of Objective-C for this. Is it because you're trying to make a library that can be used by another program? What's the overall strategy, and where does "compress a gazillion numbers" fit into it?
I'm programming in XCode and really only have experience in Obj C, so I was hoping to keep to that track if poss, rather than meander into loads of different languages, as I am doing this as something more of a hobby, and can't commit thousands of hours to learning many languages.
Thanks for the help so far guys, and I do appreciate you trying to get a handle on this to help me out. I'll try to be more thorough in future posts. Thanks.