Become a MacRumors Supporter for $50/year with no ads, ability to filter front page stories, and private forums.

mysterytramp

macrumors 65816
Original poster
Jul 17, 2008
1,334
4
Maryland
This is not a programming question. It's not really a Macintosh question either, but I'm posting here because I know a lot of smart people hang out in this forum. (Apologies to mods in advance.)

My company needs to access large databases (sometimes in the realm of multi-gigabyte text-based data) mostly from government agencies. We've muddled through with our standard hardware (Mac laptops mostly) and software (Excel), but it's obvious we need to make some changes to go to the next level. Excel is choking on our latest batches of data.

I'm thinking the cheapest solution (and unfortunately, cheap could be a deciding factor) would be a Linux box with open source software. We have some people who are reasonably smart handling most software but none of us is prepared to devote much time on the steep learning curve that SQL appears to be. In other words, we can handle creating a series of fields to describe each record. We might not have the time to devote learning a new query language.

And the primary function would likely be that a handful of database "gurus" would get the raw data, then spit out relevant subsets to colleagues, who would then make sense of the data.

So question one, is Linux the way to go?

And question two, what software makes the most sense in this scenario?

thanks in advance,

mt
 
Linux is better in the way of flexibility and it's easy to use; while other system such as Mac OS X (I personally recommend it), has got a wider variety of software, but it's expensive. I have already advised you, so you decide ;)
:rolleyes:
Pleased to help you.
 
I think mysql or postgresql on Linux is a fine place to keep your data. The more alarming piece is that no one wants to take this very good opportunity to learn SQL. Whatever it is you're doing seems like it would definitely benefit from sane use of a relational database. Maybe you could pilot 1 or 2 analysts learning SQL before you make a big commitment and train everyone?

-Lee
 
If cheap could be a primary factor, you need to take system administration into account. If nobody knows Linux admin or maintenance, you'll either have to train someone to do it (training expense), or hire someone (salary expense).

The unbreakable rule of maintenance is that if you don't plan for maintenance, the lack of it will eventually kill you. Even maintenance-free things aren't really maintenance-free. At best they're self-maintaining because of certain design and production choices that were made. Everything breaks down, so you must plan for it. It's like disaster planning. If you don't really have a disaster plan, then your real disaster plan is to lose everything and/or die in the disaster.


On the SQL question, you need to characterize the operations you're performing on your data.

Basic SQL isn't that hard. There are a few basic concepts, such as the role of a table, what a join is, etc. If the intended operations aren't complex, then it seems possible to me that basic SQL skills might suffice for most things. It's also possible that basic but inefficient SQL that an expert would be shocked to see could well be better and more effective than what you're using now. It's all relative.
 
The more alarming piece is that no one wants to take this very good opportunity to learn SQL. Whatever it is you're doing seems like it would definitely benefit from sane use of a relational database.

The perception is -- and I admit we could be wrong -- is that even though the few that would be working this project have fairly strong spreadsheeting skills, they won't apply directly to SQL, thus requiring all other projects to stop in their tracks while one or more of us get up to speed.

The perception is -- again, possibly wrong -- is that we'll go further, faster with something more like dBase.

mt
 
The perception is -- and I admit we could be wrong -- is that even though the few that would be working this project have fairly strong spreadsheeting skills, they won't apply directly to SQL, thus requiring all other projects to stop in their tracks while one or more of us get up to speed.

The perception is -- again, possibly wrong -- is that we'll go further, faster with something more like dBase.

mt

I think if you're getting big enough projects that your current workflow isn't working, the price must be paid to get a new workflow in place. It would be great if every problem fit into the framework laid out up until now, but it sounds like they don't anymore. Opening multi-gigabyte text files in Excel is no way to live.

You certainly don't have to have dedicated hardware or use Linux for a first crack. I see no reason you couldn't get the DB going on an existing desktop machine running OS X to test it out. If the query volume is very low (like a few analysts running a few hundred queries a day each) you're not going to need tons of resources for this.

I think a very small subset of SQL will get you to the level of proficiency you have in Excel. If you're writing very complex macros it might be a little bit tougher, but SQL is meant for doing what it sounds like you're doing. I think the hardest part of the process is going to be getting your data into the database, the actual querying to get the data out should be fairly simple. Can you get a SQL dump of the data instead of a big text file? If so, it might really simplify the process for you.

If you're too busy for a few people to train up it means that you're going to or have hit a ceiling on what sort of projects you can complete. Making the investment will greatly increase your capabilities and subsequently your value. I think the return on investment would be huge and you'd probably quickly be able to make up the "lost time" on training with your resulting improved workflow. Will a few people be "stopped in their tracks" to get their? It depends on their time management skills, I suppose, but I think it would be worth it in the end.

-Lee
 
So how do you recommend for learning SQL? Does one book stand out?

mt

I found Beginning Databases with PostgreSQL to be an excellent introduction. I had no problems working through it on the Mac (although the initial setup of the Postgres user account from the terminal was a bit tricky I think, I can help with that if you need it).
 
As an Amazon Associate, MacRumors earns a commission from qualifying purchases made through links in this post.
I started with PostgreSQL and honestly learned from the excellent manual.
http://www.postgresql.org/docs/9.1/static/index.html

This has an unfortunate side-effect, which is that you may not always know what is standard and what is Postgres-specific. I've used MS-SQL since, and will be starting with Oracle soon. So far I've been fine adapting to the slight differences.

-Lee
 
lee1210 and GorillaPaws:

Update:

PostGre 9.0.4 installer hung on my Mac. And the Readme file with the installer called for changing memory usage. I'm not terribly intimidated by that, but there's a good chance I'll have to get this running on a Mac 3,000 miles away over the phone, and I have no idea of this other person's expertise.

So I switched to MySQL and got it installed. And thanks to Border's going-out-of-business sale, I found an SQL primer. I'm about halfway through; SQL itself isn't nearly as intimidating as I perceived.

I have some issues however "logging into" MySQL. I've been able to sign on as the root user via the command line, through su and I've been able to work through the assignments. However, GUI-based SQL software (Navicat and MySQL Workbench) seem to have issues with my accounts.

I don't believe signing on as root is terribly smart. But I've never paid much attention to the user structure of OS X. This machine (and its predecessor) log on as me constantly even though it's shared by four people in the family. I'm wondering if that's my issue.

If I try to access a database without logging in as root, I get ERROR 1044. I've googled the error and there's plenty of advice on how to proceed but it all seems to conflict.

Sorry to be coming to you guys ...

mt
 
Last edited:
Register on MacRumors! This sidebar will go away, and you'll see fewer ads.