All posts by Kevin Osborn

About Kevin Osborn

I’m a Maker and ardent amateur educator. I also love to create art, hack electronics and do geeky activities with my kids.

Change of Pace

I haven’t updated this in a while, largely because I no longer work in IT. I’m doing research now (augmented reality and eddy current NDE) and find myself wanting to share again, but not so much work stuff as things I’m interested in and having fun with (which sometimes will include work… we just bought a Kinect, stay tuned!)

So, from this day forth, I declare this blog to be about fun, geeky fun!

Let the posts begin!

USBootin’ goodness

Live CDs have been around for a while and are a great way to try out the many different Linux distributions. I recently had to find a better Linux install for an Asus Eee PC. I could have dug up a USB CD-ROM drive, but booting from a USB thumb drive would be more convenient. Many distributions have tools to create bootable USB drives, but not all.

Xubuntu’s wiki pointed me at a terrific tool that will create a live USB from any distribution. UNetbootin will even download the ISO for many of the standard distros. If the one you are interested in isn’t on the list, simply download the ISO and use the diskimage option. As an added bonus, there are both Windows and Linux versions.

A great tool to add to your toolkit!

Bringing Linux Home

I’ve worked on various flavors of Unix for years, including Linux quite recently in a production environment at work. Every year or so I try the latest “desktop” distribution, and until this last time I had concluded that it’s just not ready, even for a geek like me.

Just before our vacation, my Dell Windows XP machine almost literally caught fire. It was hanging regularly, and finally wouldn’t start at all. The CPU power connector was toasted: burned, and it broke off when I pulled on it.
Panicked, because I had way too much to do to get ready for our trip, I grabbed a machine my neighbor had given me because he could never get it to work, and I installed the latest Fedora distribution.

The install was fast and neat, and out of the box it did almost everything I needed. I’ve since installed a bunch of other software to fill in the blanks, and I’ve been quite happy with it. An added bonus is that I don’t have to add Cygwin and other crappy Unix emulation libraries in order to do development work!

The only things I really miss are iTunes and Photoshop. Managing my iPhone with gtkpod is painful (though it does work; it’s just slow and very finicky), and GIMP simply isn’t as easy to use as Photoshop (I am learning, but there are still lots of things missing).

My wife has been sharing the same computer, and has had absolutely no problems using it. I believe I can confidently say that Desktop Linux has finally arrived!

BI for Application Scale

Many applications start out with a good idea, and even a good architecture for demonstrating that idea or serving hundreds to thousands of clients. If the application is popular, though, you’ll soon have to adopt horizontal scaling strategies.

One of the best ways to scale is to separate database reads and writes. Sharding is a great way to add capacity: you can place new read or write shards into load-balancing rotation, or geographically closer to clients. Read shards can often replicate in less than real time, and they don’t really present a challenge, except as data grows and you outgrow the allotted storage. As read or write data grows past a certain size, queries slow down, and upgrading all those disks in all those shard servers is expensive and difficult. Also, if writes are stored only in specific write shards (as opposed to being replicated network-wide), it is difficult to report on business data, and long term, it’s generally not a good idea to keep all that historical data in production anyway.
It’s therefore critical to create data aggregation processes to compile and store the data that matters to business intelligence. Note that not all data needs to be siphoned off and stored long term: it’s equally important to decide what data you don’t need to keep, and make sure it gets purged periodically from your production shards.
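As a minimal sketch of what such a process can look like, here’s some MySQL with entirely hypothetical table and column names (a production signups table and a BI signups_hourly rollup), written as if both schemas lived on the same server; in a sharded setup an ETL tool would move the rows between servers instead:

    -- Roll yesterday's signups up into hourly buckets in the BI schema.
    INSERT INTO bi.signups_hourly (bucket_start, signup_count)
    SELECT DATE_FORMAT(created_at, '%Y-%m-%d %H:00:00') AS bucket_start,
           COUNT(*) AS signup_count
    FROM production.signups
    WHERE created_at >= CURDATE() - INTERVAL 1 DAY
      AND created_at <  CURDATE()
    GROUP BY bucket_start;

    -- Purge production rows older than the retention window (90 days here)
    -- in modest batches, repeated until no rows remain, so replication
    -- doesn't back up behind one giant delete.
    DELETE FROM production.signups
    WHERE created_at < CURDATE() - INTERVAL 90 DAY
    LIMIT 10000;

Once the rollup is safely in the BI database, the purge can run as often as needed to keep the production tables small.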
I’ll talk more later about effective ways to do this.

Keeping an Eye on the Business

When running an operations group, it’s important to have monitors and alerts to let you know when things go wrong with your servers and services. When a server crashes, or a hard disk dies, it’s easy to pinpoint the problem and fix it quickly. The problem with complex interacting systems is that the system can be “down” or non-functional from a business point of view while everything you are monitoring shows “all green.”

I had the opportunity to talk with scalability gurus and former eBay architects Marty Abbott and Tom Keeven. They talked about the idea of operations teams monitoring business metrics as an indicator of system health.

Here’s an example (not real data from my company!) of week-over-week graphs of user account signups per hour. You should pick metrics and a sampling rate that are statistically significant, and keep in mind that seasonal variations and even social events can affect some types of web business metrics. They cited a fun example from their eBay days: they noticed a significant drop in use of the site at 7pm Eastern on Mondays. It took them a while, but they finally figured out that was when American Idol was on! They ended up installing a TV in their network operations center.
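If you want to pull a metric like that yourself, here’s a minimal MySQL sketch, assuming a hypothetical signups table with a created_at timestamp; it compares each hour today against the same hour one week earlier:

    -- Signups per hour: today vs. the same day last week.
    SELECT HOUR(created_at) AS hr,
           SUM(created_at >= CURDATE()) AS this_week,
           SUM(created_at <  CURDATE() - INTERVAL 6 DAY) AS last_week
    FROM signups
    WHERE created_at >= CURDATE() - INTERVAL 7 DAY
      AND (created_at >= CURDATE()
           OR created_at < CURDATE() - INTERVAL 6 DAY)
    GROUP BY hr
    ORDER BY hr;

MySQL treats the boolean comparisons as 0 or 1, so the two SUMs count the two days separately. Graph the columns on top of each other and divergence jumps right out.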

I’m getting Marty Abbott and Michael Fisher’s book, “The Art of Scalability,” and look forward to reading more great tips like this. Check out their book on Amazon.

Small Business Intelligence

When you ask people about business intelligence, some think it’s something only big companies have or can afford. Mention of those two words will often provoke stories about how their company spent millions on a data warehouse project that was ultimately a failure.

While it’s true that BI is often not handled well, every business needs it, and indeed has it. Whether it’s a spreadsheet, accounting software reports (from QuickBooks or Great Plains, say), or reports pulled directly from your production databases, most if not all businesses need to look at measures of how they are doing.

There are definite advantages to being more systematic about it, though. Versions of spreadsheets can get lost or confused. Access databases are prone to corruption and can’t handle large volumes of data. And by looking at single silos of information (accounting, web traffic, etc.) you run the risk of missing connections and losing sight of the big picture.

Lots of companies will offer you business intelligence and data warehousing solutions, and as noted above, those multi-million dollar projects often go sour. It doesn’t have to be that way, and you can get there from wherever you are.

Going Open Source

There are a lot of BI solutions out there; some are good and easy, and others are good but complex and require a lot of expertise to implement properly. Most of them are very, very expensive. We decided to evaluate, and then implement, an open source BI solution called Pentaho. In upcoming posts, you’ll see some tips and impressions. We also opted for support, but I have to say about half of our problems have been solved by Googling. The documentation is a bit incomplete but getting better. I’d say it’s a good choice when you can’t afford a more packaged solution, and it’s also good if you want to get into the guts and implement something truly unique.

What you get (for free!)

  • Pentaho Data Integration (PDI) – This is an ETL workflow tool similar to SSIS. It actually has several more useful modules out of the box, and it easily talks to multiple types of database systems. We find it useful for pulling data from a SQL Server system into a MySQL business intelligence database.
  • Pentaho Report Server
  • Pentaho Design Studio and Metadata Editor (for building data models)
  • Mondrian analysis tools and Weka data mining

The enterprise ($) version adds an interactive analyzer tool, dashboards, and several other things.

Advantages

  • Open source / open scripting. All components in the system are written in Java, and source code is available for most of them, including several third-party extensions. The data integration flows are very flexible because they include JavaScript scripting modules, and I think JavaScript is easier to learn than C# (comparing to SSIS).
  • Connectivity to multiple databases. While you can configure ODBC connections for other DBs in SSIS, it’s finicky; Pentaho includes connection dialogs for most of them, which you fill in with connection-specific properties.
  • Works with open standards for mail, etc., for easy enterprise integration.
  • Cross-platform. This is especially cool, as you can develop on Windows or Mac laptops and deploy on Unix servers.
  • Free! (Well, most of it.)
  • Google-able support. There’s a fairly large population of users who post in forums and on blogs like this one.

Disadvantages

  • It’s a bit of a fast-moving train. It takes some discipline to settle on a specific version, especially when you see new features appearing all the time. Unlike commercial products, with their one or two releases a year, this moves fast. The supported enterprise version is slower-moving and better QA’d.
  • New features often aren’t QA’d well. It’s best to give them bake time, or try them (and give feedback) but not deploy them to production until they are baked.
  • It’s easier to get help on new features than when you run into a problem with an old one. The community, including the paid support folks, seems to have “moved on.”
  • Performance optimization is uneven. Some features are really efficient and fast, and others are, well, not. This is probably a function of “baked-ness” and how much that particular feature is used. For some functions, like insert-update, it’s better to do it in SQL anyway (see the sketch after this list).
  • Some functions aren’t free. There’s a really cool analyzer mode that’s only available in the enterprise version. It’s still cheaper than anything else out there, so this isn’t really a disadvantage. Just be aware that if you need to cheap out, you can’t have it all!
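On that insert-update point: rather than PDI’s row-by-row Insert/Update step, you can push the whole upsert down into the database. Here’s a minimal MySQL sketch with hypothetical staging and dimension tables (stg_customer and dim_customer, with customer_id as the primary key):

    -- Upsert all staged rows in one set-based statement.
    INSERT INTO dim_customer (customer_id, name, email, updated_at)
    SELECT customer_id, name, email, NOW()
    FROM stg_customer
    ON DUPLICATE KEY UPDATE
        name       = VALUES(name),
        email      = VALUES(email),
        updated_at = VALUES(updated_at);

One set-based statement like this is usually far faster than a lookup-then-update per row.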

That’s all for now. I’ll publish some specific tips in coming weeks.

Ada Lovelace Day: Honoring Women in Science and Engineering

Today, March 24, is Ada Lovelace Day, and I pledged to blog about a woman in science or engineering who had a profound impact on me. Ada Lovelace was a protégée of Charles Babbage who wrote about his Analytical Engine, widely recognized as the first design for a general-purpose (and purely mechanical) computer, including a set of appended notes that contains what is recognized as the first computer program. She also speculated on uses of computers far outside numeric computation, including the composition of music.

There are a lot of contemporary women inspiring me, including hackers Limor Fried, Jeri Ellsworth, and Lenore Edman, but my thoughts turned to a much earlier influence: my high school chemistry teacher, Dr. Donna Bogner.

Dr. Bogner was one of the best teachers I have ever had: constantly looking for new ways to inspire kids, tolerant of our explorations (and explosions!), and willing to let us discover things for ourselves, even when it meant taking some risks, as long as we observed safety precautions.

She went back to school on her own initiative to learn computer programming so she could add it to the high school curriculum, at a time when the nearest computers were a couple of blocks away at the junior college. The initiative, curiosity, and “just try it” attitude I learned from her have served me well, both in my career and as a dad and school science volunteer.

Since I graduated from Hutchinson High, Dr. Bogner has taught freshman chemistry at Wichita State for 15 years and spent some time at Princeton as a Woodrow Wilson Fellow. This led to consulting for Exxon and Pfizer, all around developing science materials for the secondary classroom. She also did five or six summer workshops a year for the Dreyfus Institute all over the country.

Dr. Bogner is still actively reaching young minds in what she describes as her “fifth retirement,” developing science curriculum at Mid-Continent Research for Education and Learning. Among other things, she translates technical details from NASA’s extraterrestrial missions and nano-chemistry into materials for classroom use. Particularly interesting to me, she also adapts these materials into forms accessible to visually impaired students, a need she understands because of her own visual impairment.

I just want to say WOW! and thank you for all your inspiring work, Dr. Bogner. I hope some girls will read this and be inspired to follow in your footsteps!

Archiving Data from Massive Production Tables

Just a quick tip.

Data is valuable, but as it grows in your production system, it can slow everything down.

If your data has grown to millions of rows and you decide to archive it off your servers, and you have replication/sharding going, selecting into archive tables and then deleting millions of rows can cause your replication to back up.

Another strategy, though it requires downtime, is quicker (a MySQL sketch follows the list):

  1. Make sure nothing is updating the table you are archiving.
  2. Select the data you want to KEEP into a new table with an equivalent schema and equivalent indexing.
  3. Rename the original table to indicate its oldness.
  4. Rename the new table to the original table name.
  5. Turn your updates back on.
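Here’s what those steps might look like in MySQL, with hypothetical names (an events table keyed by a created_at timestamp, keeping 30 days):

    -- Step 1: stop anything that writes to `events` (application-level).

    -- Step 2: copy only the rows you want to KEEP into a clone table.
    CREATE TABLE events_new LIKE events;   -- same schema and indexes
    INSERT INTO events_new
    SELECT * FROM events
    WHERE created_at >= CURDATE() - INTERVAL 30 DAY;

    -- Steps 3 and 4: swap the tables in one atomic statement.
    RENAME TABLE events TO events_archive_old,
                 events_new TO events;

    -- Step 5: turn your updates back on; dump or archive
    -- events_archive_old at your leisure.

The multi-table RENAME TABLE swap is atomic in MySQL, which is what makes this approach so quick.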

Then, going forward, you can use the siphoning-off approach for the smaller chunks of data that fall off the end of your window (30 days, 90 days, whatever).

Group Collaboration with Google Wave

Wikis are great, but even for technical people the special markup is a bit of a pain, and during the early phases of group collaboration it’s really useful to maintain some idea of who’s promoting which idea.

Catch the Wave

When Google Wave was introduced, it seemed ideal: a cross between wiki, threaded discussion, and email, with lots of rich embedding and an API!

If you are not familiar with it, there is a terrific overview here: http://mashable.com/2009/05/28/google-wave-guide/

I decided to use it to collaborate with my senior tech leads on several infrastructure projects for our company. Now, this turns into a bit of a rant, so I want to say up front that I really like Wave, and that it fits my style of thinking/collaborating perfectly. It does have some serious flaws, though, that I hope Google will address in short order.

Invites – Scarce? No. A Pain? Yes.

Google Wave is in preview, so signups are limited through a bit of a clunky invite system. Once one of your crew is in, though, the invites aren’t that scarce, as each of those people gets a fairly generous number of invites.

The invites aren’t approved instantly (you get a really annoying message about having a lot of stamps to lick). So if I want to share a document/wave I’m working on with someone, I can either make it public (not so good for company-proprietary documents) or:

  1. Get the person’s Google account (assuming they have one).
  2. Send the invite.
  3. Tell the person to send you their Google Wave ID when they eventually get and approve the invite.
  4. Add them to the wave.

Yikes. If I have trouble getting them to use the wiki, how am I going to get them to embrace this?

Google Buzz, introduced yesterday, uses the same account info as Gmail, instead of making you get a special Google Wave address the way Wave does. Hopefully Wave will adopt a similar authentication model.

Security Model

That covers authentication; what about authorization? Here too, it’s pretty primitive. From what I can tell, there are three basic roles, and they apply only to individuals, not groups:

  • Editor – can change the wave
  • Reader – can read the wave
  • Public – available to anyone, and can be embedded.

If I’m working on a wave with a group, I have to add each individual, and manage that for every wave I’m involved in. Tedious at best and dangerous at worst. Here, another pet peeve of mine is activated: participants are represented by their profile icons, and if they don’t have a picture, they all look the same! You can mouse over, but it’s really hard to verify that everyone you want is included and that you haven’t added someone from outside work by mistake!

There is a workaround using Google Groups, but again, that’s another step, and another system to maintain.

I really do like Wave, and I will continue to use it. I’m sure that if it starts to gain traction, Google will apply appropriate resources and fix many of these limitations. While Buzz fixes some of these problems, it’s really not for the same purpose: archival collaboration. Join me in encouraging Google to continue to develop this valuable tool!


Welcome to Bald Wisdom!

When looking for a domain name for my IT career related blog, I came up with Bald Wisdom. I’m not trying to say that I’m wise, though I certainly am bald! It’s more about seeking wisdom, and sharing any little tidbits I’ve learned.

While I anticipate most articles will be technical in nature, I imagine I will wander into management and leadership territory once in a while, as getting things done technically requires organizational support and the efforts of good team members.