A massive meltdown of the college’s data servers in July that caused chaos for students attempting to register for classes is still bedeviling a number of college employees. Fiscal services technicians in particular have been slammed with Herculean data recovery tasks.
SWC IT Director Dan Borges wrote a report describing a data loss whose reverberations are still being felt throughout the campus. Borges wrote: “No one was expecting to 1) endure a complete thermal shutdown event and 2) for the College to lose all their (sic) data, but that is exactly what happened on Tuesday night the 14th of July.”
“I would consider this to be a real severe technological disaster that you normally don’t see,” he said. “The worst by far in my 30-year career. I’ve heard about or read about stuff like this, but it is not a normal thing.”
Early on the first day of registration for fall semester, the phones in the Student Service Center began to ring with anxious and perplexed students demanding answers.
Admissions technician Percival Concha said it was an unexpected frenzy.
“It was really chaotic,” he said. “Students were calling in all day asking a lot of questions and we didn’t really know what was happening that first day.”
SWC was experiencing a total shutdown of its data servers caused by overheating. A construction crew was installing a new ventilation system in the room that houses the servers. The room is a smaller-scale version of the kind of space seen in those famed IBM commercials, replete with taller-than-your-average-person towers of data, equipped with massive computing power and blinking lights. During the transition period, a temporary ventilation system was installed to keep the room cool. On the first night after the temporary HVAC unit was hooked up, it failed. It malfunctioned at 10 p.m. By 2 a.m. the temperature in the room was 130 degrees. At one point the temperature rose one degree per minute.
At 5:30 a.m. IT supervisor Everett Garnick reported for work and was the first to discover the electronic mayhem. Borges said the gravity of the situation slowly became evident.
“At first the team was optimistic,” he said. “We thought, ‘Oh, it’ll only be a couple hours.’ But right around 10 or 11 a.m. I could see the glazed look in their eyes when you realize this is going to be a serious problem and the recovery was going to be much more difficult than the equipment touts when we buy it.”
The result was 100 percent data loss for SWC. SWC President Dr. Melinda Nish said her immediate reaction was not fit for FCC airwaves.
“When I first heard about it, well maybe I can’t say those words on tape,” she said. “My initial reaction was that this was the worst time this could happen because we were just going into registration and fiscal independence.”
Borges said that due to the complexity of the network it is difficult to pinpoint why the data became corrupted and was ultimately lost.
“We can only surmise certain things, the end result was that all the data was corrupted and there was no recovering all of it,” he said. “What we found was, we have four shelves of discs. They have an auto shut off mechanism if it gets too hot. For whatever reason instead of them all shutting down, three went down and one stayed up and was trying to communicate with the other three. In doing so it may have become corrupted and then corrupted the other ones on the reboot.”
IT supervisors Al Garret and Cliff Sharp worked for 24 straight hours following the meltdown to help mitigate some of the damage. Even so, the campus-wide impact was extensive and indiscriminate, Borges said, affecting almost every department.
Counselor David Ramirez said he and colleagues received some very bad news.
“A year’s worth of counselor notes were lost,” he said. “Every time a student meets with a counselor, notes are taken to help keep track of the student’s progress. We got completely blindsided. There was more shock and disbelief than anything. The thought was that in this day and age, there would be a backup for all this information.”
Borges’ report stated that the backup system had “outdated software” and “old hardware,” problems that had been brought to the attention of Nish and other college administrators multiple times in the past. Borges said administrators rolled the dice.
“In the report, IT took responsibility, it’s true we needed resources,” he said. “The risks were known. The district accepted those risks, just like you do anytime you have to make that decision. A lot of times you get lucky and you never have to pay the price. This time we paid the piper. I think it’s unfortunate and there are still reverberations.”
Nish said the college is determined to learn from this event.
“There were different updates being done on different parts of the system,” she said. “Now it’s not gambling. We know exactly when we are updating everything. When we had to restore data, some of it was easy to restore and some of it was not because the backups weren’t being done on a regular basis. That was because of a long history of certain programs being installed and not having a protocol of what the backup was. So that’s not the case anymore.”
Borges said fiscal services took the worst blow. Director of Finance Wayne Yanda said the meltdown has severely tested his team.
“We did get hit pretty hard,” he said. “The two biggest areas that were hit were the food service operations as well as the ASO and the book store operations. Those three operations were not backed-up regularly, so we lost essentially 14 months of data that we had to re-enter manually over the past three months and we had to do that at the same time we were preparing for our year-end audit. We were very successful at having people work overtime and working double time just trying to get everything re-entered.”
Account technicians Betty Keys and Kim Hoang-Nguyen have borne the brunt of the manual data entry, working 65- to 70-hour weeks for the last two months.
Keys declined to discuss her workload. With tears in her eyes, she said that it had been very trying and that she was too close to the issue to talk about it. Hoang-Nguyen also declined to talk about the extra workload, stating that she was simply too swamped with work to spare time, but did say that it had been very difficult.
Yanda said he is proud of the way his staff has rallied.
“We are blessed with a very great staff,” he said. “We pulled together as a team and we are expecting a very clean audit for fifth year in a row.”
Borges said the finance department and other campus personnel came together to help the college recover from the fiasco. The “IT Disaster and Recovery” report was part of that effort to own up to mistakes and document what went wrong to help the college avoid similar pitfalls in the future, said Borges. Nish said Borges and his IT team have recuperated nicely.
“It’s always better to learn the easy way and not the hard way,” she said. “I wish it hadn’t happened, but at the end of the day I was pretty pleased with the way our IT staff was able to marshal its resources and get things done. I’m very pleased with how they documented their recovery so that we know exactly what was done, and what were the trouble spots.”
In the wake of the meltdown the college is spending $725,000 in an effort to reduce future vulnerability. Among the big-ticket items is a new storage area network (SAN, the main platform where all data is stored) that retails for $1.1 million. Borges said the school is getting it for $497,000.
Borges said the new equipment improves the college’s ability to recover.
“Once we implement what we are doing, we are going to put SWC in the top 10 of California community colleges,” he said. “After we install everything if we had the same disaster we could probably restore everything in two days.”
Borges said the report and technology improvements impressed the accreditation team.
“In fact, IT got a commendation if you can believe it,” he said. “That just means they noticed something positive. I was shocked because you’d think that wouldn’t happen after something like this.”
Borges said the whole fiasco proved to be a positive.
“People can throw stones and cast blame if they want,” he said. “For me it’s about finding solutions. At times it felt like the weight of the college rested on my shoulders, but having the new SAN and all of the equipment helps me sleep better at night.”