Just 'adding Internet' is not enough.
So the third major IT failure this weekend was at Sainsbury's. They were unable to fulfil online orders and reverted to paper. And it seems that this is not an isolated incident at Sainsbury's, but a recurrence of a problem that happened in March.
Many companies have added, and continue to add, internet to existing services (just look at the IoT explosion). Or they add services that were innovative 20 years ago. Effectively, what they have done is add an interface from existing Pick & Pack systems to an internet web store. Seems to me that the Pick & Pack system packed it in. Now, the glitzy front end of the web store has changed, and the infrastructure has been upgraded and beefed up as the volume of internet shopping has increased.
Could it be that the Pick & Pack system was seen as being so reliable that it did not need to be changed? I know from experience that these systems, when they work well, will run and run and run. Change is scary. As a result you get systems that are 15 years old and suddenly no longer able to cope with the volumes. So you either upgrade or replace.
In a 24x7, always-connected world it becomes almost impossible to upgrade in the traditional sense. You need component systems and software that can be updated and upgraded in real time. Parallel systems need to be operated, so that one side can be brought down (at a quiet time), updated and spun up again. If the update fails, the other side is still functional and the updated side can be re-flashed with the original image.
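The parallel-system idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Sainsbury's (or anyone else) actually does it: the names `Side`, `upgrade_standby` and `health_check` are hypothetical, and a "version" string stands in for a full system image.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Side:
    """One of the two parallel systems (hypothetical model)."""
    name: str
    version: str       # stands in for the installed system image
    live: bool = False  # True if this side is serving traffic


def upgrade_standby(active: Side, standby: Side, new_version: str,
                    health_check: Callable[[Side], bool]) -> Side:
    """Update the standby side and return the side that should serve traffic."""
    original = standby.version     # keep the original image for rollback
    standby.version = new_version  # bring the standby down, update, spin up
    if health_check(standby):
        # Update succeeded: switch traffic over to the updated side.
        active.live, standby.live = False, True
        return standby
    # Update failed: re-flash the original image; the active side never stopped.
    standby.version = original
    return active
```

The point of the design is that the side serving customers is never touched during the update, so a failed upgrade costs nothing more than a rollback on the standby.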
This goes back to the two problems I have been talking about.
- Change controls need to be applied more rigorously than ever.
- BCP needs to be complete and not only include Disaster Recovery but full Disaster Simulation.
Live backup systems need to be at the ready. Don't just wait for a disaster. Prepare for the disaster. Test the WHOLE system, not just parts. No testing in a live environment.
Sainsbury's IT failure (don't call it a glitch) ruins bank holiday food orders - 26 May 2017
Sainsbury's online deliveries cancelled after technical difficulties - 4 Mar 2017