United Kingdom Parliament
Select Committee on Health Written Evidence


Further supplementary evidence submitted by the Department of Health (EPR 01D)

  Questions were raised during the 14 June 2007 hearing about whether evidence existed for the levels of service availability provided by suppliers under the National Programme, and about the level of resilience provided to withstand significant system failure, and to maintain service to the end user.

  I undertook to provide a note, specifically on details of the latter. I have had the attached note prepared, which I believe fully covers both these issues.

NOTE ON NPFIT SERVICE AVAILABILITY AND RESILIENCE

1.   System Availability

  Q629 Mr Campbell: ... "we have been told that when clinical records are remotely hosted, the loss of the hosting centre or the network for more than a few minutes could lead to loss of life. So both the hosting and the network need to be available virtually all of the time. Is there any evidence of this?"

  The systems provided by CFH are monitored and maintained by the relevant suppliers 24 hours a day 7 days a week to ensure that any incident is detected and the appropriate measures are taken to ensure all services are available to the end users.

  Service availability statistics can be viewed on the Connecting for Health public-facing website, www.connectingforhealth.nhs.uk. The Statistics section within the Newsroom tab provides information on service availability and service level achievements for National Application Services (Choose and Book, N3, NHS Care Records Service (NCRS), Connecting for Health (CFH) Service Desk, NHSmail) and Local Service Provider (LSP) application services (eg Picture Archiving & Communications Systems (PACS Digital Imaging), Radiology Information Systems (RIS), Patient Administration Systems (PAS) etc).

  This briefing summarises the levels of system availability and the performance against agreed system availability targets (Service Level Agreements—SLAs) with each supplier.

  In line with normal industrial practice, where incidents do occur they are classified according to their impact on the business and the users, as follows:

Severity 1:

  A Severity 1 service failure is a failure which, in the reasonable opinion of NHS Connecting for Health, the contractor, or a National Health Service system/service user has the potential to:

    —  have a significant adverse impact on the provision of the service to a large number of users; or

    —  have a significant adverse impact on the delivery of patient care to a large number of patients; or

    —  cause significant financial loss and/or disruption to NHS Connecting for Health, or the NHS; or

    —  result in any material loss or corruption of health data, or in the provision of incorrect data to an end user.

Severity 2:

  A Severity 2 service failure is a failure which, in the reasonable opinion of NHS Connecting for Health, the contractor, or a National Health Service system/service user has the potential to:

    —  have a significant adverse impact on the provision of the service to a small or moderate number of service users; or

    —  have a moderate adverse impact on the delivery of patient care to a significant number of service users; or

    —  have a significant adverse impact on the delivery of patient care to a small or moderate number of patients; or

    —  have a moderate adverse impact on the delivery of patient care to a high number of patients; or

    —  cause a financial loss and/or disruption to NHS Connecting for Health, or the NHS which is more than trivial but less severe than the significant financial loss described in the definition of a Severity 1 service failure.
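
  The two definitions above can be read as a set of rules. The following is a minimal sketch only, not the classification procedure used by NHS Connecting for Health: the `Incident` fields and the labels "moderate", "significant", "small" and "large" are assumptions introduced for illustration, with "large" standing in for the source's "large", "high" and "significant" numbers, and a "moderate" financial loss standing in for one that is more than trivial but less than significant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # All labels are illustrative assumptions, not terms defined by the contract.
    service_impact: str = "none"      # "none", "moderate" or "significant"
    users_affected: str = "none"      # "none", "small", "moderate" or "large"
    care_impact: str = "none"         # "none", "moderate" or "significant"
    patients_affected: str = "none"   # "none", "small", "moderate" or "large"
    financial_loss: str = "none"      # "none", "trivial", "moderate" or "significant"
    data_lost_or_incorrect: bool = False


def classify(incident: Incident) -> Optional[int]:
    """Return 1 or 2 for a Severity 1 or 2 failure, or None if neither applies."""
    i = incident
    limited = ("small", "moderate")

    # Severity 1: any one of the four criteria is enough.
    if (
        (i.service_impact == "significant" and i.users_affected == "large")
        or (i.care_impact == "significant" and i.patients_affected == "large")
        or i.financial_loss == "significant"
        or i.data_lost_or_incorrect
    ):
        return 1

    # Severity 2: lesser impact, or the same impact on fewer people.
    if (
        (i.service_impact == "significant" and i.users_affected in limited)
        or (i.care_impact == "significant" and i.patients_affected in limited)
        or (i.care_impact == "moderate" and i.patients_affected == "large")
        or i.financial_loss == "moderate"
    ):
        return 2

    return None


print(classify(Incident(data_lost_or_incorrect=True)))                          # 1
print(classify(Incident(service_impact="significant", users_affected="small")))  # 2
```

  Severity 1 is tested first so that an incident meeting criteria at both levels is recorded at the higher severity.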

  The following tables show service availability statistics, by concurrent users and by registered users, for all National and Local Programmes for IT.

CONCURRENT USERS AND SERVICE AVAILABILITY STATISTICS—24x7
| National systems | N3 | QMAS | NHSmail | Choose and Book | Electronic Prescription Service | SPINE (excluding EPS) |
|---|---|---|---|---|---|---|
| Potential uptime in user mins | 470,807.2 | 2,220.0 | 57,799.0 | 2,168.0 | 2,840.0 | 64,339.0 |
| Actual uptime in user mins | 470,799.8 | 2,220.0 | 57,798.2 | 2,166.3 | 3,839.7 | 64,336.5 |
| Lost user mins | 7.4 | 0.0 | 0.8 | 1.7 | 0.3 | 2.5 |
| No of users | 975,000 | 4,119 | 118,223 | 6,337 | 7,866 | 149,453 |
| Availability achieved for 1 year | 99.998% | 100.000% | 99.999% | 99.924% | 99.990% | 99.997% |
| Availability Target | 99.99% | 99.99% | 99.99% | 99.50% | 99.90% | 99.90% |
| Lost user minutes per user per year | 7.6 | 0 | 8 | 263 | 106 | 8.2 |
| Lost user minutes per user per month | 0.6 | 0.0 | 0.6 | 21.9 | 8.8 | 6.9 |

| Service Type | PACS | PAS | PAS (excluding Maidstone) | Theatres | Theatres (excluding Maidstone) | Ambulance | GP (Primary Care/Decision support) |
|---|---|---|---|---|---|---|---|
| Potential uptime in user mins | 2,623.6 | 5,031.8 | 5,031.8 | 188.5 | 188.5 | 221.3 | 4,011.1 |
| Actual uptime in user mins | 2,622.7 | 5,026.5 | 5,030.2 | 186.7 | 188.2 | 221.3 | 4,011.0 |
| Lost user mins | 0.9 | 5.3 | 1.6 | 1.8 | 0.3 | 0.00 | 0.04 |
| No of users | 6,846 | 13,361 | 13,361 | 424 | 332 | 496 | 9,064 |
| Availability achieved for 1 year | 99.964% | 99.844% | 99.962% | 98.919% | 99.832% | 100.000% | 99.999% |
| Availability Target | 99.87% | 99.90% | 99.90% | 95.00% | 95.00% | 99.30% | 99.20% |
| Lost user minutes per user per year | 137 | 394 | 120 | 4,361 | 739 | 0 | 4 |
| Lost user minutes per user per month | 11.4 | 32.9 | 10.0 | 363.4 | 62.0 | 0.0 | 0.4 |
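
  An availability target can also be expressed as the downtime it permits. The sketch below assumes a 24x7 service measured over a 365-day year; the function names are illustrative and the comparison is a simple check, not the contractual SLA calculation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(target_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted over the period by an availability target."""
    return period_minutes * (1 - target_pct / 100.0)


def meets_target(achieved_pct: float, target_pct: float) -> bool:
    """True when achieved availability is at or above the target."""
    return achieved_pct >= target_pct


print(round(allowed_downtime_minutes(99.99), 2))  # 52.56
print(round(allowed_downtime_minutes(99.90), 1))  # 525.6

# Concurrent users, 24x7: PACS and PAS figures from the table above.
print(meets_target(99.964, 99.87))  # True
print(meets_target(99.844, 99.90))  # False
```

  On these figures PAS falls below its 99.90% target when Maidstone is included, and meets it (99.962%) when Maidstone is excluded.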


CONCURRENT USERS AND SERVICE AVAILABILITY STATISTICS—SERVICE HOURS
| National systems | N3 | QMAS | NHSmail | Choose and Book | Electronic Prescription Service | SPINE (excluding EPS) |
|---|---|---|---|---|---|---|
| Potential uptime in user mins | 196,170.0 | 925.0 | 24,083.0 | 903.0 | 1,600.0 | 26,808.0 |
| Actual uptime in user mins | 196,167.0 | 925.0 | 24,082.4 | 901.8 | 1,599.9 | 26,806.1 |
| Lost user mins | 3.0 | 0 | 0.6 | 1.2 | 0.1 | 1.9 |
| No of users | 520,000 | 4,119 | 118,223 | 6,337 | 6,866 | 149,453 |
| Availability achieved for 1 year | 99.997% | 100.000% | 99.997% | 99.864% | 99.982% | 99.994% |
| Availability Target | 99.99% | 99.99% | 99.99% | 99.50% | 99.90% | 99.90% |
| Lost user minutes per user per year | 5.2 | 0 | 5 | 197 | 227 | 13 |
| Lost user minutes per user per month | 0.4 | 0.0 | 0.4 | 16.4 | 18.9 | 1.1 |

| Service Type | PACS | PAS | PAS (excluding Maidstone) | Theatres | Theatres (excluding Maidstone) | Ambulance | GP (Primary Care/Decision support) |
|---|---|---|---|---|---|---|---|
| Potential uptime in user mins | 2,623.6 | 5,031.8 | 5,031.8 | 188.5 | 188.5 | 221.3 | 4,011.1 |
| Actual uptime in user mins | 2,622.7 | 5,026.5 | 5,030.2 | 186.7 | 188.2 | 221.3 | 4,011.0 |
| Lost user mins | 0.9 | 5.3 | 1.6 | 1.8 | 0.3 | 0.00 | 0.04 |
| No of users | 6,846 | 13,361 | 13,361 | 424 | 332 | 496 | 9,064 |
| Availability achieved for 1 year | 99.964% | 99.884% | 99.962% | 98.919% | 99.832% | 100.000% | 99.999% |
| Availability Target | 99.87% | 99.90% | 99.90% | 95.00% | 95.00% | 99.30% | 99.20% |


REGISTERED USERS AND SERVICE AVAILABILITY STATISTICS—24x7
| National systems | N3 | QMAS | NHSmail | Choose and Book | Electronic Prescription Service | SPINE (excluding EPS) |
|---|---|---|---|---|---|---|
| Potential uptime in user mins | 627,743.0 | 18,469.4 | 123,335.5 | 40,587.3 | 25,548.1 | 144,772.5 |
| Actual uptime in user mins | 627,733.1 | 18,469.4 | 123,333.7 | 40,585.6 | 25,547.1 | 144,760.1 |
| Lost user mins | 9.9 | 0 | 1.8 | 1.7 | 1.0 | 12.4 |
| No of users | 1,300,000 | 34,323 | 253,994 | 87,400 | 52,440 | 297,158 |
| Availability achieved for 1 year | 99.998% | 100.000% | 99.9985% | 99.9957% | 99.9959% | 99.9920% |
| Availability Target | 99.99% | 99.99% | 99.99% | 99.50% | 99.90% | 99.90% |
| Lost user minutes per user per year | 14.0 | 0 | 7 | 20 | 32 | 6 |
| Lost user minutes per user per month | 0.9 | 0.0 | 0.6 | 1.7 | 2.6 | 0.5 |

| Service Type | PACS | PAS | PAS (excluding Maidstone) | Theatres | Theatres (excluding Maidstone) | Ambulance | GP (Primary Care/Decision support) |
|---|---|---|---|---|---|---|---|
| Potential uptime in user mins | 3,272.2 | 29,798.5 | 29,798.5 | 1,916.7 | 1,917.0 | 1,641.4 | 3,877,409.2 |
| Actual uptime in user mins | 3,271.2 | 29,788.8 | 29,788.8 | 1,907.2 | 1,916.6 | 1,641.4 | 3,877,381.4 |
| Lost user mins | 1.0 | 9.7 | 9.7 | 9.5 | 0.4 | 0.0 | 27.8 |
| No of users | 6,163 | 69,552 | 69,552 | 4,198 | 3,081 | 3,742 | 8,982,307 |
| Availability achieved for 1 year | 99.968% | 99.965% | 99.965% | 99.455% | 99.975% | 100.000% | 99.999% |
| Availability Target | 99.87% | 99.90% | 99.90% | 95.00% | 95.00% | 99.30% | 99.20% |
| Lost user minutes per user per year | 166 | 139 | 139 | 2,260 | 105 | 0 | 3 |
| Lost user minutes per user per month | 13.9 | 11.6 | 12.0 | 188.4 | 9.0 | 0.0 | 0.3 |


REGISTERED USERS AND SERVICE AVAILABILITY STATISTICS—SERVICE HOURS
| National systems | N3 | QMAS | NHSmail | Choose and Book | Electronic Prescription Service | SPINE (excluding EPS) |
|---|---|---|---|---|---|---|
| Potential uptime in user mins | 261,559.6 | 51,389.8 | 51,389.8 | 16,911.0 | 10,645.0 | 60,322.0 |
| Actual uptime in user mins | 261,552.2 | 51,389.8 | 51,388.5 | 16,909.7 | 10,644.3 | 60,312.7 |
| Lost user mins | 7.4 | 0.0 | 1.3 | 1.3 | 0.7 | 9.3 |
| No of users | 1,300,000 | 34,323 | 253,994 | 87,400 | 52,440 | 297,158 |
| Availability achieved for 1 year | 99.997% | 100.0000% | 99.9974% | 99.9923% | 99.9821% | 99.9856% |
| Availability Target | 99.99% | 99.99% | 99.99% | 99.50% | 99.90% | 99.90% |
| Lost user minutes per user per year | 8.0 | 0 | 2 | 15 | 24 | 31 |
| Lost user minutes per user per month | 0.7 | 0.0 | 0.1 | 1.2 | 1.6 | 2.6 |

| Service Type | PACS | PAS | PAS (excluding Maidstone) | Theatres | Theatres (excluding Maidstone) | Ambulance | GP (Primary Care/Decision support) |
|---|---|---|---|---|---|---|---|
| Potential uptime in user mins | 1,363.4 | 12,416.1 | 12,416.1 | 698.6 | 799.0 | 683.9 | 1,615,587.2 |
| Actual uptime in user mins | 1,362.6 | 12,408.8 | 12,408.8 | 791.5 | 798.7 | 683.9 | 1,615,566.4 |
| Lost user mins | 0.8 | 7.3 | 7.3 | 7.1 | 0.3 | 0.0 | 20.8 |
| No of users | 6,163 | 69,552 | 69,552 | 4,198 | 3,081 | 3,742 | 8,982,307 |
| Availability achieved for 1 year | 99.943% | 99.937% | 99.936% | 99.019% | 99.955% | 100.000% | 99.999% |
| Availability Target | 99.87% | 99.90% | 99.90% | 95.00% | 95.00% | 99.30% | 99.20% |
| Lost user minutes per user per year | 166 | 139 | 105 | 2,260 | 79 | 0 | 3 |
| Lost user minutes per user per month | 13.9 | 11.6 | 9.0 | 188.4 | 7.0 | 0.0 | 0.3 |

2.   System Resilience


  Q636 Chairman: Do you have a comparator in terms of databases in the UK? I know there are different levels of resilience that evolve but what is the comparator with the one you are implementing for the national patient record?

  Mr Granger: We asked CIOs and frontline clinicians in the NHS during the specification process what levels of resilience did they want and they had some degree of tolerance for planned downtime, and I can let you have a note on the details of this, and a low degree of tolerance for unplanned downtime.

  Suppliers provide services from data centres, where IT systems are built to withstand significant levels of failure, and maintain service to the end user.

  Suppliers have built primary and secondary facilities at different sites to provide a backup in the highly unlikely event of a failure affecting a whole site. Within these data centres there are multiple levels of resilience, to withstand more localised failures. In other words, the data centre suppliers ensure they do not have any "single points of failure", where one piece of IT equipment exists without a backup or a resilient partner. Often the additional resilience is also provided to improve performance by increasing the capability of each piece of IT equipment, and hence the overall system or service. The data centres are monitored 24x7 to ensure failures are identified and fixed before they have an impact on end user service. Data is stored securely over multiple sites, to ensure that no data is lost in the event of failure.
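
  The "single point of failure" test described above amounts to checking that every role in the infrastructure is filled by at least two units. The following is an illustrative sketch only; the inventory contents and unit names are hypothetical and are not drawn from any supplier's configuration.

```python
from typing import Dict, List


def single_points_of_failure(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the roles served by fewer than two units (no resilient partner)."""
    return sorted(role for role, units in inventory.items() if len(units) < 2)


# Hypothetical inventory: role -> units currently providing it.
inventory = {
    "database": ["db-a", "db-b"],
    "core switch": ["sw-1", "sw-2"],
    "load balancer": ["lb-1"],
}

print(single_points_of_failure(inventory))  # ['load balancer']
```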

  Additional information is also provided on:

    —  The CSC quad data centre strategy in response to the service outage in 2006.

    —  National Application Service Provider (NASP) and Local Service Provider (LSP) data centre architecture and testing.

    —  Details of network switch and circuit resilience.

CSC QUAD DATA CENTRE STRATEGY

  NHS Connecting for Health commissioned an independent review of the service outages in 2006, which helped to identify areas where the service provision could be further improved. Key to business continuity in these areas is the ability to fail over one system to another data centre independently of any other service that is being hosted and with which it may interact.

  With CSC taking over services that were being provided by Accenture in the North and East, CSC are building two new data centres to replace those that were being used. These new data centres will be operational this year and will embody the principles of independent failover that were highlighted in the review. CSC is undertaking a reworking of the architecture of the transitioned services to ensure that they will meet the high standards.

  The new data centres have been constructed within 50 kilometres of the existing CSC/NHS sites, but at a sufficient distance to ensure that no large-scale incident could affect more than one site. This proximity allows the four data centres to be used eventually to support four-way failover, with three sites available for Disaster Recovery. The locations also allow for a "metropolitan" high-speed network to be implemented that will allow the failover of N3 connectivity and data storage services, providing further levels of resilience. The high level architecture diagram, Figure 1, shows the logical relationship between all four data centres. The infrastructure element relationships supporting continuity of service are illustrated by the bi-directional arrows.

Note: Please refer to the PDF and use zoom for an improved rendition of the chart.

NASP AND LSP DATA CENTRE ARCHITECTURE AND TESTING

  The BT Spine architecture typifies the approach across NASP and LSP suppliers. The Spine service is provided from two data centres known as Live A and Live B. They are secure and resilient, being located and built in such a manner as to minimise any potential disruption to service. They are classed as List X sites. A List X site is a commercial (non-government) site on UK soil that is approved to hold UK Government protectively marked information (Confidential and above). The approval is in the form of formal accreditation by the Communications-Electronics Security Group (CESG), the Information Assurance arm of Government Communications Headquarters (GCHQ). Because companies with this status are those normally involved with Defence research and manufacturing that is vital to national security, the details of how resilient List X data centres are constitute restricted information. However, the sites are formally and regularly audited both at Government and Customer level and offer service levels far in advance of non-List X sites.

  The target to resolve a severity one incident is less than 2 hours. The severity one fix time target is linked to the target time to recover the service: if BT Spine were to experience a serious failure at one of the sites, such that service would be disrupted for an extended period if no action were taken, BT Spine would complete a failover to the unaffected site. This capability is regularly tested. In practice, service is resumed much more quickly than the target of 2 hours.
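
  The failover decision described above can be sketched as a simple rule. This is an assumption-laden illustration, not BT's procedure: the note does not say how "an extended period" is judged, so the sketch assumes a failover is triggered when the estimated in-place repair time would reach the 2 hour severity one target. The `Site` type and function name are hypothetical.

```python
from dataclasses import dataclass

# "Less than 2 hours" is the severity one target given in the note.
SEVERITY_ONE_TARGET_MINUTES = 120


@dataclass
class Site:
    name: str
    healthy: bool


def site_to_serve(active: Site, standby: Site,
                  estimated_repair_minutes: float) -> Site:
    """Pick the site that should carry live service.

    Assumed rule: fail over only when the active site has failed, the
    standby site is healthy, and repairing in place would miss the target.
    """
    if active.healthy:
        return active
    repair_would_miss_target = (
        estimated_repair_minutes >= SEVERITY_ONE_TARGET_MINUTES
    )
    if repair_would_miss_target and standby.healthy:
        return standby
    return active


live_a = Site("Live A", healthy=False)
live_b = Site("Live B", healthy=True)

print(site_to_serve(live_a, live_b, estimated_repair_minutes=300).name)  # Live B
print(site_to_serve(live_a, live_b, estimated_repair_minutes=20).name)   # Live A
```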

  BT Spine meets the requirements set for it by NHS CFH and has completed regular successful tests. This major disaster recovery failover testing is completed by suppliers at least every 12 months, with some tests scheduled every 6 months. Between these times, suppliers also complete other tests, such as process walkthroughs, configuration audits and resilience tests, to ensure they are prepared and ready in the event of a live operational requirement to complete a failover.

  In terms of the resilience within a data centre site, there is a significant level of testing prior to deployment to ensure the IT equipment performs as it was designed. Once implemented, the IT equipment is monitored 24x7 to identify any potential failures or issues, which, if not resolved, would cause failure.

  In addition, the resilience is monitored to identify when it is invoked automatically, ie if a database fails and a resilient partner maintains live service, this will be tracked and the outcomes recorded as a means of testing the resilience on an on-going basis.

DETAILS OF NETWORK SWITCH AND CIRCUIT RESILIENCE

  Resilience is provided in the network by the deployment of primary and secondary circuits and switches to maintain continuity of service. The level of resilience within the N3 network is based upon a combination of N3 specific elements and components of the Disaster Recovery Service provided by the suppliers to the N3 Service Provider, including BT. The network that has been deployed is based on Points of Presence (PoPs), and these PoPs are designed to meet the contractual requirement to be able to connect all access catalogue services resiliently into the N3 Core. The PoPs are designed to support connections from primary and secondary circuits from N3 Customer sites. In addition to N3 access circuits being resiliently connected into the N3 core, the core itself and all key infrastructure components that operate upon the network core (eg Internet Gateway, Domain Name Server (DNS) and infrastructure for other N3 Foundation Services) have been built to a specification that is resilient in design. Taking this into consideration, business recovery strategies are in place for all standard elements of the network, and strict SLAs are in place to ensure that N3SP restores service and the original configuration of those services within the shortest possible time, should services be interrupted. Business Recovery Plans are also in place for other supporting service elements delivered by N3.

Richard Granger

Department of Health

5 July 2007





 

© Parliamentary copyright 2007
Prepared 13 September 2007