Skip to content

Malformed errors returned when key-value store is locked #232

@PatrickLang

Description

@PatrickLang

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


Which release version?: v1.0.11


Which component (CNI/IPAM/CNM/CNS): CNI


Which Operating System (Linux/Windows): Windows Server version 1803

Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes v1.10.6


What happened:

When I try to scale up to several pods on the same node, there will be some store locking issues as overlapped calls are made to the Azure CNI binary.

2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded true Name CNI Version v1.0.11 ErrorMessage  vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
E0815 21:11:45.950994    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.950994    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.950994    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-46msm_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
I0815 21:11:45.957988    4188 kubelet_node_status.go:491] Using Node Hostname from cloudprovider: "15453k8s9001"
E0815 21:11:45.971011    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.971011    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.971011    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-vdppz_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.991997    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.992999    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.992999    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-sjlmm_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:46.006993    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:46.006993    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:46.006993    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-2rz9c_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
2018/08/15 21:11:46 SetReportState succeeded
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:46 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:46 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error

What you expected to happen:

Instead of seeing E0815 21:11:45.950994 4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input after CNI failed to get the lock, I would expect a valid error.

I looked at the source:

log.Printf("Failed to initialize key-value store of network plugin, err:%v.\n", err)

and it has return code 1 with nothing on stderr

What I would expect is a return code >100 (vendor specific), along with this json schema on stdout:

{
  "cniVersion": "0.3.1",
  "code": <numeric-error-code>,
  "msg": <short-error-message>,
  "details": <long-error-message> (optional)
}

How to reproduce it (as minimally and precisely as possible):

Try to scale anything up on Windows. It will eventually succeed, but there will be errors in the process as multiple pods need addEndpoint called simultaneously.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions