Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol #184

Closed
wants to merge 13 commits into from

Conversation

afine
Copy link
Contributor

@afine afine commented Mar 6, 2017

This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (portUnification=true). This should only be used while performing a rolling upgrade.

Some open questions:

Thanks,
Abe

@afine afine force-pushed the ZOOKEEPER-236 branch 2 times, most recently from 937c39a to 54f60f2 Compare March 10, 2017 22:23
@afine afine force-pushed the ZOOKEEPER-236 branch 3 times, most recently from a192293 to bebe096 Compare March 17, 2017 22:19
((X509ExtendedTrustManager) tm).checkClientTrusted(x509Certificates, s, socket);
public void checkClientTrusted(X509Certificate[] chain, String authType, Socket socket) throws CertificateException {
if (hostnameVerificationEnabled && shouldVerifyClientHostname) {
hostnameVerifier.verify(socket.getInetAddress().getHostName(), ((SSLSocket) socket).getSession());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the hostname be available on the server side to verify?. How will the server know who this client is, one way is to implement a special HostnameVerifier that can be provide a set of peer hostnames or refer back to QuorumPeer to verify using the QuorumVerifier?.

Copy link
Contributor Author

@afine afine Mar 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reverse dns lookup is done on the performHostnameVerification... line

implement a special HostnameVerifier that can be provide a set of peer hostnames or refer back to QuorumPeer to verify using the QuorumVerifier

I'm a little confused by this. Do you mean that the users needs to provide the mapping from IP addresses to hostnames in advance?

return new X509ExtendedTrustManager() {
HostnameChecker hostnameChecker = HostnameChecker.getInstance(HostnameChecker.TYPE_TLS);
X509ExtendedTrustManager x509ExtendedTrustManager = (X509ExtendedTrustManager) tm;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think extending X509ExtendedTrustManager outside will help us test it better?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure yet. The createTrustManager method might give us all the exposure to this implementation that we need. I think this will become more obvious as we begin expanding the tests.

@@ -0,0 +1,231 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to be a test file, right? If so, test file should be under src/test folder. Does this even build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops... Fixed.

Security.setProperty("com.sun.security.enableCRLDP", "false");
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case if any of these test method run for a long time (which I doubt) then it'll be good to specify a timeout value annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

The QuorumSSLTest needs them too.

@@ -193,21 +198,18 @@ void request(Request request) throws IOException {
/**
* Returns the address of the node we think is the leader.
*/
protected InetSocketAddress findLeader() {
protected QuorumServer findLeader() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why findLeader is returning a QuorumServer instead of InetSocketAddress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Egah, this is a change from an earlier iteration that I forgot to remove. Fixed now.

@@ -253,7 +255,10 @@ protected void connectToLeader(InetSocketAddress addr)
throw new IOException("initLimit exceeded on retries.");
}

sockConnect(sock, addr, Math.min(self.tickTime * self.syncLimit, remainingInitLimitTime));
sockConnect(sock, leader.addr, Math.min(self.tickTime * self.syncLimit, remainingInitLimitTime));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leader could be null in leader.addr right? Still wondering why change the leader type from an address to a QuorumServer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed.

sock.setSoTimeout(self.tickTime * self.initLimit);
protected void connectToLeader(QuorumServer leader)
throws IOException, InterruptedException, X509Exception {
QuorumX509Util quorumX509Util = new QuorumX509Util();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now create QuorumX509Util whenever connectToLeader is called, even if ssl is not enabled. Could this be improved so the SSL related code path is only involved when ssl is enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Fixed

ivy.xml Outdated
@@ -63,6 +67,9 @@
<dependency org="commons-collections" name="commons-collections"
rev="3.2.2" conf="test->default"/>

<dependency org="org.bouncycastle" name="bcprov-jdk15on" rev="1.56" conf="test->default"/>
<dependency org="org.bouncycastle" name="bcpkix-jdk15on" rev="1.56" conf="test->default"/>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this specified twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is bcprov-jdk15on and the other is bcpkix-jdk15on.

this.self = self;
try {
if (self.getQuorumListenOnAllIPs()) {
ss = new ServerSocket(self.getQuorumAddress().getPort());
if (self.shouldUsePortUnification()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move all this code away from here to a different class like SSLSocketFactory which can help keep Leader code have similar number of branches if not more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough call IMO. I think you are correct that moving this out of the Leader would make the leader code simpler but on the other hand this is logic that I think "belongs" to the leader and is not really relevant anywhere else.

What do you think about the compromise of moving this to a static method in Leader where it can be tested and prevents cluttering up the constructor logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some level of refactoring that makes the constructor less clogged would be valuable here..

@Override
public void checkClientTrusted(X509Certificate[] chain, String authType, Socket socket) throws CertificateException {
if (hostnameVerificationEnabled && shouldVerifyClientHostname) {
performHostnameVerification(socket.getInetAddress().getHostName(), chain[0]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Quorum server connection I am not entirely sure if getInetAddres().getHostName() is a good idea when on server side this will force a reverse DNS lookup. When customer has only provided a ip address as config perhaps using hostname is not correct. And if customer has provided a hostname performing reverse dns lookup is not necessary and can be argued as not safe. Since the trust anchor is the user provided config and not the DNS service. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should not do reverse DNS lookups when we do not need to.

I changed the host verification logic to only perform a reverse dns lookup when host verification fails using the ip address. I think it is still fair to try the reverse dns lookup at that point as the value given in the altname of the certificate is not necessarily tied to the zookeeper configuration.

Let me know if you think this is reasonable.

import static org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT;
import static org.apache.zookeeper.test.ClientBase.createTmpDir;

public class QuorumSSLTest extends QuorumPeerTestBase {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice great stuff :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

public static final String DEFAULT_PROTOCOL = "TLSv1";

private String sslProtocolProperty = getConfigPrefix() + "protocol";
private String cipherSuitesProperty = getConfigPrefix() + "ciphersuites";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -21,6 +21,7 @@
import org.apache.zookeeper.ClientCnxn.EndOfStreamException;
import org.apache.zookeeper.ClientCnxn.Packet;
import org.apache.zookeeper.client.ZKClientConfig;
import org.apache.zookeeper.common.ClientX509Util;
import org.apache.zookeeper.common.X509Util;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -31,6 +31,7 @@
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

import org.apache.zookeeper.common.ClientX509Util;
import org.apache.zookeeper.common.X509Exception.SSLContextException;
import org.apache.zookeeper.common.X509Util;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

import static org.apache.zookeeper.common.X509Exception.KeyManagerException;
import static org.apache.zookeeper.common.X509Exception.SSLContextException;
import static org.apache.zookeeper.common.X509Exception.TrustManagerException;
import static org.apache.zookeeper.common.X509Exception.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use explicit imports instead of wildcards imports - I guess here IDE tried to be clever..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -150,7 +196,7 @@ public static X509KeyManager createKeyManager(String keyStoreLocation, String ke
KeyStore ks = KeyStore.getInstance("JKS");
inputStream = new FileInputStream(keyStoreFile);
ks.load(inputStream, keyStorePasswordChars);
KeyManagerFactory kmf = KeyManagerFactory.getInstance("SunX509");
KeyManagerFactory kmf = KeyManagerFactory.getInstance("PKIX");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comments on what this change is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment has been added where pkix is used for the trustmanager factory. We don't actually need this change for the keymanager, but i preferred to be consistent.


private void performHostVerification(InetAddress inetAddress, X509Certificate certificate) throws CertificateException {
try {
hostnameVerifier.verify(inetAddress.getHostAddress(), certificate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this only verifies that the host name matches what's stored in the certificates, right? If so, how do we verify that same subject is a valid peer we intended to connect to, rather than any random server which happens to have a valid cert? Or is this already taken care of by other parts of code?

For reference, in ZOOKEEPER-1045 what we did for similar verification is by comparing the name of the server with the name retrieved from Kerberos principal. The name of the server is retrieved from the zoo.cfg and because of this we require if users want this verification they need encode the server as FQDN in zoo.cfg rather than using IP address. Maybe we can do something similar here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have a few hostname checks that we could consider. Please let me know if I am missing any.

  1. Check that the address/hostname of the machine matches what is listed on the certificate
  2. Check that the address/hostname of the machine matches what is expected from zoo.cfg (this may also require reading the sid from the packet sent when a connection is established).
  3. Check that the address/hostname listed on the certificate matches what is expected from zoo.cfg

This patch only performs check 1. I think that check 3 is something @geek101 mentioned earlier and we agreed this can be moved to another patch as it is more complicated.

I think it is also possible to implement check 2. This can exist entirely outside of tls/ssl support and is something we should consider based on its own merits.

I think having checks 1 and 2 would be plenty secure and we would not need check 3.

Let me know what you think @hanm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check 1 and check 2 together do what check 3 does, no? So I think 1+2 is enough. The point is to make sure that:

  • The peer is what it claims to be (check 1 - authentication).
  • The peer is authorized to join quorum (check 2 - authorization).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It could also be argued that authorization is handled by this patch in its current state as well, by only accepting peers that have trusted certificates. In other words, the CA limits which machines get certificates, which limits which machines get to join the quorum. So check 2 is not "needed" if the CA is trusted.

I also think that check 2 has value for deployments that want to reuse the same truststore for multiple services and for deployments that are not using SSL/TLS at all. So I think it is a check worth doing and it is deserving of its own patch. Would be interested in knowing what other members of the community think.

@@ -71,7 +71,7 @@ void followLeader() throws InterruptedException {
self.end_fle = 0;
fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean);
try {
InetSocketAddress addr = findLeader();
InetSocketAddress addr = findLeader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change here - this file can be removed from the change list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -51,6 +53,8 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.net.ssl.SSLSocket;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these imports are not used. Another look is this file can also be removed from change list because nothing changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -383,6 +387,7 @@ public void run() {
+ leader.self.getView().get(this.sid).toString());
} else {
LOG.info("Follower sid: " + this.sid + " not in the current config " + Long.toHexString(leader.self.getQuorumVerifier().getVersion()));
//TODO: Hostname verification possible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to remove this TODO as it appertains to no code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

@Test(timeout = 300000)
public void testRollingUpgrade() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is pretty flaky - please check internal Jenkins to diagnose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@afine afine changed the title ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [DO NOT MERGE] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol Apr 14, 2017
@afine afine force-pushed the ZOOKEEPER-236 branch 3 times, most recently from d1a150d to 08d2981 Compare April 20, 2017 23:16
@@ -63,6 +66,10 @@ public void shutdown() {
}

public static class MainThread implements Runnable {
public File getConfFile() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find a call to this getConfFile. Is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enixon
Copy link

enixon commented Sep 17, 2018

+1 for @anmolnar 's plan

Afaik, with portUnification=false there shouldn't be any problems.

@ivmaykov does this work for you?

@ivmaykov
Copy link
Contributor

ivmaykov commented Sep 18, 2018

I think as long as we keep portUnification=false there should not be hangs/crashes. However, that means it's not possible to safely upgrade a cluster from plaintext quorum to TLS quorum without downtime. @hanm mentioned "mitigations" but there really isn't a way to mitigate the issues w/ UnifiedServerSocket in #184 (other than "don't use it"). One option is to keep the unified server socket code, but don't parse the portUnification option in QuorumPeerConfig so there is no way to use the feature. Or we could document the issues and have a clear warning ("portUnification has known problems and may cause your ensemble to enter a bad state that requires reboots, use at your own risk"), and let people take the risk if they like.

One issue that we found is the >10% perf regression in plaintext mode when the apache httpclient library dependency is added. We never figured out why it caused the perf regression, but it could be a potential blocker. Unlike the port unification bugs, this perf regression cannot be worked around - it was present even when all the SSL features were disabled. The fix for that is isolated to one file and basically involves copy/pasting the hostname verifier code from httpclient directly into ZK, so it would be pretty easy to backport to #184 if desired.

Let me just do another pass over the differences between the two PRs and see if anything else jumps out at me.

@anmolnar
Copy link
Contributor

@ivmaykov

portUnification
Let's just disable the feature completely in this patch: don't parse the config option to be on the safe side. Rolling upgrade will be supported once we consider it stable.

perf regression
What exactly the code snippet which needs to be copied over and where? Sounds like an easy fix to this patch.

@hanm Please start reviewing the code if you have some free cycles.

@anmolnar
Copy link
Contributor

@eolivelli Looks like I triggered the build with my latest comment. Maybe my comment to @hanm :)

@asfgit
Copy link

asfgit commented Sep 25, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2259/

@hanm
Copy link
Contributor

hanm commented Sep 26, 2018

@anmolnar sure, will sign this off before next monday.

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had one pass on code, just a few nits on styling issues, and one question about the conditions on host name verification. will have another pass on test code tomorrow.

setSockOpts(sock);
sock.connect(electionAddr, cnxTO);
LOG.debug("Connected to server " + sid);
LOG.debug("Opening channel to server " + sid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentations on this line and the following lines

sock = new Socket();
setSockOpts(sock);
sock.connect(electionAddr, cnxTO);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line here.

} else { initiateConnection(sock, sid);
} return true;
} catch (UnresolvedAddressException e) {
// Sun doesn't include the address that causes this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentations

LOG.warn("Cannot open channel to " + sid
+ " at election address " + electionAddr, e);
closeSocket(sock);
throw e;} catch (X509Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: start a new line for } catch (X509Exception e) {

* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be <p> (we already have a closing tag <p/>)?

try {
performHostVerification(InetAddress.getByName(engine.getPeerHost()), chain[0]);
} catch (UnknownHostException e) {
throw new CertificateException("failed to verify host", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Failed

hostnameVerifier.verify(hostAddress, certificate);
} catch (SSLException addressVerificationException) {
try {
LOG.debug("Failed to verify host address: " + hostAddress +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use parameterized logging.

hostnameVerifier.verify(hostName, certificate);
} catch (SSLException hostnameVerificationException) {
LOG.error("Failed to verify host address: " + hostAddress, addressVerificationException);
LOG.error("Failed to verify hostname: " + hostName, hostnameVerificationException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use parameterized logging.

} catch (SSLException hostnameVerificationException) {
LOG.error("Failed to verify host address: " + hostAddress, addressVerificationException);
LOG.error("Failed to verify hostname: " + hostName, hostnameVerificationException);
throw new CertificateException("Failed to verify both host address and host name", hostnameVerificationException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be useful to log inetAddress as well in cases where both hostName and hostAddress ends up empty because of exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both hostAddress and hostName are logged (the 2 entities which was probed). What else should be logged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the inetAddress should be logged to provide additional information when both hostAddress and hostName fails to resolve.

} else if (key.equals("sslQuorum")){
sslQuorum = Boolean.parseBoolean(value);
} else if (key.equals("portUnification")){
shouldUsePortUnification = Boolean.parseBoolean(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to disable the parsing code here for this patch, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @hanm - comment out these two lines, and add a comment:

// TODO: UnifiedServerSocket is currently buggy, will be fixed when @ivmaykov's PRs are merged. Disable port unification until then.

@ivmaykov
Copy link
Contributor

@anmolnar you just need to copy ZKTrustManager.java from #627 to #184, and remove the dependency on httpclient from build.xml and ivy.xml. And comment out the 2 lines of code in QuorumPeerConfig.java that parse the "portUnification" config property and set the instance variable shouldUsePortUnification.

build.xml Outdated
@@ -38,11 +38,15 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant">

<property name="netty.version" value="3.10.6.Final"/>

<property name="httpcomponents.version" value="4.5.3"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's remove this dependency.

ivy.xml Outdated
@@ -63,6 +63,10 @@
<artifact name="netty" type="jar" conf="default"/>
</dependency>

<dependency org="org.apache.httpcomponents" name="httpclient" rev="${httpcomponents.version}">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

* We attempt to perform verification using just the IP address first and if that fails will attempt to perform a
* reverse DNS lookup and verify using the hostname.
*/
public class ZKTrustManager extends X509ExtendedTrustManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace the entire contents of this file with the same file from #627. It's the same code, but the org.apache.http.conn.ssl.DefaultHostnameVerifier class is copied into the file as a private inner class and renamed to ZKHostnameVerifier. Two other apache utility classes from httpclient (SubjectName and InetAddressUtils) are also copied in.

This resolved a >10% perf regression in our internal testing at FB. We don't actually know why including httpclient caused such a large perf regression (and strangely, the regression was present even with SSL configs disabled), but it's a pretty easy fix ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit strange to me (copy-pasting code from other project to remove dependency) and needs additional review, but I don't want to block this PR really.

Have you checked the latest version of HttpClient 4.5.6? I've seen a number of bugfixes included in the maintenance releases. Might help.

@hanm what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't checked against 4.5.6, but given that the code path wasn't even being taken I doubt a later version would help. I don't know enough about JVM performance quirks to even have a theory for how including an unused jar leads to a perf regression. Maybe something to do with the class loader?

Anyway, including a fully featured HTTP client library in Zookeeper's server code base seems weird in any case, since ZK doesn't actually use the HTTP client functionality for anything. If you don't want to copy-paste code, do you know of some smaller open source library that just provides the hostname verification that we could use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a dependency on Hadoop, so we would need to copy-and-paste this class too. Maybe your original approach is the quickest way to get this done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 on @ivmaykov 's approach.

We should create a JIRA and put a comment in code links with JIRA so this can be cleaned up later, though this sounds a mysterious issue by the description so far (that performance degrades with new code path not hit?).

} else if (key.equals("sslQuorum")){
sslQuorum = Boolean.parseBoolean(value);
} else if (key.equals("portUnification")){
shouldUsePortUnification = Boolean.parseBoolean(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @hanm - comment out these two lines, and add a comment:

// TODO: UnifiedServerSocket is currently buggy, will be fixed when @ivmaykov's PRs are merged. Disable port unification until then.

@asfgit
Copy link

asfgit commented Oct 1, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2296/

@asfgit
Copy link

asfgit commented Oct 1, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2297/

@anmolnar
Copy link
Contributor

anmolnar commented Oct 3, 2018

@ivmaykov @hanm I'm done with the changes.
Let's see if we can get a green build.

@asfgit
Copy link

asfgit commented Oct 3, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2335/

@ivmaykov
Copy link
Contributor

ivmaykov commented Oct 4, 2018

I'm not a committer so my +1 doesn't count for much, but this PR looks good to me. I think we should get it in ASAP and then try to get my changes in (as several separate PRs, I will need to break them up) hopefully before 3.5 ships so people can have PEM support, cert reloading, and port unification.

@hanm
Copy link
Contributor

hanm commented Oct 4, 2018

I'm not a committer so my +1 doesn't count for much

It counts a lot, thanks for spending time not just review the patch but also test it in production!

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with latest changes.

@anmolnar are there any remaining works you plan to do on this PR? I think we are good to land but I'd like to double check before I commit this. thanks

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 no more plans, good to go.
I'll commit just to make things faster. (not my PR anyway :)

asfgit pushed a commit that referenced this pull request Oct 5, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes #184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
anmolnar pushed a commit to anmolnar/zookeeper that referenced this pull request Oct 5, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
@anmolnar
Copy link
Contributor

anmolnar commented Oct 5, 2018

@ivmaykov It was great help reviewing this huge PR. Thanks a lot!
Merged to 3.5 branch, master is ongoing. I need to rebase it first, because Maven migration was committed in the meantime.
@afine Please close this PR.

anmolnar pushed a commit to anmolnar/zookeeper that referenced this pull request Oct 5, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
anmolnar pushed a commit to anmolnar/zookeeper that referenced this pull request Oct 5, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
anmolnar pushed a commit to anmolnar/zookeeper that referenced this pull request Oct 5, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
anmolnar pushed a commit to anmolnar/zookeeper that referenced this pull request Oct 8, 2018
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

Some open questions:
- Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently).
- I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?
- Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket
- server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

Thanks,
Abe

Author: Andor Molnar <andor@cloudera.com>
Author: Andor Molnar <andor@apache.org>

Reviewers: hanm@apache.org, andor@apache.org

Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits:

fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl
c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support
e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes
209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings
1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic
a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes:
777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites
e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification
d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks
9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites
ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks
c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging
88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
@ivmaykov
Copy link
Contributor

@anmolnar any ETA on merging this to master? I'd like to break up #627 into several parts with associated JIRAs and start the process on getting that code upstream, but need this code in master first.

@anmolnar anmolnar closed this Jan 30, 2019
@anmolnar anmolnar deleted the ZOOKEEPER-236 branch February 21, 2019 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants