New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol #184
Conversation
937c39a
to
54f60f2
Compare
a192293
to
bebe096
Compare
((X509ExtendedTrustManager) tm).checkClientTrusted(x509Certificates, s, socket); | ||
public void checkClientTrusted(X509Certificate[] chain, String authType, Socket socket) throws CertificateException { | ||
if (hostnameVerificationEnabled && shouldVerifyClientHostname) { | ||
hostnameVerifier.verify(socket.getInetAddress().getHostName(), ((SSLSocket) socket).getSession()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the hostname be available on the server side to verify?. How will the server know who this client is, one way is to implement a special HostnameVerifier that can be provide a set of peer hostnames or refer back to QuorumPeer to verify using the QuorumVerifier?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the reverse dns lookup is done on the performHostnameVerification... line
implement a special HostnameVerifier that can be provide a set of peer hostnames or refer back to QuorumPeer to verify using the QuorumVerifier
I'm a little confused by this. Do you mean that the users needs to provide the mapping from IP addresses to hostnames in advance?
return new X509ExtendedTrustManager() { | ||
HostnameChecker hostnameChecker = HostnameChecker.getInstance(HostnameChecker.TYPE_TLS); | ||
X509ExtendedTrustManager x509ExtendedTrustManager = (X509ExtendedTrustManager) tm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think extending X509ExtendedTrustManager outside will help us test it better?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure yet. The createTrustManager method might give us all the exposure to this implementation that we need. I think this will become more obvious as we begin expanding the tests.
@@ -0,0 +1,231 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be a test file, right? If so, test file should be under src/test folder. Does this even build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops... Fixed.
Security.setProperty("com.sun.security.enableCRLDP", "false"); | ||
} | ||
|
||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case if any of these test method run for a long time (which I doubt) then it'll be good to specify a timeout value annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
The QuorumSSLTest needs them too.
@@ -193,21 +198,18 @@ void request(Request request) throws IOException { | |||
/** | |||
* Returns the address of the node we think is the leader. | |||
*/ | |||
protected InetSocketAddress findLeader() { | |||
protected QuorumServer findLeader() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why findLeader is returning a QuorumServer instead of InetSocketAddress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Egah, this is a change from an earlier iteration that I forgot to remove. Fixed now.
@@ -253,7 +255,10 @@ protected void connectToLeader(InetSocketAddress addr) | |||
throw new IOException("initLimit exceeded on retries."); | |||
} | |||
|
|||
sockConnect(sock, addr, Math.min(self.tickTime * self.syncLimit, remainingInitLimitTime)); | |||
sockConnect(sock, leader.addr, Math.min(self.tickTime * self.syncLimit, remainingInitLimitTime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leader could be null in leader.addr right? Still wondering why change the leader type from an address to a QuorumServer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed.
sock.setSoTimeout(self.tickTime * self.initLimit); | ||
protected void connectToLeader(QuorumServer leader) | ||
throws IOException, InterruptedException, X509Exception { | ||
QuorumX509Util quorumX509Util = new QuorumX509Util(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now create QuorumX509Util whenever connectToLeader is called, even if ssl is not enabled. Could this be improved so the SSL related code path is only involved when ssl is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Fixed
ivy.xml
Outdated
@@ -63,6 +67,9 @@ | |||
<dependency org="commons-collections" name="commons-collections" | |||
rev="3.2.2" conf="test->default"/> | |||
|
|||
<dependency org="org.bouncycastle" name="bcprov-jdk15on" rev="1.56" conf="test->default"/> | |||
<dependency org="org.bouncycastle" name="bcpkix-jdk15on" rev="1.56" conf="test->default"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this specified twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One is bcprov-jdk15on and the other is bcpkix-jdk15on.
this.self = self; | ||
try { | ||
if (self.getQuorumListenOnAllIPs()) { | ||
ss = new ServerSocket(self.getQuorumAddress().getPort()); | ||
if (self.shouldUsePortUnification()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move all this code away from here to a different class like SSLSocketFactory which can help keep Leader code have similar number of branches if not more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tough call IMO. I think you are correct that moving this out of the Leader would make the leader code simpler but on the other hand this is logic that I think "belongs" to the leader and is not really relevant anywhere else.
What do you think about the compromise of moving this to a static method in Leader where it can be tested and prevents cluttering up the constructor logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some level of refactoring that makes the constructor less clogged would be valuable here..
@Override | ||
public void checkClientTrusted(X509Certificate[] chain, String authType, Socket socket) throws CertificateException { | ||
if (hostnameVerificationEnabled && shouldVerifyClientHostname) { | ||
performHostnameVerification(socket.getInetAddress().getHostName(), chain[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Quorum server connection I am not entirely sure if getInetAddres().getHostName() is a good idea when on server side this will force a reverse DNS lookup. When customer has only provided a ip address as config perhaps using hostname is not correct. And if customer has provided a hostname performing reverse dns lookup is not necessary and can be argued as not safe. Since the trust anchor is the user provided config and not the DNS service. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should not do reverse DNS lookups when we do not need to.
I changed the host verification logic to only perform a reverse dns lookup when host verification fails using the ip address. I think it is still fair to try the reverse dns lookup at that point as the value given in the altname of the certificate is not necessarily tied to the zookeeper configuration.
Let me know if you think this is reasonable.
import static org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT; | ||
import static org.apache.zookeeper.test.ClientBase.createTmpDir; | ||
|
||
public class QuorumSSLTest extends QuorumPeerTestBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice great stuff :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
public static final String DEFAULT_PROTOCOL = "TLSv1"; | ||
|
||
private String sslProtocolProperty = getConfigPrefix() + "protocol"; | ||
private String cipherSuitesProperty = getConfigPrefix() + "ciphersuites"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -21,6 +21,7 @@ | |||
import org.apache.zookeeper.ClientCnxn.EndOfStreamException; | |||
import org.apache.zookeeper.ClientCnxn.Packet; | |||
import org.apache.zookeeper.client.ZKClientConfig; | |||
import org.apache.zookeeper.common.ClientX509Util; | |||
import org.apache.zookeeper.common.X509Util; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -31,6 +31,7 @@ | |||
import javax.net.ssl.SSLSocket; | |||
import javax.net.ssl.SSLSocketFactory; | |||
|
|||
import org.apache.zookeeper.common.ClientX509Util; | |||
import org.apache.zookeeper.common.X509Exception.SSLContextException; | |||
import org.apache.zookeeper.common.X509Util; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
import static org.apache.zookeeper.common.X509Exception.KeyManagerException; | ||
import static org.apache.zookeeper.common.X509Exception.SSLContextException; | ||
import static org.apache.zookeeper.common.X509Exception.TrustManagerException; | ||
import static org.apache.zookeeper.common.X509Exception.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use explicit imports instead of wildcards imports - I guess here IDE tried to be clever..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -150,7 +196,7 @@ public static X509KeyManager createKeyManager(String keyStoreLocation, String ke | |||
KeyStore ks = KeyStore.getInstance("JKS"); | |||
inputStream = new FileInputStream(keyStoreFile); | |||
ks.load(inputStream, keyStorePasswordChars); | |||
KeyManagerFactory kmf = KeyManagerFactory.getInstance("SunX509"); | |||
KeyManagerFactory kmf = KeyManagerFactory.getInstance("PKIX"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any comments on what this change is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment has been added where pkix is used for the trustmanager factory. We don't actually need this change for the keymanager, but i preferred to be consistent.
|
||
private void performHostVerification(InetAddress inetAddress, X509Certificate certificate) throws CertificateException { | ||
try { | ||
hostnameVerifier.verify(inetAddress.getHostAddress(), certificate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this only verifies that the host name matches what's stored in the certificates, right? If so, how do we verify that same subject is a valid peer we intended to connect to, rather than any random server which happens to have a valid cert? Or is this already taken care of by other parts of code?
For reference, in ZOOKEEPER-1045 what we did for similar verification is by comparing the name of the server with the name retrieved from Kerberos principal. The name of the server is retrieved from the zoo.cfg and because of this we require if users want this verification they need encode the server as FQDN in zoo.cfg rather than using IP address. Maybe we can do something similar here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have a few hostname checks that we could consider. Please let me know if I am missing any.
- Check that the address/hostname of the machine matches what is listed on the certificate
- Check that the address/hostname of the machine matches what is expected from zoo.cfg (this may also require reading the sid from the packet sent when a connection is established).
- Check that the address/hostname listed on the certificate matches what is expected from zoo.cfg
This patch only performs check 1. I think that check 3 is something @geek101 mentioned earlier and we agreed this can be moved to another patch as it is more complicated.
I think it is also possible to implement check 2. This can exist entirely outside of tls/ssl support and is something we should consider based on its own merits.
I think having checks 1 and 2 would be plenty secure and we would not need check 3.
Let me know what you think @hanm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check 1 and check 2 together do what check 3 does, no? So I think 1+2 is enough. The point is to make sure that:
- The peer is what it claims to be (check 1 - authentication).
- The peer is authorized to join quorum (check 2 - authorization).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It could also be argued that authorization is handled by this patch in its current state as well, by only accepting peers that have trusted certificates. In other words, the CA limits which machines get certificates, which limits which machines get to join the quorum. So check 2 is not "needed" if the CA is trusted.
I also think that check 2 has value for deployments that want to reuse the same truststore for multiple services and for deployments that are not using SSL/TLS at all. So I think it is a check worth doing and it is deserving of its own patch. Would be interested in knowing what other members of the community think.
@@ -71,7 +71,7 @@ void followLeader() throws InterruptedException { | |||
self.end_fle = 0; | |||
fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean); | |||
try { | |||
InetSocketAddress addr = findLeader(); | |||
InetSocketAddress addr = findLeader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary change here - this file can be removed from the change list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -51,6 +53,8 @@ | |||
import org.slf4j.Logger; | |||
import org.slf4j.LoggerFactory; | |||
|
|||
import javax.net.ssl.SSLSocket; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these imports are not used. Another look is this file can also be removed from change list because nothing changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -383,6 +387,7 @@ public void run() { | |||
+ leader.self.getView().get(this.sid).toString()); | |||
} else { | |||
LOG.info("Follower sid: " + this.sid + " not in the current config " + Long.toHexString(leader.self.getQuorumVerifier().getVersion())); | |||
//TODO: Hostname verification possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to remove this TODO as it appertains to no code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
@Test(timeout = 300000) | ||
public void testRollingUpgrade() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is pretty flaky - please check internal Jenkins to diagnose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
d1a150d
to
08d2981
Compare
@@ -63,6 +66,10 @@ public void shutdown() { | |||
} | |||
|
|||
public static class MainThread implements Runnable { | |||
public File getConfFile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find a call to this getConfFile. Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as long as we keep One issue that we found is the >10% perf regression in plaintext mode when the apache httpclient library dependency is added. We never figured out why it caused the perf regression, but it could be a potential blocker. Unlike the port unification bugs, this perf regression cannot be worked around - it was present even when all the SSL features were disabled. The fix for that is isolated to one file and basically involves copy/pasting the hostname verifier code from httpclient directly into ZK, so it would be pretty easy to backport to #184 if desired. Let me just do another pass over the differences between the two PRs and see if anything else jumps out at me. |
portUnification perf regression @hanm Please start reviewing the code if you have some free cycles. |
@eolivelli Looks like I triggered the build with my latest comment. Maybe my comment to @hanm :) |
Refer to this link for build results (access rights to CI server needed): |
@anmolnar sure, will sign this off before next monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had one pass on code, just a few nits on styling issues, and one question about the conditions on host name verification. will have another pass on test code tomorrow.
setSockOpts(sock); | ||
sock.connect(electionAddr, cnxTO); | ||
LOG.debug("Connected to server " + sid); | ||
LOG.debug("Opening channel to server " + sid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentations on this line and the following lines
sock = new Socket(); | ||
setSockOpts(sock); | ||
sock.connect(electionAddr, cnxTO); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra line here.
} else { initiateConnection(sock, sid); | ||
} return true; | ||
} catch (UnresolvedAddressException e) { | ||
// Sun doesn't include the address that causes this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentations
LOG.warn("Cannot open channel to " + sid | ||
+ " at election address " + electionAddr, e); | ||
closeSocket(sock); | ||
throw e;} catch (X509Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: start a new line for } catch (X509Exception e) {
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* <p/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be <p>
(we already have a closing tag <p/>
)?
try { | ||
performHostVerification(InetAddress.getByName(engine.getPeerHost()), chain[0]); | ||
} catch (UnknownHostException e) { | ||
throw new CertificateException("failed to verify host", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Failed
hostnameVerifier.verify(hostAddress, certificate); | ||
} catch (SSLException addressVerificationException) { | ||
try { | ||
LOG.debug("Failed to verify host address: " + hostAddress + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use parameterized logging.
hostnameVerifier.verify(hostName, certificate); | ||
} catch (SSLException hostnameVerificationException) { | ||
LOG.error("Failed to verify host address: " + hostAddress, addressVerificationException); | ||
LOG.error("Failed to verify hostname: " + hostName, hostnameVerificationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use parameterized logging.
} catch (SSLException hostnameVerificationException) { | ||
LOG.error("Failed to verify host address: " + hostAddress, addressVerificationException); | ||
LOG.error("Failed to verify hostname: " + hostName, hostnameVerificationException); | ||
throw new CertificateException("Failed to verify both host address and host name", hostnameVerificationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be useful to log inetAddress
as well in cases where both hostName
and hostAddress
ends up empty because of exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both hostAddress
and hostName
are logged (the 2 entities which was probed). What else should be logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the inetAddress
should be logged to provide additional information when both hostAddress
and hostName
fails to resolve.
} else if (key.equals("sslQuorum")){ | ||
sslQuorum = Boolean.parseBoolean(value); | ||
} else if (key.equals("portUnification")){ | ||
shouldUsePortUnification = Boolean.parseBoolean(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we need to disable the parsing code here for this patch, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @hanm - comment out these two lines, and add a comment:
// TODO: UnifiedServerSocket is currently buggy, will be fixed when @ivmaykov's PRs are merged. Disable port unification until then.
@anmolnar you just need to copy |
build.xml
Outdated
@@ -38,11 +38,15 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant"> | |||
|
|||
<property name="netty.version" value="3.10.6.Final"/> | |||
|
|||
<property name="httpcomponents.version" value="4.5.3"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's remove this dependency.
ivy.xml
Outdated
@@ -63,6 +63,10 @@ | |||
<artifact name="netty" type="jar" conf="default"/> | |||
</dependency> | |||
|
|||
<dependency org="org.apache.httpcomponents" name="httpclient" rev="${httpcomponents.version}"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
* We attempt to perform verification using just the IP address first and if that fails will attempt to perform a | ||
* reverse DNS lookup and verify using the hostname. | ||
*/ | ||
public class ZKTrustManager extends X509ExtendedTrustManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace the entire contents of this file with the same file from #627. It's the same code, but the org.apache.http.conn.ssl.DefaultHostnameVerifier
class is copied into the file as a private inner class and renamed to ZKHostnameVerifier
. Two other apache utility classes from httpclient (SubjectName
and InetAddressUtils
) are also copied in.
This resolved a >10% perf regression in our internal testing at FB. We don't actually know why including httpclient caused such a large perf regression (and strangely, the regression was present even with SSL configs disabled), but it's a pretty easy fix ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit strange to me (copy-pasting code from other project to remove dependency) and needs additional review, but I don't want to block this PR really.
Have you checked the latest version of HttpClient
4.5.6? I've seen a number of bugfixes included in the maintenance releases. Might help.
@hanm what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't checked against 4.5.6, but given that the code path wasn't even being taken I doubt a later version would help. I don't know enough about JVM performance quirks to even have a theory for how including an unused jar leads to a perf regression. Maybe something to do with the class loader?
Anyway, including a fully featured HTTP client library in Zookeeper's server code base seems weird in any case, since ZK doesn't actually use the HTTP client functionality for anything. If you don't want to copy-paste code, do you know of some smaller open source library that just provides the hostname verification that we could use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a dependency on Hadoop, so we would need to copy-and-paste this class too. Maybe your original approach is the quickest way to get this done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am +1 on @ivmaykov 's approach.
We should create a JIRA and put a comment in code links with JIRA so this can be cleaned up later, though this sounds a mysterious issue by the description so far (that performance degrades with new code path not hit?).
} else if (key.equals("sslQuorum")){ | ||
sslQuorum = Boolean.parseBoolean(value); | ||
} else if (key.equals("portUnification")){ | ||
shouldUsePortUnification = Boolean.parseBoolean(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @hanm - comment out these two lines, and add a comment:
// TODO: UnifiedServerSocket is currently buggy, will be fixed when @ivmaykov's PRs are merged. Disable port unification until then.
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
I'm not a committer so my +1 doesn't count for much, but this PR looks good to me. I think we should get it in ASAP and then try to get my changes in (as several separate PRs, I will need to break them up) hopefully before 3.5 ships so people can have PEM support, cert reloading, and port unification. |
It counts a lot, thanks for spending time not just review the patch but also test it in production! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with latest changes.
@anmolnar are there any remaining works you plan to do on this PR? I think we are good to land but I'd like to double check before I commit this. thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 no more plans, good to go.
I'll commit just to make things faster. (not my PR anyway :)
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes #184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: - Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). - I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? - Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket - server <-> client ssl is implemented with netty. I did not feel that rewriting our server <-> server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe Author: Andor Molnar <andor@cloudera.com> Author: Andor Molnar <andor@apache.org> Reviewers: hanm@apache.org, andor@apache.org Closes apache#184 from afine/ZOOKEEPER-236 and squashes the following commits: fdcc915 [Andor Molnar] ZOOKEEPER-236. Replaced DefaultHostnameVerifier with custom impl c014a54 [Andor Molnar] ZOOKEEPER-236. Temporary disabled portUnification support e414496 [Andor Molnar] ZOOKEEPER-236. Nit code review fixes 209fbca [Andor Molnar] ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings 1f8aab0 [Andor Molnar] ZOOKEEPER-236. Revert portUnification/sslQuorum logic a9fa698 [Andor Molnar] ZOOKEEPER-236. Code review fixes: 777f31a [Andor Molnar] ZOOKEEPER-236. Added Java8/Java9 default cipher suites e8a1729 [Andor Molnar] ZOOKEEPER-236. Reverted to use single property for hostname verification d64eb26 [Andor Molnar] ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks 9ab476a [Andor Molnar] ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites ed10e88 [Andor Molnar] ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks c452d1b [Andor Molnar] ZOOKEEPER-236. Fixed unit test + added some extra debug logging 88b6171 [Andor Molnar] ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.
This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.
Rolling upgrades are supported via port unification (
portUnification=true
). This should only be used while performing a rolling upgrade.Some open questions:
sslQuorum
andportUnification
currently).BufferedSocket
s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocketSSLSocket
s. Any arguments to the contrary?Thanks,
Abe