I’m trying to access HDFS on behalf of another user, using the following application:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.log4j.Logger;
import org.apache.hadoop.fs.FSDataOutputStream;
import java.security.PrivilegedExceptionAction;
public class HDFSProxyTest {

    public static void main(String[] args) throws Exception {
        String hadoopConfigurationPath = "/etc/hadoop/conf/";
        final Configuration hdfsConfiguration = new Configuration();
        FileSystem localFileSystem = FileSystem.getLocal(hdfsConfiguration);
        Path coreSitePath = new Path(hadoopConfigurationPath + "core-site.xml");
        hdfsConfiguration.addResource(coreSitePath);
        Path hdfsSitePath = new Path(hadoopConfigurationPath + "hdfs-site.xml");
        hdfsConfiguration.addResource(hdfsSitePath);

        UserGroupInformation.setConfiguration(hdfsConfiguration);
        UserGroupInformation.loginUserFromKeytab("striim1@FCE.CLOUDERA.COM", "/home/striim/striim1_client.keytab");
        UserGroupInformation ugi =
            UserGroupInformation.createProxyUser("joy", UserGroupInformation.getLoginUser());

        FileSystem hadoopFileSystem = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
            public FileSystem run() throws Exception {
                return FileSystem.get(hdfsConfiguration);
            }
        });

        FSDataOutputStream fsDataOutputStream = hadoopFileSystem.create(new Path("/user/striim1/hdfsproxy.csv"));
        fsDataOutputStream.write("This is niranjan!!! testing this\n".getBytes());
        fsDataOutputStream.close();
        hadoopFileSystem.close();
    }
}
Here the user executing the app is striim, the super user I’m trying to emulate is striim1 (who has Kerberos credentials), and joy is the user on whose behalf I’m trying to access HDFS.
I end up with this exception.
2017-05-19 02:45:34,843 - WARN main org.apache.hadoop.util.NativeCodeLoader.<clinit> (NativeCodeLoader.java:62) Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=joy, access=WRITE, inode="/user/striim1":striim1:striim1:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:242)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:169)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:178)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3560)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3543)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3525)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6592)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2821)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2739)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2624)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:599)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:112)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:401)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
This is my configuration in core-site.xml
<property>
  <name>hadoop.proxyuser.striim1.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.striim1.groups</name>
  <value>*</value>
</property>
This is the permission setting of the folder I’m trying to access:
drwxr-xr-x - striim1 striim1 0 2017-05-19 02:50 /user/striim1
This exception leads me to the following questions
1) Even though I pass the super user’s UGI to the proxy user joy, why is the client trying to create the file in the context of user joy?
2) In my cluster deployment, “striim1” is just a user who has Kerberos credentials and not really a super-user as per this definition. Would impersonation work only if “striim1” is a super-user or is added to the super-user’s group?
3) Should the name of the user I’m trying to impersonate be a valid OS user? If not, what would happen and what validation is done in this respect?
4) What should the permission settings be on the directory I’m trying to write to as this impersonated user? Should it be a location that is owned by the super-user or part of the super-user’s group?
5) Should UGI.createProxyUser be called explicitly in my application? Say I execute my application as the user I want to impersonate, using the super-user, and I pass the proxy user configuration (basically core-site.xml) to my application; would this suffice? (I’m expecting something like createProxyUser being called internally, taking the current app-executing user as the user to be impersonated.)
Thanks in advance.
Regards,
Niranjan
1) Even though I pass the super user’s UGI to the proxy user joy, why is the client trying to create the file in the context of user joy?
When using proxy user functionality to call HDFS services like the NameNode, you authenticate as the "real user" and then the call executes as if it were done by the proxied user or "effective user". In your code sample, striim1 is the real user and joy is the effective user. This means that this client code authenticates to the NameNode using the Kerberos credentials of striim1, and then it switches over to acting as if joy really made the call. It will act as if joy is creating the file, which is significant for file permission checks as you've seen.
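For illustration, here is a minimal sketch (reusing the principal and keytab from your own code; configuration setup is omitted for brevity) that just prints the two identities carried by a proxy UGI, making the real/effective split visible:

import org.apache.hadoop.security.UserGroupInformation;

public class ShowProxyIdentity {
    public static void main(String[] args) throws Exception {
        // Authenticate as the real user (striim1) ...
        UserGroupInformation.loginUserFromKeytab(
            "striim1@FCE.CLOUDERA.COM", "/home/striim/striim1_client.keytab");
        // ... and wrap that login in a proxy UGI whose effective user is joy.
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
            "joy", UserGroupInformation.getLoginUser());

        System.out.println("effective user: " + ugi.getUserName());               // joy
        System.out.println("real user:      " + ugi.getRealUser().getUserName()); // striim1
        System.out.println("auth method:    " + ugi.getAuthenticationMethod());   // PROXY
    }
}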
You might also be wondering why it acted as joy even though you called FileSystem#create outside of the doAs. That is because a FileSystem instance is permanently tied to a specific UserGroupInformation when it is created. Since you created the instance inside a doAs running as proxy user joy, the subsequent operations on that FileSystem keep executing as joy.
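If you prefer to make that explicit, a variant of your own snippet (same behavior, just more obviously scoped) keeps both the FileSystem lookup and the write inside doAs:

// Same effect as your code: everything visibly runs as the effective user joy.
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        FileSystem fs = FileSystem.get(hdfsConfiguration);
        FSDataOutputStream out = fs.create(new Path("/user/striim1/hdfsproxy.csv"));
        out.write("This is niranjan!!! testing this\n".getBytes());
        out.close();
        fs.close();
        return null;
    }
});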
2) In my cluster deployment, “striim1” is just a user who has Kerberos credentials and not really a super-user as per this definition. Would impersonation work only if “striim1” is a super-user or is added to the super-user’s group?
There is no requirement that the real user must be an HDFS super-user. Your setup with striim1 appears to be working fine, because it authenticated as striim1 (the real user) and then executed as joy (the effective user).
3) Should the name of the user I’m trying to impersonate be a valid OS user? If not, what would happen and what validation is done in this respect?
It is not a strict requirement for the user to exist at the OS level on the server. The consequence of this is that when the NameNode executes the call, it will execute as if the user was not a member of any groups. (The group memberships are determined from OS integration, such as locally defined groups or pam_ldap.) If there is no need for the user to have group memberships to access certain files, then this won't be a problem.
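If you want to see what groups a given user name resolves to, the NameNode's view can be queried with the hdfs groups <user> command. A rough client-side sketch of the same idea (using the standard UGI API; keep in mind the authoritative resolution happens on the NameNode host, not the client):

import org.apache.hadoop.security.UserGroupInformation;

public class ShowGroups {
    public static void main(String[] args) throws Exception {
        // A UGI for a name the local OS may not know about.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("joy");
        // With the default shell-based group mapping this resolves against the
        // local OS; an unknown user typically comes back with no groups at all.
        System.out.println(java.util.Arrays.toString(ugi.getGroupNames()));
    }
}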
4) What should the permission settings be on the directory I’m trying to write to as this impersonated user? Should it be a location that is owned by the super-user or part of the super-user’s group?
In your example, the call executes as if the user was joy. You are free to choose any file permission settings that meet your requirements for granting or denying access to joy. For creating a new file in a directory, the user must have execute permission on all sub-components of the path's ancestry (/, /user and /user/striim1 in your example) and write access to the immediate ancestor (/user/striim1 in your example).
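As a rough illustration (assuming Hadoop 2.6+, where FileSystem#access is available), you could mirror those checks from the client before attempting the create; each call throws AccessControlException when the effective user joy lacks the permission:

import org.apache.hadoop.fs.permission.FsAction;

// hadoopFileSystem is the instance obtained inside doAs, so these run as joy.
hadoopFileSystem.access(new Path("/"), FsAction.EXECUTE);            // traverse /
hadoopFileSystem.access(new Path("/user"), FsAction.EXECUTE);        // traverse /user
hadoopFileSystem.access(new Path("/user/striim1"), FsAction.WRITE);  // write into the parent
// With drwxr-xr-x striim1:striim1 on /user/striim1, the last call fails for joy.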
For more detailed discussion of this topic, refer to the HDFS Permissions Guide.
5) Should UGI.createProxyUser be called explicitly in my application? Say I execute my application as the user I want to impersonate, using the super-user, and I pass the proxy user configuration (basically core-site.xml) to my application; would this suffice? (I’m expecting something like createProxyUser being called internally, taking the current app-executing user as the user to be impersonated.)
It sounds like you're looking for a solution whereby you don't need to code the application specifically for proxy user handling and can instead just control proxy user usage externally when your program is executed. If so, then you can control this by setting the HADOOP_PROXY_USER environment variable to the user you want to impersonate. For example, you could run kinit -kt to login as striim1, then set HADOOP_PROXY_USER=joy and then execute your program.
See HADOOP-8561 for discussion of the implementation of this feature. Here is the point in the UserGroupInformation code that implements this:
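Paraphrased from memory rather than quoted verbatim, the relevant logic in the login path looks roughly like this:

// Inside UserGroupInformation's login path (a paraphrase, not a verbatim quote):
String proxyUser = System.getenv(HADOOP_PROXY_USER);
if (proxyUser == null) {
    proxyUser = System.getProperty(HADOOP_PROXY_USER);
}
// If HADOOP_PROXY_USER was supplied, wrap the freshly logged-in real user in a
// proxy UGI, much as your code does when it calls createProxyUser explicitly.
loginUser = proxyUser == null ? loginUser : createProxyUser(proxyUser, loginUser);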