Data Solution 2019(6)MySQL Data Source

DataSolution2019(6)MySQLDataSource

Makesureourconnectiontodatabaseisgood

>grantallprivilegesondatabase.*toroot@‘142.xxx.xxx.xxx'identifiedby‘xxxxxx';

>flushprivileges;

InmyZeppelinNotebook

Icanusethistoloadthedependencies

%spark.dep

z.load("mysql:mysql-connector-java:5.1.47”)

ConnecttotheTableandDatabase

valhomeAdvisorCompanysRawDF=sqlContext.read

.format("jdbc")

.option("driver","com.mysql.jdbc.Driver")

.option("url","jdbc:mysql://45.55.xx.xx:3306/sillycat_services")

.option("user","root")

.option("password",“xxxxxx")

.option("dbtable","copy_home_companys")

.load()

homeAdvisorCompanysRawDF.printSchema()

homeAdvisorCompanysRawDF.registerTempTable("homeadvisorcompanys")

UsethefunctionwithinonparameterMethod

valcheckPhone:(String=>Int)=(phone:String)=>{

valregexStr="^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"

if(phone.matches(regexStr)){

20

}else{

10

}

}

valcheckPhoneColumn=udf(checkPhone)

valphoneDF=homeAdvisorCompanysRawDF.withColumn("phoneScore",checkPhoneColumn(homeAdvisorCompanysRawDF("phone")))

phoneDF.select("phone","phoneScore").show(2)

UsingtheFunctionwithmultipleparameters

valcheckAddress=(location:String,street_address:String,address_locality:String,address_region:String,postal_code:String)=>{

if(location!=null&&!location.isEmpty()&&postal_code!=null&&!postal_code.isEmpty()){

20

}else{

10

}

}

valcheckAddressColumn=udf(checkAddress)

valaddressDF=phoneDF.withColumn("addressScore",checkAddressColumn(phoneDF("location"),phoneDF("street_address"),phoneDF("address_locality"),phoneDF("address_region"),phoneDF("postal_code")))

addressDF.select("phone","phoneScore","location","postal_code","addressScore").show(2)

SumupalltherelatedcolumnsandgetatotalScore

valcolumnsToSum=List(col("phoneScore"),col("addressScore"))

valresultDF=addressDF.withColumn("totalScore",columnsToSum.reduce(_+_))

resultDF.select("phone","phoneScore","location","postal_code","addressScore","totalScore").show(2);

References:

https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.47

https://zeppelin.apache.org/docs/latest/interpreter/spark.html

相关推荐