2019年6月16日

julia Data Visulization

分成三個部分: Basic Plots, Vega, Gadfly,討論如何在 julia 進行 data visulization

Basic plots

使用 PyPlot,這是以 python matplotlib.pyplot module 提供的功能

使用前要先安裝 matplotlib

python -m pip install matplotlib

在 julia 安裝 PyPlot

 Pkg.add("PyPlot")

用以下指令在 julia 測試 PyPlot

using PyPlot
x = 1:100
y = rand(100)
p = PyPlot.plot(x,y)
xlabel("x")
ylabel("y")
title("basic plot")
grid("true")

會得到這樣的圖形結果

另一個例子

using PyPlot

x = range(0, stop=4pi, length=1000)
y = cos.(pi .+ sin.(x))

xlabel("x-axis")
ylabel("y-axis")
title("using sin and cos functions")

plot(x, y, color="red")

XKCD 是一種 casual-style, handwritten graph mode

using PyPlot

x = [1:1:10;]
y = ones(10)

for i = 1:1:10
    y[i] = pi + i*i
end

xkcd()
xlabel("x-axis")
ylabel("y-axis")
title("XKCD")
plot(x,y)

bar chart

using PyPlot

x = [10,20,30,40,50]
y = [2,4,6,8,10]
xlabel("x-axis")
ylabel("y-axis")
title("Vertical bar graph")
bar(x, y, color="red")

horizontal bar chart

clf()
x = [10,20,30,40,50]
y = [2,4,6,8,10]
title("Horizontal bar graph")
xlabel("x-axis")
ylabel("y-axis")
barh(x,y,color="red")

2D histogram

clf()
x = rand(1000)
y = rand(1000)
xlabel("x-axis")
ylabel("y-axis")
title("2D Histograph")
hist2D(x, y, bins=50)

pie chart

clf()
labels = ["Fruits";"Vegetables";"Wheat"]
colors = ["Orange";"Blue";"Red"]
sizes = [25;40;35]
explode = zeros(length(sizes))
fig = figure("piechart", figsize=(10,10))
p = pie(sizes, labels=labels, shadow=true, startangle=90, colors = colors)
title("Pie charts")

Scatter chart

clf()
fig = figure("scatterplot", figsize = (10,10))
x = rand(50)
y = rand(50)
areas = 1000*rand(50);
scatter(x, y, s=areas, alpha=0.5)
xlabel("x-axis")
ylabel("y-axis")
title("Scatter Plot")

PyPlot 的 3D plot 是使用 surf(x, y, z, facecolors=colors)

參數 說明
X,Y,Z Data values as 2D arrays
rstride Array row stride (step size)
cstride Array column stride (step size)
rcount Use at most this many rows, defaults to 50
ccount Use at most this many columns, defaults to 50
color Color of the surface patches
cmap A colormap for the surface patches.
facecolors Face colors for the individual patches
norm An instance of Normalize to map values to colors
vmin Minimum value to map
vmax Maximum value to map
shade Whether to shade the facecolors
using PyPlot

clf()
a = range(0.0, stop=2pi, length=500)
b = range(0.0, stop=2pi, length=500)

len_a = length(a)
len_b = length(b)

x = ones(len_a, len_b)
y = ones(len_a, len_b)
z = ones(len_a, len_b)

for i=1:len_a
    for j=1:len_b
        x[i,j] = sin(a[i])
        y[i,j] = cos(a[i])
        z[i,j] = sin(b[j])
    end
end

colors = rand(len_a, len_b, 3)
fig = figure()
surf(x, y, z, facecolors=colors)
fig[:canvas][:draw]()

Gadfly

這是一個圖形的 library,可以輸出圖片為 SVG, PNG, PostScript, PDF,也可用 IJulia 運作,跟 DataFrames 緊密整合,提供 pan, zoom, toggle 的功能。執行 Gadfly.plot 後,browser 會打開一個 html 檔案,裡面是 svg 圖片。

Pkg.add("Gadfly")
using Gadfly
Gadfly.plot(x = rand(10), y=rand(10))

# 折線圖
Gadfly.plot(x = rand(10),y=rand(10), Geom.point, Geom.line)
Gadfly.plot(x=1:10, y=[10^n for n in rand(10)], Scale.y_sqrt, Geom.point, Geom.smooth, Guide.xlabel("x"), Guide.ylabel("y"), Guide.title("Graph with labels"))


Plotting DataFrames with Gadfly

使用 RDatasets (有一些範例資料) 產生 DataFrame for the plot function

折線圖

using RDatasets
Gadfly.plot(dataset("datasets", "iris"),
        x="SepalLength",
        y="SepalWidth",
        Geom.line)

Point Plot

Gadfly.plot(dataset("datasets", "iris"),
        x="SepalLength",
        y="SepalWidth",
        Geom.point)

plot a graph between SepalLength and SepalWidth

histogram

Gadfly.plot(x=randn(4000), Geom.histogram(bincount=100))

preceding showcased histogram

Gadfly.plot(dataset("mlmRev", "Gcsemv"),
        x = "Course", color="Gender", Geom.histogram)

References

Learning Julia

2019年6月15日

Swift 5 語言新特性簡述

Swift 5 在2019年3月25日正式release,主要最廣為人討論的就是ABI Stability,也就是Swift runtime將被放在作業系統裡而不是包含在App裡,更進一步的說明可以參考ABI Stability and MoreEvolving Swift On Apple Platforms After ABI Stability

此外,Swift 5語言本身也增加了一些新的特性,包含新的語法及標準函式庫的更新。本篇文章將概略說明部分較常被提出的新特性。關於完整的Swift 5更新內容請參考Swift部落格的這篇文章:Swift 5 Released!

Raw Strings SE-0200

在字串前後加上#符號,即可使用raw string
在raw string中,反斜線雙引號都可以直接使用,不再需要額外的跳脫字元

let rawStr = #"Hello, \ My "dear" friend!"#
print(rawStr) 

印出結果:

Hello, \ My "dear" friend!

若依然要使用跳脫字元,使用反斜線加上#符號:

let rawStr = #"Hello,\#nMy friend!"#
print(rawStr)

如此才可印出有換行的結果:

Hello,
My friend!

你可以改在字串前後加上兩個或更多個#符號,使用raw string。如此一來,即可改在字串中直接輸入"#;此外,跳脫字元變也為\##

let rawStr = ##"Hello,\##n Now yout can type "# directly!"##
print(rawStr) 

印出結果:

Hello,
Now yout can type "# directly!

Customizing string Interpolation SE-0228

String Interpolation是swift在字串中放入變數的方式之一,如下:

let name = "myitem"
let weight = 123
print("The item is \(name). Its weight is \(weight).") 
//印出The item is myitem. Its weight is 123.

然而,若是自訂的類別物件,在以往,需要令該類別實作CustomStringConvertibledescription property,才能印出自訂的內容;類似Java的toString。

而現在,在Swift 5中,可以使用extension替String.StringInterpolation增加新的appendInterpolation方法,以處理自訂的類別。

class MyItem {
    let name = "item name"
    let weight = 123 
    init(name:String, weight:Int) {
        self.name = name
        self.weight = weight
    }
}

extension String.StringInterpolation {
    mutating func appendInterpolation(_ value: MyItem) {
        appendInterpolation("The item is \(value.name). Its weight is \(value.weight).")
    }
}

let myitem = MyItem(name:"item name", weight:123)
print("myitem:\(myitem)")
//印出myitem:The item is myitem. Its weight is 123.

此外,在增加新的appendInterpolation方法時,
你也可以使用多個參數,以及使用argument label,就像自訂函數一樣,進一步客製想要的String Interpolation內容:

extension String.StringInterpolation {
    mutating func appendInterpolation(mystring: String, replace oldString:String, with newString:String) {
        let newstr = mystring.replacingOccurrences(of: oldString, with: newString)
        appendInterpolation(newstr)
    }
}

print("test:\(mystring:"aaabbbccc", replace:"bbb", with:"ddd")")
//印出test:aaadddccc

Standard Result type SE-0235

Swift 5 標準函式庫中加入了Result型別,讓程式開發者可以更一致地處理錯誤。

Result是一個enum,有successfailure兩個case,並分別各有一個associated value,分別代表執行成功時欲回傳的資料,以及執行失敗時發生的錯誤。

public enum Result<Success, Failure: Error> {
    case success(Success), failure(Failure)
}

假設我們使用要定義一個取得新聞資料的API,並使用Result表示結果,可以定義如下:

func fetchNews(from urlString: String, completionHandler: @escaping (Result<MyNews, MyNewsError>) -> Void) {
    guard let url = URL(string: urlString) else {
        //發生錯誤時回傳.failure及錯誤內容
        completionHandler(.failure(.networkError))
        return
    }

    var mynews = MyNews()
    // ...
    
    //正確執行時,回傳.success及資料
    completionHandler(.success(mynews))
}

MyNews是某個自訂的類別;MyNewsError則是自訂的enum,繼承自標準函式庫的Error。

class MyNews {
    //...
}

enum MyNewsError: Error {
    case networkError
}

如此一來,在呼叫fetchNews取新聞資料時,可以用switch case以更直覺簡單地分別處理成功與失敗的結果:

fetchNews(from: "https://www.xxx.com") { result in
    switch result {
    case .success(let mynews):
        // ...
    case .failure(let myerror):
        // ...
    }
}

Dynamically callable types SE-0216

dynamicCallable可以讓你定義一個型別,其宣告出來的變數可以如同函式般直接使用,如下:

@dynamicCallable
struct MyMultiplier {
    private num:Int
    init(num:Int) {
        self.num = num;
    }
    func dynamicallyCall(withArguments args: [Int]) -> Int {
        //將陣列中的所有元素乘以10後相加。
        return args.map { $0 * num }.reduce(0,+);
    }
}

建立MyMultiplier型別的物件,並令其當作函式來呼叫:

let m10 = MyMultiplier(num:10)
m10(1,2,3,4,5) //150

如上,該型別必須實作dynamicallyCall方法,而參數可以是withArguments任意型別的陣列,如上例為Int陣列。

此外,也可以改用withKeywordArguments,並把參數改為KeyValuePairs<String, Any>,讓函式可以取得傳入的參數名稱。

@dynamicCallable
struct MyFunc {
    func dynamicallyCall(withKeywordArguments args: KeyValuePairs<String, Any>) {
    //...        
    }
}

let myfunc = MyFunc()
myfunc(name:"test name", count:123)

dynamicCallable可以用於struct, class, enum

Handling Future Enumeration Cases SE-0192

當switch述句中的條件判斷若其值來自非自定義的enum,如:C enums或是來自standard library的enum,則意味著此switch述句的條件判斷值,在未來可能需要處理非預期的內容;而通常在這種情況下,我們一定會用到default來處理。

以下以來自standard library的enum: AVAudioSession.InterruptionType為例。

Standard library中AVAudioSession.InterruptionType的定義如下:

    public enum InterruptionType : UInt {

        case began

        case ended
    }

假設我們有一個switch述句需要針對AVAudioSession.InterruptionType進行處理:

//判斷AudioInterruption的狀態並印出log...
func logAudioInterruptionType(inttType:AVAudioSession.InterruptionType) {
    switch inttType {
    case .began:
        print("began:\(inttType)")
    case .ended:
        print("ended:\(inttType)")
}

由於AVAudioSession.InterruptionType是來自standard library的enum,未來有可能會增加新的值;所以編譯器提出警告:

Switch covers known cases, but 'AVAudioSession.InterruptionType' may have additional unknown values, possibly added in future versions

Handle unknown values using "@unknown default"

使用 @unknown 標註,暗示未來隨著標準函式庫的更新,此switch case有可能會遇到其他未知的enum類型:

func logAudioInterruptionType(inttType:AVAudioSession.InterruptionType) {
    switch inttType {
    case .began:
        print("began:\(inttType)")
    case .ended:
        print("ended:\(inttType)")
        
    @unknown default:
        print("unknown type:\(inttType)")
    }
}

為何不直接寫上default?

如果直接寫個default,雖然也不會出現編譯錯誤,

func logAudioInterruptionType(inttType:AVAudioSession.InterruptionType) {
    switch inttType {
    case .began:
        print("began:\(inttType)")
    case .ended:
        print("ended:\(inttType)")
        
    default:  //不太合理,因為目前看來不可能...
        print("unknown type:\(inttType)")
    }
}

//備註:在Swift語言中,switch裡的每個case執行完就會自動跳出,不需要像C語言一樣使用break來防止繼續往下執行到其他case

但就邏輯上來說,這是令人困惑的,
因為就現況而言,AVAudioSession.InterruptionType確實只有 .began.ended 兩種值;
若是自定義的enum,而你的 switch case 也已明確地處理好所有enum值的話,再加上default,編譯器甚至還會回報Warning,請你將這個累贅的 default陳述句 移除:

Default will never be executed

一般的default應該用於如下情境,switch case中尚有值仍未處理的情況;

public enum MyGreetings {

    case .goodmorning

    case .goodnight
}

func printMyGreetings(greetingType:MyGreetings) {
    switch greetingType {
    case .goodmorning:
        print("Good Morning")
    default:  //合理,因為還有goodnight沒有處理到...
        print("Good night...I think...")
}

而加上 @unknown 則意味著,此處的 default 可能目前不會遇到,但是在往後有可能會遇到的,所以通常會是來自 非自定義的enum 或是C enums

Proposal SE-0192 就是希望能在程式中刻意區分這一點而提出,也才有了用 @unknown 來處理future enumeration的這個變動。

This proposal aims to distinguish between enums that are frozen (meaning they will never get any new cases) and those that are non-frozen, and to ensure that clients handle any future cases when dealing with the latter.

Flatten nested optionals resulting from ‘try?’ SE-0230

因爲try而造成的nested optionals,意即如:String?? 這樣有兩個 ?? 的情況,在swift 5之後也會變為普通的optional。如:

class MyItem {
    init?(itemid:String) {
        //假設init可能會因為不適合的itemid而建立失敗
    }
    
    //回傳String,但可能拋出錯誤
    func getItemInfo() throws -> String {
        ///...
    }
    
    //回傳String?
    func getItemDesc() -> String? {
        ///...
    }
}

//getItemInfo可能拋出錯誤,需使用try處理
let myItemInfo = try? MyItem(itemid:"aaa")?.getItemInfo()

在swift 5 之前,myItemInfo為String??

在swift 5 之後,myItemInfo為String?

這個改變是因為,原本多數的 optional 使用情境,都會避免發生 nested optionals,如下:

//getItemDesc方法不會拋出錯誤,而是回傳Optional String
let myItemDesc = MyItem(itemid:"aaa")?.getItemDesc()

上面程式中的myItemDesc為String?

不會因為 initializer 回傳 optional 型態,而getItemDesc方法也回傳 optional 型態,
進而造成有兩個 ??nested optionals

因此,於Swift 5中,為了使 try? 與其他 optional 的使用情境更一致,而做了此更動。

Reference

https://swift.org/blog/swift-5-released/

What’s new in Swift 5.0 – Hacking with Swift

What’s New in Swift 5? | raywenderlich.com

https://developer.apple.com/documentation/xcode_release_notes/xcode_10_2_release_notes/swift_5_release_notes_for_xcode_10_2

2019年6月10日

Numerical and Scientific Computation in Julia

julia 提供處理大量資料,以及進行統計運算的 library

Working with data

首先是讀取資料的方式,如果是 binary 資料,可直接用 read(), write()

julia> for val in [:Bool, :Char, :Int8]
         @eval println(@which read(stdin, $val))
       end
read(s::IO, ::Type{Bool}) in Base at io.jl:593
read(io::IO, ::Type{Char}) in Base at io.jl:613
read(s::IO, ::Type{Int8}) in Base at io.jl:588

使用

julia> read(stdin, Char)
j
'j': ASCII/Unicode U+006a (category Ll: Letter, lowercase)

julia> read(stdin, Bool)
true
true

julia> read(stdin, Int8)
12
49

readline

julia> methods(readline)
# 4 methods for generic function "readline":
[1] readline(s::IOStream; keep) in Base at iostream.jl:433
[2] readline() in Base at io.jl:370
[3] readline(filename::AbstractString; keep) in Base at io.jl:364
[4] readline(s::IO; keep) in Base at io.jl:370

使用

julia> number = parse(Int64, readline())
12
12

julia> println(number)
12

working with text files

open 參數除了 filename,後面還有存取該檔案的 mode

julia> methods(open)
# 8 methods for generic function "open":
[1] open(fname::AbstractString; read, write, create, truncate, append) in Base at iostream.jl:275
[2] open(fname::AbstractString, mode::AbstractString) in Base at iostream.jl:339
[3] open(f::Function, cmds::Base.AbstractCmd, args...) in Base at process.jl:615
[4] open(f::Function, args...; kwargs...) in Base at iostream.jl:367
[5] open(cmds::Base.AbstractCmd) in Base at process.jl:584
[6] open(cmds::Base.AbstractCmd, mode::AbstractString) in Base at process.jl:562
[7] open(cmds::Base.AbstractCmd, mode::AbstractString, other::Union{RawFD, FileRedirect, IO}) in Base at process.jl:562
[8] open(cmds::Base.AbstractCmd, other::Union{RawFD, FileRedirect, IO}; write, read) in Base at process.jl:584
Mode Description
r read
r+ read, write
w write, create, truncate
w+ read, write, create, truncate
a write, create, append
a+ read, write, create, append
# 有個文字檔 sample.txt
shell> cat sample.txt

file = open("sample.txt")

# 以行為單位,讀入文字檔
file_data = readlines(file)

enumerate(file_data)

# 加上行號
for lines in enumerate(file_data)
   println(lines[1],"-> ", lines[2])
end

# 大寫列印
for line in file_data
   println(uppercase(line))
end

# 反過來列印
for line in file_data
   println(reverse(line))
end

# 計算行數
countlines("sample.txt")

# 列印第一行
first(file_data)

# 最後一行
last(file_data)
working with CSV
using DelimitedFiles

csvfile = readdlm("sample.csv", ',', String, '\n')

# 只取得第二欄資料
csvfile[:,2]

# 只取前三行資料
csvfile[1:3,:]

# 以 | 區隔資料的 sample.psv
# "Lora"|"UK"
# "James"|"UK"
readdlm("sample.psv",'|')
working with DataFrames

DataFrame 是一種類似 database 的資料結構,可以用類似 SQL 的方式存取 DataFrames

using DataFrames

# 取得所有 DataFrames 支援的 functions
names(DataFrames)

DataFrame: 2D data structure,類似 R 與 Pandas 的 DataFrame

  • DataArray: 比 julia 預設 Array type 提供更多 comparison 的功能,但目前已經 deprecated (ref: https://github.com/JuliaStats/DataArrays.jl)
  • DataFrame: 2D data structure,類似 R 與 Pandas 的 DataFrame

julia 的 Array 以 nothing 代表 missing value

julia> a = [1,2,3,nothing,5,6]
6-element Array{Union{Nothing, Int64},1}:
 1
 2
 3
  nothing
 5
 6

julia> typeof(a[4])
Nothing

DataFrame: https://juliadata.github.io/DataFrames.jl/stable/man/getting_started.html

julia> dframe = DataFrame(Names = ["John","May"], Age = [27,28])
2×2 DataFrame
│ Row │ Names  │ Age   │
│     │ String │ Int64 │
├─────┼────────┼───────┤
│ 1   │ John   │ 27    │
│ 2   │ May    │ 28    │

julia> dframe.Names
2-element Array{String,1}:
 "John"
 "May"

julia> dframe.Age
2-element Array{Int64,1}:
 27
 28

julia> names(dframe)
2-element Array{Symbol,1}:
 :Names
 :Age

julia> describe(dframe)
2×8 DataFrame
│ Row │ variable │ mean   │ min  │ median │ max │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union… │ Any  │ Union… │ Any │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼────────┼──────┼────────┼─────┼─────────┼──────────┼──────────┤
│ 1   │ Names    │        │ John │        │ May │ 2       │          │ String   │
│ 2   │ Age      │ 27.5   │ 27   │ 27.5   │ 28  │         │          │ Int64    │

另外有個獨立的 CSV 套件,可處理 csv file,讀入 CSV file 後,就得到 DataFrame 物件

using Pkg
Pkg.add("CSV")
using CSV
julia> CSV.read("sample.csv")
3×2 DataFrame
│ Row │ Lora    │ UK      │
│     │ String⍰ │ String⍰ │
├─────┼─────────┼─────────┤
│ 1   │ James   │ UK      │
│ 2   │ Raj     │ India   │
│ 3   │ May     │ America │

Linear algebra

產生 matrix

# 亂數 3x3 matrix
julia> A = rand(3,3)
3×3 Array{Float64,2}:
 0.108282  0.433033  0.247145
 0.571936  0.369386  0.547845
 0.423382  0.380503  0.77661

# array 且初始為 1
julia> ones(5)
5-element Array{Float64,1}:
 1.0
 1.0
 1.0
 1.0
 1.0

Vector 與 Matrix 的差異

Vector{T} 是 Array{T, 1} 的 alias,Matrix{T} 是Array{T, 2} 的 alias

julia> Vector{Float64} == Array{Float64,1}
true

julia> Matrix{Float64} == Array{Float64,2}
true

矩陣乘法

julia> A = rand(3,3)
3×3 Array{Float64,2}:
 0.167005  0.188954  0.0483164
 0.82603   0.440255  0.00107621
 0.137211  0.308508  0.724953

julia> b = 2*A
3×3 Array{Float64,2}:
 0.334011  0.377909  0.0966328
 1.65206   0.88051   0.00215241
 0.274422  0.617015  1.44991

julia> b= 2A
3×3 Array{Float64,2}:
 0.334011  0.377909  0.0966328
 1.65206   0.88051   0.00215241
 0.274422  0.617015  1.44991

transpose 轉置矩陣

julia> transpose_of_A = A'
3×3 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 0.167005   0.82603     0.137211
 0.188954   0.440255    0.308508
 0.0483164  0.00107621  0.724953

inverse 逆矩陣

julia> inv(A)
3×3 Array{Float64,2}:
 -6.31559   2.41816    0.417329
 11.8591   -2.26692   -0.787013
 -3.85134   0.507016   1.63533

Determinant 行列式

julia> using LinearAlgebra

julia> det(A)
-0.05048337143385562

Eigenvalues 特徵向量

julia> eigvals(A)
3-element Array{Float64,1}:
 -0.1016764648907107
  0.5846559789763167
  0.8492342814186749

方程式運算

julia> 5
5

julia> equation = 3x^2 + 4x + 3
98

Differential calculus

Pkg.add("Calculus")
using Calculus
names(Calculus)
julia> f(x) = sin(x)
f (generic function with 1 method)
        
julia> f'(1.0)
0.5403023058631036

julia> f'(1.0) - cos(1.0)
-5.036193684304635e-12

julia> f''(1.0) - (-sin(1.0))
-6.647716624952338e-7

julia> differentiate("cos(x) + sin(x) + exp(-x) * cos(x)", :x)
:(1 * -(sin(x)) + 1 * cos(x) + ((-1 * exp(-x)) * cos(x) + exp(-x) * (1 * -(sin(x)))))

julia> differentiate("sin(x)", :x)
:(1 * cos(x))

julia> differentiate("cos(x) + sin(y) + exp(-x) * cos(y)", [:x, :y])
2-element Array{Any,1}:
 :(1 * -(sin(x)) + ((-1 * exp(-x)) * cos(y) + exp(-x) * 0))
 :(1 * cos(y) + (0 * cos(y) + exp(-x) * (1 * -(sin(y)))))

Statistics

JuliaStats 所列出支援的 packages

  • DataFrames
  • Distributions: probability distribution
  • MultivariateStats: Multivariate statistical analysis
  • HypothesisTests
  • MLBase: Swiss knife for machine learning
  • Distances: Various distances between vectors
  • KernelDensity
  • Clustering: algorithm for data clustering
  • GLM: Generalized linear models
  • NMF: Nonnegative matrix factorization
  • RegERMs: Regularized empirical risk minimization
  • MCMC: Markov Chain Monte Carlo
  • TimeSeries: time series analysis

simple statistics
julia> using Statistics

julia> x = [10,20,30,40,50]

julia> mean(x)
30.0

# 中位數 median
julia> median(x)
30.0

julia> sum(x)
150

# 標準差
julia> std(x)
15.811388300841896

# variance 變異數
julia> var(x)
250.0

julia 也支援 cumulative operations 的 functions

  • accumulate
  • cumsum(): cumulative summation
  • cumprod(): cumulative product
julia> accumulate(+, x)
5-element Array{Int64,1}:
  10
  30
  60
 100
 150

julia> accumulate(-, x)
5-element Array{Int64,1}:
   10
  -10
  -40
  -80
 -130

julia> cumsum(x)
5-element Array{Int64,1}:
  10
  30
  60
 100
 150

julia> cumprod(x)
5-element Array{Int64,1}:
       10
      200
     6000
   240000
 12000000

julia> for i in [:cumsum,:cumprod]
               @eval print($i, "->")
               @eval println($i(x))
       end
cumsum->[10, 30, 60, 100, 150]
cumprod->[10, 200, 6000, 240000, 12000000]

statistics using DataFrames
julia> using DataFrames

julia> dframe = DataFrame(Subjects = ["Maths","Physics","Chemistry"],Marks = [90,85,95])
3×2 DataFrame
│ Row │ Subjects  │ Marks │
│     │ String    │ Int64 │
├─────┼───────────┼───────┤
│ 1   │ Maths     │ 90    │
│ 2   │ Physics   │ 85    │
│ 3   │ Chemistry │ 95    │

julia> describe(dframe)
2×8 DataFrame
│ Row │ variable │ mean   │ min       │ median │ max     │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union… │ Any       │ Union… │ Any     │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼────────┼───────────┼────────┼─────────┼─────────┼──────────┼──────────┤
│ 1   │ Subjects │        │ Chemistry │        │ Physics │ 3       │          │ String   │
│ 2   │ Marks    │ 90.0   │ 85        │ 90.0   │ 95      │         │          │ Int64    │

Pandas

對習慣使用 python 進行資料分析的使用者來說,Pandas 是比較常用的 package

julia> pandasDataframe = Pandas.DataFrame(Dict(:Subjects => ["Maths","Physics","Chemistry"],:Marks => [90,85,95]))
   Marks   Subjects
0     90      Maths
1     85    Physics
2     95  Chemistry


julia> Pandas.describe(pandasDataframe)
       Marks
count    3.0
mean    90.0
std      5.0
min     85.0
25%     87.5
50%     90.0
75%     92.5
max     95.0


julia> pandasDataframe[:Subjects]
0        Maths
1      Physics
2    Chemistry
Name: Subjects, dtype: object


julia> pandasDataframe[:Marks]
0    90
1    85
2    95
Name: Marks, dtype: int64


julia> query(pandasDataframe,:(Marks>90))
   Marks   Subjects
2     95  Chemistry

Distributions

distributions.jl 有以下功能

  • Moments (ex: mean, variance, skewness, and kurtosis), entropy, and other properties
  • Probability density/mass functions (pdf) and their logarithms (logpdf)
  • Moment generating functions and characteristic functions
  • Maximum likelihood estimation
  • Posterior w.r.t. conjugate prior and Maximum-A-Posteriori (MAP) estimation
using Pkg
Pkg.add("Distributions")
using Distributions

distribution = Normal()

# 以 normal distribution 產生 10 筆 random 資料
x = rand(distribution, 10)

Binomial()

Cauchy()

Poisson()

TimeSeries
Pkg.add("TimeSeries")
Pkg.add("MarketData")

using TimeSeries
julia> dates  = collect(Date(2017,8,1):Day(1):Date(2017,8,5))
5-element Array{Date,1}:
 2017-08-01
 2017-08-02
 2017-08-03
 2017-08-04
 2017-08-05

julia> sample_time = TimeArray(dates, rand(length(dates)))
5×1 TimeArray{Float64,1,Date,Array{Float64,1}} 2017-08-01 to 2017-08-05
│            │ A      │
├────────────┼────────┤
│ 2017-08-01 │ 0.9142 │
│ 2017-08-02 │ 0.0834 │
│ 2017-08-03 │ 0.417  │
│ 2017-08-04 │ 0.4778 │
│ 2017-08-05 │ 0.6859 │

Hypothesis testing

測試 sample data 是否有足夠的 evidence,針對整個 population of data 進行條件測試。

有兩種 hypothesis testing 測試方式

  • Null hypothesis: the statement being tested
  • Alternative hypothesis: the statement that you will be able to conclude as true
Pkg.add("HypothesisTests")
using HypothesisTests

using Distributions
julia> sampleOne = rand(Normal(), 10)
10-element Array{Float64,1}:
 -0.5347603532683008
 -0.7882658160278007
 -0.3314222340035204
 -0.9280782524368908
  0.4540819395751322
  1.056554282551302
  1.2211198338710754
 -0.7067184060685551
  0.9788402691232629
  0.39421862072760827

julia> testOne = OneSampleTTest(sampleOne)
One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         0
    point estimate:          0.0815569884043313
    95% confidence interval: (-0.514, 0.6771)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.7638

Details:
    number of observations:   10
    t-statistic:              0.3097695342286937
    degrees of freedom:       9
    empirical standard error: 0.2632827937950805


julia> @which OneSampleTTest(sampleOne)
OneSampleTTest(v::AbstractArray{T,1}) where T<:Real in HypothesisTests at /Users/charley/.julia/packages/HypothesisTests/M3Ysg/src/t.jl:98

julia> pvalue(testOne)
0.7637885831202397

julia> pvalue(testOne, tail=:right)
0.38189429156011984

julia> pvalue(testOne, tail=:left)
0.6181057084398802

# check whether 25 successes from 1000 samples is inconsistent with a 50% success rate
julia> BinomialTest(25, 1000, 0.50)
Binomial test
-------------
Population details:
    parameter of interest:   Probability of success
    value under h_0:         0.5
    point estimate:          0.025
    95% confidence interval: (0.0162, 0.0367)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-99

Details:
    number of observations: 1000
    number of successes:    25

Optimization

optimization 是 finding best solution from all feasible solutions 的問題,可分為兩類

  • continuous optimization problem
  • combinatorial optimization problem (discrete)

某些問題可以用 optimization 方法解決

  • Shortest path
  • Maximum flow through a network
  • Vehicle routing

juila 將 optimization packages 集合在 JuliaOpt,其中有兩個最重要的 packages,這兩個都是 AML(Algebraic modeling languages),放在 MathProgBase.jl 裡面

  • JuMP (Julia for Mathematical Programming)
  • Convex.jl

JuMP

Python 習慣使用 PuLP

Pkg.add("JuMP")
Pkg.add("Clp")
using JuMP
using Clp
julia> m = JuMP.Model()
Feasibility problem with:
 * 0 linear constraints
 * 0 variables
Solver is default solver

julia> m = JuMP.Model(solver = Clp.ClpSolver())
Feasibility problem with:
 * 0 linear constraints
 * 0 variables
Solver is Clp

optimiser.jl

using JuMP
using Clp

m = Model(solver = ClpSolver())
@variable(m, 0 <= a <= 2 )
@variable(m, 0 <= b <= 10 )

@objective(m, Max, 5a + 3*b )
@constraint(m, 1a + 5b <= 3.0 )

print(m)

status = solve(m)

println("Objective value: ", getobjectivevalue(m))
println("a = ", getvalue(a))
println("b = ", getvalue(b))

執行

$ julia optimiser.jl
Max 5 a + 3 b
Subject to
 a + 5 b ≤ 3
 0 ≤ a ≤ 2
 0 ≤ b ≤ 10
Objective value: 10.6
a = 2.0
b = 0.2
Convex.jl

用在 disciplined convex programming,使用 Convex 需要 solver,也就是 SCS

Pkg.add("Convex")
Pkg.add("SCS")

using Convex
using SCS

X = Variable(2, 2)

y = Variable()

p = minimize(vecnorm(X) + y, 2 * X <= 1, X' + y >= 1, X >= 0, y >= 0)

solve!(p)

println(X.value)

println(y.value)

p.optval

References

Learning Julia

2019年6月3日

Interoperability and Meta Programming

julia 可跟 C, Python 的程式互動。另外 juila 支援 meta programming 的形式。

Interacting with OS

在 REPL 可用 ; 進入 shell,執行 shell commands,如果在 julia,則是直接用 pwd()

shell> pwd
/Users/user

julia> pwd()
"/Users/user"

filesystem operations

存取檔案的 functions

  • homedir(), joinpath()

    為支援 linux, windows,使用 joinpath() 產生檔案的路徑

julia> homedir()
"/Users/user"

julia> joinpath(homedir(), "Documents")
"/Users/user/Documents"
  • pwd(), cd()

    取得目前的目錄,以及切換目錄

    cd("../")
  • readdir()

    取得目錄內的檔案列表,包含目錄及檔案

    readdir(".")
  • mkdir()

    d = joinpath(homedir(), "LearningJulia")
    mkdir(d)
  • stat

    stat("somefile") 會取得 StatStruct(mode=0o100644, size=28676),而 StatStruct 裡面有這些欄位

    ref: https://github.com/JuliaLang/julia/blob/master/base/stat.jl

    ex: stat("somefile").size

    Name Description
    size The size (in bytes) of the file
    device ID of the device that contains the file
    inode The inode number of the file
    mode The protection mode of the file
    nlink The number of hard links to the file
    uid The user id of the owner of the file
    gid The group id of the file owner
    rdev If this file refers to a device, the ID of the device it refers to
    blksize The file-system preferred block size for the file
    blocks The number of such blocks allocated
    mtime Unix timestamp of when the file was last modified
    ctime Unix timestamp of when the file was created
    • cp, mv
    julia> cp("test", "test1")
    "test1"
    
    julia> isfile("test1")
    true
    
    julia> mv("test1", "test2")
    "test2"
  • isdir, homedir, basename, dirname, and splitdir

    julia> homedir()
    "/Users/user"
    
    julia> joinpath(homedir(), "Documents")
    "/Users/user/Documents"
    
    julia> isdir(joinpath(homedir(), "Documents"))
    true
    
    julia> basename(homedir())
    "user"
    
    julia> dirname(homedir())
    "/Users"
    
    julia> splitdir(homedir())
    ("/Users", "user")
    
  • mkpath, ispath, abspath, and joinpath

    julia> ispath(homedir())
    true
    
    julia> abspath(".")
    "/Users/user/Downloads/"
    
    julia> joinpath(homedir(), "Documents")
    "/Users/user/Documents"
    
    julia> mkpath(joinpath("Users","adm"))
    "Users/adm"
    
    julia> for (root, dirs, files) in walkdir("Users")
                   println("Directories in $root")
           end
    Directories in Users
    Directories in Users/adm

I/O Operations

STDOUT, STDERR, STDIN 是三個 global variables,分別為 standard output, error, input stram

open(), close(), write, read

julia> file = open("sample.txt")
IOStream(<file sample.txt>)

julia> file
IOStream(<file sample.txt>)

julia> lines = readlines(file)
2-element Array{String,1}:
 "Hi there!"
 "Learning Julia is so much fun."

julia> write("sample.txt", "hi how are you doing?")
21

julia> read("sample.txt")
21-element Array{UInt8,1}:

julia> readline("sample.txt")
"hi how are you doing?"

julia> close(file)

讀取某個文字檔,計算裡面有幾個 "Julia" 這個 word

# Arguments
# ARGS[0] is the command itself
in_file = ARGS[1]
out_file = ARGS[2]

# Keeping track using a counter
counter = 0
for line in eachline(in_file)
    for word in split(line)
        if word == "Julia"
            global counter += 1
        end
    end
end

# Write the contents to the o/p file
write(out_file, "The count for the word julia is $counter")

# Finally, read the contents to inform the user
for line in readlines(out_file)
    println(line)
end
shell> julia sample.jl sample.txt out.txt
The count for the word julia is 4

Calling C and Python

Calling C from Julia

compile (ex: LLVM) 需要知道什麼,才能由 Julia 呼叫 C

  • Name of the library
  • Name of the function
  • Number and types of the arguments (也稱為 Arity)
  • return type of the function
  • values of the arguments passed

julia 使用 ccall((:name,"lib"), return_type, (arg1_type, arg2_type...), arg1, arg2) 處理

呼叫 C 標準函式庫的 clock function

julia> ccall((:clock, :libc), Int64, ())
3206897

julia> ccall((:clock, :libc), Cint, ())
3213482

julia> Int32 == Cint
true

Cint 是 C 的資料型別,等於 signed int c-type,另外還有 Cint, Cuint, Clong, Culong, and Cchar


呼叫 getenv,取得 SHELL value,Ptr{Cchar} 是參數的資料型別,後面 return value 資料型別也是 Ptr{Cchar}

julia> syspath = ccall( (:getenv, :libc), Ptr{Cchar}, (Ptr{Cchar},), "SHELL")
Ptr{Int8} @0x00007ffee37b857a

julia> unsafe_string(syspath)
"/bin/bash"

C 的 struct 可以替換為 julia 的 composite types

Calling Python form Julia

使用 PyCall

julia> using Pkg

julia> Pkg.add("PyCall")

julia> using PyCall

julia> py"len('julia')"
5

julia> py"str(5)"
"5"

julia> py"""
       print('hello world')
       """
hello world

如果要使用 python 的 built-in data type (ex: dict),可使用 pybuiltin

julia> pybuiltin(:dict)(a=1,b=2)
Dict{Any,Any} with 2 entries:
  "b" => 2
  "a" => 1

julia> d = pycall(pybuiltin("dict"), Any, a=1, b=2)
PyObject {'b': 2, 'a': 1}

julia> typeof(d)
PyObject

# 利用 PyDict 轉換為 julia dict object
julia> julia_dictionary = PyDict{Symbol, Int64}(pycall(pybuiltin("dict"), Any, a=1, b=2))
PyDict{Symbol,Int64,true} with 2 entries:
  :b => 2
  :a => 1

julia> typeof(julia_dictionary)
PyDict{Symbol,Int64,true}

julia> julia_dictionary[:a]
1

# 產生一個 kw 參數的 function
julia> f(; a=0, b=0) = [10a, b]
f (generic function with 1 method)

# 以 dict 呼叫 function
julia> f(;julia_dictionary...)
2-element Array{Int64,1}:
 10
  2

Expressions

這是 metaprogramming 的功能,這是參考 LISP 的功能

首先說明 julia 如何 interprets a code

julia> code = "println(\"hello world \")"
"println(\"hello world \")"

julia> expression = Meta.parse(code)
:(println("hello world "))

julia> typeof(expression)
Expr
julia> expression.args
2-element Array{Any,1}:
 :println
 "hello world "

julia> expression.head
:call

julia> dump(expression)
Expr
  head: Symbol call
  args: Array{Any}((2,))
    1: Symbol println
    2: String "hello world "

產生 expression,用 eval 運算, :call 是 symbol,表示 head of the expression

julia> sample_expr = Expr(:call, +, 10, 20)
:((+)(10, 20))

julia> eval(sample_expr)
30

julia> sample_expr.args
3-element Array{Any,1}:
   +
 10
 20

將參數替換為 :x, :y 變數,也可以直接寫 x, y

julia> x = 10; y =10
10

julia> sample_expr = Expr(:call, +, :x, :y)
:((+)(x, y))

julia> sample_expr = Expr(:call, +, x, y)
:((+)(10, 10))

julia> eval(sample_expr)
20

也可以使用 $

julia> x = 10; y=10
10

julia> e = :($x + $y)
:(10 + 10)

julia> eval(e)
20

所以有 : 跟 $ 兩種方式可產生 Expr object

  • Using quotes(:) at runtime
  • Using dollar($) ar parse time

另外還有一個方式,是使用 quote 這個 keyword。quote 跟 : 的差別,是用 quote 可讓程式碼排列更好,比較容易實作

julia> quote
          30 * 100
       end
quote
    #= REPL[88]:2 =#
    30 * 100
end

julia> eval(ans)
3000

julia> :(30 * 100)
:(30 * 100)

julia> eval(ans)
3000

Macros

類似 function,但 function 使用 variables 為參數,而 macros 使用 expressions,並回傳 modified expressions,呼叫 macro 是用 @

macro NAME
    # some custom code
    # return modified expression
end

因為 REPL 建立的 macro 是在 Main module 裡面

julia> macro HELLO(name)
         :( println("Hello! ", $name))
       end
@HELLO (macro with 1 method)

julia> @HELLO("world")
Hello! world

julia> macroexpand(Main, :(@HELLO("world")))
:((Main.println)("Hello! ", "world"))

why metaprogramming?

metaprogramming 可省略很多重複的程式碼

for sym in [:foo, :bar, :baz]
    @eval function $(Symbol(sym))(n::Int64)
        for i in 1:n
            println( $(string(sym)) )
        end
    end
end

可這樣使用

foo(1)
bar(2)
baz(3)

for header in [:h1, :h2, :h3, :h4, :h5, :h6]
     @eval function $(Symbol(header))(text::String)
         println("<" * $(string(header))* ">"  * " $text " * "</" * $(string(header))* ">")
     end
 end

h1("Hello world!")
# <h1> Hello world! </h1>

h3("Hello world!")
# <h3> Hello world! </h3>

Built-in macros

以下是所有內建的 macros

julia> @
@MIME_str      @boundscheck    @edit           @html_str       @nospecialize   @text_str
@__DIR__       @cfunction      @elapsed        @inbounds       @polly          @threadcall
@__FILE__      @cmd            @enum           @info           @r_str          @time
@__LINE__      @code_llvm      @error          @inline         @raw_str        @timed
@__MODULE__    @code_lowered   @eval           @int128_str     @s_str          @timev
@__dot__       @code_native    @evalpoly       @isdefined      @show           @uint128_str
@allocated     @code_typed     @fastmath       @label          @simd           @v_str
@assert        @code_warntype  @functionloc    @less           @specialize     @view
@async         @debug          @generated      @macroexpand    @static         @views
@b_str         @deprecate      @gensym         @macroexpand1   @sync           @warn
@big_str       @doc            @goto           @noinline       @task           @which
  • @time

    可取得執行某一段程式的耗費時間

    julia> function recursive_sum(n)
              if n == 0
                  return 0
              else
                  return n + recursive_sum(n-1)
              end
           end
    recursive_sum (generic function with 1 method)
    
    julia> @time recursive_sum(10000)
      0.005359 seconds (3.22 k allocations: 183.158 KiB)
    50005000
  • @elapsed

    類似 @time,但只有回傳時間 in Float64

    julia> @elapsed recursive_sum(10000)
    3.0984e-5
    
    julia> @elapsed recursive_sum(10000)
    2.4597e-5
  • @show

    會回傳 expression

    julia> @show(println("hello world"))
    hello world
    println("hello world") = nothing
    
    julia> @show(:(println("hello world")))
    $(Expr(:quote, :(println("hello world")))) = :(println("hello world"))
    :(println("hello world"))
    
    julia> @show(:(3*2))
    $(Expr(:quote, :(3 * 2))) = :(3 * 2)
    :(3 * 2)
    
    julia> @show(3*2)
    3 * 2 = 6
    6
    
    julia> @show(Int64)
    Int64 = Int64
    Int64
  • @which

    如果有某個 function 有多個 methods,也就是 multiple dispatch 的功能,想知道會呼叫哪一個 method

    julia> function tripple(n::Int64)
              3n
           end
    tripple (generic function with 1 method)
    
    julia> function tripple(n::Float64)
              3n
           end
    tripple (generic function with 2 methods)
    
    julia> methods(tripple)
    # 2 methods for generic function "tripple":
    [1] tripple(n::Float64) in Main at REPL[11]:2
    [2] tripple(n::Int64) in Main at REPL[10]:2
    
    julia> @which tripple(10)
    tripple(n::Int64) in Main at REPL[10]:2
    
    julia> @which tripple(10.0)
    tripple(n::Float64) in Main at REPL[11]:2
  • @task

    類似 coroutine,用來產生一個 task,而不是直接執行

    julia> say_hello() = println("hello world")
    say_hello (generic function with 1 method)
    
    julia> say_hello_task = @task say_hello()
    Task (runnable) @0x0000000113ed9210
    
    julia> istaskstarted(say_hello_task)
    false
    
    julia> schedule(say_hello_task)
    hello world
    Task (queued) @0x0000000113ed9210
    
    julia> yield()
    
    julia> istaskdone(say_hello_task)
    true
  • @codellvm, @codelowered, @codetyped,@codenative, and @code_warntype

    瞭解 code 在 julia 中的所有形式

    function fibonacci(n::Int64)
       if n < 2
           n
       else 
           fibonacci(n-1) + fibonacci(n-2)
       end
    end
    
    fibonacci(n::Int64) = n < 2 ? n : fibonacci(n-1) + fibonacci(n-2)
    
    # 轉換為 single static assignment,每個變數只會被 assigned 一次,在使用變數前都會先定義
    @code_lowered fibonacci(10)
    
    # 
    @code_typed fibonacci(10)
    
    @code_warntype fibonacci(10)
    
    # 使用 LLVM C++ API 產生 LLVM intermediate representation
    @code_llvm fibonacci(10)
    
    # binary code in memory
    @code_native fibonacci(10)
    

Type introspection

首先定義新的 type: Student,產生兩個物件

struct Student
   name::String
   age::Int64
end

alpha = Student("alpha",24)

beta = Student("beta",25)

可檢查物件的資料型別,或是使用 isa 判斷是否為某個 function 產生的 type

julia> typeof(alpha)
Student

julia> isa(alpha, Student)
true

julia> alpha isa Student
true

reflection

可在 runtime 查詢物件的 attributes。在 julia 產生 function 後,就可以查詢 function 有幾個參數,有哪些 methods。

function calculate_quad(a::Int64,b::Int64,c::Int64,x::Int64)
   return a*x^2 + b*x + c
end

calculate_quad(1,2,3,4)

function calculate_quad(a::Int64,b::Int64,c::Int64,x::Float64)
   return a*x^2 + b*x + c
end

calculate_quad(1,2,3,4.75)
julia> methods(calculate_quad)
# 2 methods for generic function "calculate_quad":
[1] calculate_quad(a::Int64, b::Int64, c::Int64, x::Float64) in Main at REPL[38]:2
[2] calculate_quad(a::Int64, b::Int64, c::Int64, x::Int64) in Main at REPL[36]:2
julia> fieldnames(Student)
(:name, :age)

julia> Student.types
svec(String, Int64)

julia> typeof(Student.types)
Core.SimpleVector

References

Learning Julia