February 12, 2018

Apache HTTP Client 4.5 Fundamentals

Apache HTTP Client is currently the most widely used Java HTTP client library. It has gone through many revisions; the current stable release is 4.5, while 5.0 is still in beta. HttpClient does not offer the full functionality of a browser; the most obvious difference is that it has no UI. It simply provides data transfer and interaction over HTTP 1.0 and 1.1, and is typically used on the server side when a server needs to talk to another server that exposes an HTTP interface, for example sending a search request to Google and parsing the HTML content that Google returns.

Beyond the HTTP request and response messages themselves, every HTTP request is sent to the server with some HTTP method, most commonly GET or POST. An HTTP message carries multiple headers describing its metadata. A response may include cookie data that is stored on the client and sent back to the server with the next request. A session is a sequence of several HTTP request/response interactions; the session ID is usually recorded in a cookie.

HTTP Fundamentals

This is the most basic HttpGet:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = null;

try {
    response = httpclient.execute(httpget);
} catch (IOException e) {
    e.printStackTrace();

    logger.error("error: ", e);
} finally {
    try {
        response.close();
    } catch (IOException e) {
        e.printStackTrace();

        logger.error("error: ", e);
    }
}

The HTTP methods are GET, HEAD, POST, PUT, DELETE, TRACE, and OPTIONS, and HttpClient provides a dedicated class for each of them: HttpGet, HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace, and HttpOptions.
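Each of these classes simply fixes the method name on top of the common request API. A minimal sketch (assuming the HttpClient 4.5 jar is on the classpath; the localhost URI is just a placeholder):

```java
import org.apache.http.client.methods.HttpDelete;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.client.methods.HttpOptions;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.client.methods.HttpTrace;
import org.apache.http.client.methods.HttpUriRequest;

public class MethodClassesDemo {
    public static void main(String[] args) {
        HttpUriRequest[] requests = {
                new HttpGet("http://localhost/"),
                new HttpHead("http://localhost/"),
                new HttpPost("http://localhost/"),
                new HttpPut("http://localhost/"),
                new HttpDelete("http://localhost/"),
                new HttpTrace("http://localhost/"),
                new HttpOptions("http://localhost/")
        };
        // each request class reports the HTTP method it represents
        for (HttpUriRequest request : requests) {
            System.out.println(request.getMethod());
        }
    }
}
```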

A request URI is a Uniform Resource Identifier that identifies the location of a resource. An HTTP request URI consists of a protocol scheme, host name, optional port, resource path, optional query, and optional fragment. A URI can be built with URIBuilder.

// uri=http://www.google.com/search?q=httpclient&btnG=Google+Search&aq=f&oq=
URI uri = new URIBuilder()
                    .setScheme("http")
                    .setHost("www.google.com")
                    .setPath("/search")
                    .setParameter("q", "httpclient")
                    .setParameter("btnG", "Google Search")
                    .setParameter("aq", "f")
                    .setParameter("oq", "")
                    .build();
HttpGet httpget = new HttpGet(uri);

HTTP response

An HTTP response is the message the server sends back to the client.

HttpResponse httpResponse = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
System.out.println(httpResponse.getProtocolVersion());
System.out.println(httpResponse.getStatusLine().getStatusCode());
System.out.println(httpResponse.getStatusLine().getReasonPhrase());
System.out.println(httpResponse.getStatusLine().toString());
            


HttpResponse httpResponse2 = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
httpResponse2.addHeader("Set-Cookie",
        "c1=a; path=/; domain=localhost");
httpResponse2.addHeader("Set-Cookie",
        "c2=b; path=\"/\", c3=c; domain=\"localhost\"");
Header h1 = httpResponse2.getFirstHeader("Set-Cookie");
System.out.println(h1);
Header h2 = httpResponse2.getLastHeader("Set-Cookie");
System.out.println(h2);
Header[] hs = httpResponse2.getHeaders("Set-Cookie");
System.out.println(hs.length);

Output:

HTTP/1.1
200
OK
HTTP/1.1 200 OK


Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"
2

An HTTP message contains many headers. HeaderIterator can be used to walk through them one by one, and BasicHeaderElementIterator can iterate over all header elements of one particular header.

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
        "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
        "c2=b; path=\"/\", c3=c; domain=\"localhost\"");

// HeaderIterator
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext()) {
    System.out.println(it.next());
}

// HeaderElementIterator
HeaderElementIterator it2 = new BasicHeaderElementIterator(
        response.headerIterator("Set-Cookie"));
while (it2.hasNext()) {
    HeaderElement elem = it2.nextElement();
    System.out.println(elem.getName() + " = " + elem.getValue());
    NameValuePair[] params = elem.getParameters();
    for (int i = 0; i < params.length; i++) {
        System.out.println(" " + params[i]);
    }
}

HTTP entity

An HTTP message can carry content associated with a request or response; this content is optional and only found in some messages. Requests that use entities are called entity enclosing requests. The HTTP specification defines two entity enclosing request methods: POST and PUT.

Most responses carry an entity; the exceptions are responses to HEAD requests and responses with status 204 No Content, 304 Not Modified, or 205 Reset Content.

HttpClient distinguishes three kinds of entities: streamed, self-contained, and wrapping. Non-repeatable entities are generally treated as streamed, and repeatable entities as self-contained.

  1. streamed: the content is obtained from a stream; commonly used in responses. Streamed entities are not repeatable.

  2. self-contained: the content is held in memory or obtained by means independent of a connection. These entities are repeatable and are typically used in entity enclosing HTTP requests. A repeatable entity is one whose content can be read more than once, e.g. ByteArrayEntity or StringEntity.

  3. wrapping: the content is obtained from another entity.
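The repeatable/non-repeatable distinction above can be checked with HttpEntity#isRepeatable(). A small sketch (a StringEntity is self-contained, an InputStreamEntity is streamed):

```java
import java.io.ByteArrayInputStream;

import org.apache.http.entity.InputStreamEntity;
import org.apache.http.entity.StringEntity;

public class RepeatableDemo {
    public static void main(String[] args) throws Exception {
        // self-contained: content lives in memory and can be re-read
        StringEntity selfContained = new StringEntity("important message");

        // streamed: content comes from a one-shot stream
        InputStreamEntity streamed = new InputStreamEntity(
                new ByteArrayInputStream("important message".getBytes("UTF-8")), 17);

        System.out.println(selfContained.isRepeatable()); // true
        System.out.println(streamed.isRepeatable());      // false
    }
}
```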

Because an entity can carry both binary and character content, character encodings are supported.

HttpEntity#getContentType() and HttpEntity#getContentLength() can be used to read the Content-Type and Content-Length fields, and HttpEntity#getContentEncoding() reads the Content-Encoding. The Content-Type header can also carry a character encoding for text content; when the HttpEntity has a Content-Type header, a Header object is returned.

StringEntity myEntity = new StringEntity("important message",
        ContentType.create("text/plain", "UTF-8"));
    
System.out.println(myEntity.getContentType());
System.out.println(myEntity.getContentLength());
System.out.println(EntityUtils.toString(myEntity));
System.out.println(EntityUtils.toByteArray(myEntity).length);

Result:

Content-Type: text/plain; charset=UTF-8
17
important message
17

Ensuring release of low level resources

To make sure system resources are released, either the content stream obtained from the entity or the response itself must be closed. Closing the stream keeps the underlying connection alive, whereas closing the response immediately shuts down and discards the connection.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        try {
            // do something useful
        } finally {
            instream.close();
        }
    }
} finally {
    response.close();
}

HttpEntity#writeTo(OutputStream) also guarantees that resources are released once the entity has been written out completely. If HttpEntity#getContent() was used to obtain a java.io.InputStream, the stream must be closed manually in a finally block. When working with streaming entities, EntityUtils#consume(HttpEntity) guarantees that the entity content is fully consumed and the stream released. If only part of the response content is needed, response.close() can be called directly without consuming the rest of the content, but the connection can then no longer be reused.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        int byteOne = instream.read();
        int byteTwo = instream.read();
        
        // Do not need the rest
    }
} finally {
    response.close();
}

Consuming entity content

The recommended way is to call HttpEntity#getContent() or HttpEntity#writeTo(OutputStream). HttpClient also ships the EntityUtils class with several static methods for reading content, but EntityUtils is discouraged unless the response comes from a trusted HTTP server and is known to be of limited length.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        long len = entity.getContentLength();
        if (len != -1 && len < 2048) {
            System.out.println(EntityUtils.toString(entity));
        } else {
            // Stream content out
        }
    }
} finally {
    response.close();
}

If the entire entity content needs to be read more than once, the simplest approach is to wrap the original entity in a BufferedHttpEntity, which buffers the content in an in-memory buffer.

CloseableHttpResponse response = <...>
HttpEntity entity = response.getEntity();
if (entity != null) {
    entity = new BufferedHttpEntity(entity);
}

Producing entity content

StringEntity, ByteArrayEntity, InputStreamEntity, and FileEntity can be used to stream data out through an HTTP connection. An InputStreamEntity can only be used once; its data cannot be re-read.

File file = new File("somefile.txt");
FileEntity entity = new FileEntity(file,
        ContentType.create("text/plain", "UTF-8"));
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);

HTML forms

UrlEncodedFormEntity simulates submitting an HTML form. The following is equivalent to sending param1=value1&param2=value2 with the POST method.

List<NameValuePair> formparams = new ArrayList<NameValuePair>();
formparams.add(new BasicNameValuePair("param1", "value1"));
formparams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, Consts.UTF_8);

HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);

Content chunking

Calling HttpEntity#setChunked(true) suggests that the content be transferred in chunks, but the hint is ignored by servers that only support HTTP/1.0.

StringEntity entity = new StringEntity("important message",
        ContentType.create("text/plain", Consts.UTF_8));
entity.setChunked(true);
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);

Response handlers

The simplest way to process a response is through the handleResponse(HttpResponse response) method of the ResponseHandler interface. The programmer does not have to deal with connection management at all; HttpClient automatically makes sure the connection is released back to the connection manager.

public static void main(String[] args) {
    try {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpget = new HttpGet("http://localhost/json");

        ResponseHandler<MyJsonObject> rh = new ResponseHandler<MyJsonObject>() {
            public MyJsonObject handleResponse(final HttpResponse response) throws IOException {
                StatusLine statusLine = response.getStatusLine();
                HttpEntity entity = response.getEntity();
                if (statusLine.getStatusCode() >= 300) {
                    throw new HttpResponseException(statusLine.getStatusCode(), statusLine.getReasonPhrase());
                }
                if (entity == null) {
                    throw new ClientProtocolException("Response contains no content");
                }
                Gson gson = new GsonBuilder().create();
                Reader reader = new InputStreamReader(entity.getContent(), ContentType.getOrDefault(entity)
                        .getCharset());
                return gson.fromJson(reader, MyJsonObject.class);
            }
        };
        MyJsonObject myjson = httpClient.execute(httpget, rh);
        System.out.println(myjson.toString());

    } catch (Exception e) {
        e.printStackTrace();
    }
}

public class MyJsonObject {

}

HttpClient interface

HttpClient is thread safe. Its implementation is composed of several handler and strategy interface implementations, and supplying custom ones lets you customize the HttpClient.

ConnectionKeepAliveStrategy keepAliveStrat = new DefaultConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(
            HttpResponse response,
            HttpContext context) {
        long keepAlive = super.getKeepAliveDuration(response, context);
        if (keepAlive == -1) {
            // Keep connections alive 5 seconds if a keep-alive value
            // has not been explicitly set by the server
            keepAlive = 5000;
        }
        return keepAlive;
    }
};
CloseableHttpClient httpclient = HttpClients.custom()
        .setKeepAliveStrategy(keepAliveStrat)
        .build();

When a CloseableHttpClient instance is no longer needed and should no longer be managed by the connection manager, it must be shut down by calling CloseableHttpClient#close().

CloseableHttpClient httpclient = HttpClients.createDefault();
try {
    <...>
} finally {
    httpclient.close();
}

HTTP execution context

HTTP is a stateless request/response protocol, but real applications often need to preserve state information across several request/response exchanges. An HTTP context works much like a java.util.Map. HttpClient 4.x can maintain an HTTP session: as long as the same HttpClient is reused and the connection is not closed, the same session can be used to access other services that require login authentication.
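The map-like behavior can be seen with HttpClientContext alone. This sketch only exercises setAttribute/getAttribute/removeAttribute, without executing any request; the attribute name is arbitrary:

```java
import org.apache.http.client.protocol.HttpClientContext;

public class ContextDemo {
    public static void main(String[] args) {
        HttpClientContext context = HttpClientContext.create();

        // attributes are stored and retrieved by name, like a java.util.Map
        context.setAttribute("session-marker", "abc123");
        System.out.println(context.getAttribute("session-marker"));

        // after removal the lookup returns null
        context.removeAttribute("session-marker");
        System.out.println(context.getAttribute("session-marker"));
    }
}
```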

If you use an HttpClient pool and want a single login session to be shared by several HttpClient connections, the session information has to be saved manually. Because the client-side session information is kept in a cookie (JSESSIONID), it is enough to copy the cookies returned by a successful login to each HttpClient.

There are three ways to work with cookies: reuse the same HttpClient, save them yourself in a CookieStore, or maintain them through an HttpClientContext.

  • Use the same CloseableHttpClient
public class TestHttpClient {

    public static void main(String[] args) {
        TestHttpClient test = new TestHttpClient();

        try {
            test.testTheSameHttpClient();

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    String loginUrl = "http://192.168.1.24/admin/config.php";
    String testUrl = "http://192.168.1.24/admin/ajax.php?module=core&command=getExtensionGrid";

    public void testTheSameHttpClient() throws Exception {
        System.out.println("----testTheSameHttpClient");

        //// create the CloseableHttpClient from an HttpClientBuilder
        // HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        // CloseableHttpClient client = httpClientBuilder.build();

        //// or create a CloseableHttpClient directly
        CloseableHttpClient client = HttpClients.createDefault();

        HttpPost httpPost = new HttpPost(loginUrl);
        Map parameterMap = new HashMap();
        parameterMap.put("username", "admin");
        parameterMap.put("password", "password");

        UrlEncodedFormEntity postEntity = new UrlEncodedFormEntity(
                getParam(parameterMap), "UTF-8");
        httpPost.setEntity(postEntity);

        System.out.println("request line:" + httpPost.getRequestLine());
        try {
            // execute the POST request
            CloseableHttpResponse httpResponse = client.execute(httpPost);

            boolean loginFailedFlag = false;
            try {
                String responseString = printResponse(httpResponse);

                loginFailedFlag = responseString.contains("Please correct the following errors");

            } finally {
                httpResponse.close();
            }
            System.out.println("loginFailedFlag?:" + loginFailedFlag);

            if( !loginFailedFlag ) {
                // execute the GET request
                System.out.println("----the same client");
                HttpGet httpGet = new HttpGet(testUrl);
                System.out.println("request line:" + httpGet.getRequestLine());
                CloseableHttpResponse httpResponse1 = client.execute(httpGet);

                try {
                    printResponse(httpResponse1);
                } finally {
                    httpResponse1.close();
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // close HttpClient and release all system resources
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static String printResponse(HttpResponse httpResponse)
            throws ParseException, IOException {
        HttpEntity entity = httpResponse.getEntity();
        // response status code
        System.out.println("status:" + httpResponse.getStatusLine());
        System.out.println("headers:");
        HeaderIterator iterator = httpResponse.headerIterator();
        while (iterator.hasNext()) {
            System.out.println("\t" + iterator.next());
        }
        // check whether the response entity is null
        String responseString = null;
        if (entity != null) {
            responseString = EntityUtils.toString(entity);
            System.out.println("response length:" + responseString.length());
            System.out.println("response content:"
                    + responseString.replace("\r\n", ""));
        }

        return responseString;
    }

    private static List<NameValuePair> getParam(Map parameterMap) {
        List<NameValuePair> param = new ArrayList<NameValuePair>();
        Iterator it = parameterMap.entrySet().iterator();
        while (it.hasNext()) {
            Entry parmEntry = (Entry) it.next();
            param.add(new BasicNameValuePair((String) parmEntry.getKey(),
                    (String) parmEntry.getValue()));
        }
        return param;
    }
}
  • Use HttpContext

An HttpContext can hold arbitrary objects, so sharing a context between two different threads is unsafe; it is recommended that each thread maintain its own context for execution.

While executing an HTTP request, HttpClient places the following attributes into the context:

  1. HttpConnection instance: the current connection to the target server.
  2. HttpHost instance: the target server the current connection is connected to.
  3. HttpRoute instance: the complete connection route.
  4. HttpRequest instance: the current HTTP request. The HttpRequest object in the context always represents the exact state of the message as it was sent to the server. By default HTTP/1.0 and HTTP/1.1 use relative request URIs, but when the request is sent through a proxy in non-tunneling mode, the URI is absolute.
  5. HttpResponse instance: the current HTTP response.
  6. java.lang.Boolean object: a flag indicating whether the current request was fully transmitted to the connection target.
  7. RequestConfig object: the current request configuration.
  8. java.util.List object: the list of all redirect locations received while executing the request.
public class TestHttpContext {

    public static void main(String[] args) {
        TestHttpContext test = new TestHttpContext();

        try {
            test.testHttpContext();

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    String loginUrl = "http://192.168.1.24/admin/config.php";
    String testUrl = "http://192.168.1.24/admin/ajax.php?module=core&command=getExtensionGrid&type=all&order=asc";

    public void testHttpContext() throws Exception {
        System.out.println("----testHttpContext");

        //// create the CloseableHttpClient from an HttpClientBuilder
        // HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        // CloseableHttpClient client = httpClientBuilder.build();

        //// or create a CloseableHttpClient directly
        CloseableHttpClient client = HttpClients.createDefault();

        // Create a local instance of cookie store
        CookieStore cookieStore = new BasicCookieStore();

        // Create local HTTP context
        HttpClientContext localContext = HttpClientContext.create();
        localContext.setCookieStore(cookieStore);


        HttpPost httpPost = new HttpPost(loginUrl);
        Map parameterMap = new HashMap();
        parameterMap.put("username", "admin");
        parameterMap.put("password", "max168kit");

        UrlEncodedFormEntity postEntity = new UrlEncodedFormEntity(
                getParam(parameterMap), "UTF-8");
        httpPost.setEntity(postEntity);

        System.out.println("request line:" + httpPost.getRequestLine());
        try {

            CloseableHttpResponse httpResponse = client.execute(httpPost, localContext);

            boolean loginFailedFlag = false;
            try {
                String responseString = printResponse(httpResponse, cookieStore);

                loginFailedFlag = responseString.contains("Please correct the following errors");

            } finally {
                httpResponse.close();
            }

            System.out.println("loginFailedFlag?:" + loginFailedFlag);

            if( !loginFailedFlag ) {
                // use a new CloseableHttpClient
                CloseableHttpClient client2 = HttpClients.createDefault();

                // execute the GET request
                HttpGet httpGet = new HttpGet(testUrl);
                System.out.println("request line:" + httpGet.getRequestLine());
                CloseableHttpResponse httpResponse2 = client2.execute(httpGet, localContext);

                try {
                    printResponse(httpResponse2, cookieStore);
                } finally {
                    httpResponse2.close();
                    client2.close();
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // close HttpClient and release all system resources
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static String printResponse(HttpResponse httpResponse, CookieStore cookieStore)
            throws ParseException, IOException {
        HttpEntity entity = httpResponse.getEntity();
        // response status code
        System.out.println("status:" + httpResponse.getStatusLine());
        System.out.println("headers:");
        HeaderIterator iterator = httpResponse.headerIterator();
        while (iterator.hasNext()) {
            System.out.println("\t" + iterator.next());
        }

        System.out.println("cookies:");
        List<Cookie> cookies = cookieStore.getCookies();
        for (int i = 0; i < cookies.size(); i++) {
            System.out.println("\t" + cookies.get(i));
        }
        // check whether the response entity is null
        String responseString = null;
        if (entity != null) {
            responseString = EntityUtils.toString(entity);
            System.out.println("response length:" + responseString.length());
            System.out.println("response content:"
                    + responseString.replace("\r\n", ""));
        }

        return responseString;
    }

    private static List<NameValuePair> getParam(Map parameterMap) {
        List<NameValuePair> param = new ArrayList<NameValuePair>();
        Iterator it = parameterMap.entrySet().iterator();
        while (it.hasNext()) {
            Entry parmEntry = (Entry) it.next();
            param.add(new BasicNameValuePair((String) parmEntry.getKey(),
                    (String) parmEntry.getValue()));
        }
        return param;
    }
}
  • Use a CookieStore

Modify TestHttpContext to create a new CloseableHttpClient from the existing cookieStore: CloseableHttpClient client2 = HttpClients.custom().setDefaultCookieStore(cookieStore).build();

    if( !loginFailedFlag ) {
        // build a new CloseableHttpClient with the existing cookieStore
        CloseableHttpClient client2 = HttpClients.custom()
                .setDefaultCookieStore(cookieStore).build();

        // execute the GET request
        HttpGet httpGet = new HttpGet(testUrl);
        System.out.println("request line:" + httpGet.getRequestLine());
        CloseableHttpResponse httpResponse2 = client2.execute(httpGet);

        try {
            printResponse(httpResponse2, cookieStore);
        } finally {
            httpResponse2.close();
            client2.close();
        }
    }

HTTP Protocol Interceptors

Protocol interceptors can add specific headers to messages as they are processed, add custom headers to outgoing messages, or compress/decompress the content. They are usually implemented with the "Decorator" pattern.

Interceptors can share information through the context, for example to store processing state across several consecutive requests.

Protocol interceptors must be implemented as thread-safe; do not use instance variables unless access to them is synchronized.

public class TestHttpInterceptors {

    public static void main(String[] args) {
        TestHttpInterceptors test = new TestHttpInterceptors();

        try {
            test.testInterceptors();

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public void testInterceptors() throws IOException {
        final HttpClientContext httpClientContext = HttpClientContext.create();

        AtomicInteger count = new AtomicInteger(1);
        httpClientContext.setAttribute("Count", count);

        // request interceptor: adds a "Count" header to every outgoing request
        HttpRequestInterceptor httpRequestInterceptor = new HttpRequestInterceptor() {
            public void process(HttpRequest httpRequest, HttpContext httpContext) throws HttpException, IOException {
                AtomicInteger count = (AtomicInteger) httpContext.getAttribute("Count");

                httpRequest.addHeader("Count", String.valueOf(count.getAndIncrement()));
            }
        };

        // response handler
        ResponseHandler<String> responseHandler = new ResponseHandler<String>() {
            public String handleResponse(HttpResponse httpResponse) throws ClientProtocolException, IOException {

                // HeaderIterator iterator = httpResponse.headerIterator();
                // while (iterator.hasNext()) {
                //     System.out.println("\t" + iterator.next());
                // }

                HttpEntity entity = httpResponse.getEntity();
                if (entity != null) {
                    return EntityUtils.toString(entity);
                }
                return null;
            }
        };

        final CloseableHttpClient httpClient = HttpClients
                .custom()
                .addInterceptorLast(httpRequestInterceptor)
                .build();

        final HttpGet httpget = new HttpGet("http://192.168.1.24/");

        for (int i = 0; i < 20; i++) {
            String result = httpClient.execute(httpget, responseHandler, httpClientContext);

            // System.out.println(result);
        }
    }
}

Exception Handling

The HTTP protocol processor can throw two kinds of exceptions: java.io.IOException (socket timeout, socket reset) and HttpException (HTTP failure). HttpClient re-throws HttpException as ClientProtocolException, a subclass of java.io.IOException, so catching IOException alone handles both error conditions.

HTTP is a simple request/response protocol with no transaction processing. By default HttpClient attempts to automatically recover from I/O exceptions:

  1. HttpClient makes no attempt to recover from any logical or HTTP protocol errors (those derived from the HttpException class).

  2. HttpClient automatically retries methods that are assumed to be idempotent.

  3. HttpClient automatically retries requests that fail while the HTTP request is still being transmitted to the target server (i.e. the request has not been fully delivered to the server).

HttpRequestRetryHandler myRetryHandler = new HttpRequestRetryHandler() {
    public boolean retryRequest(IOException exception, int executionCount, HttpContext context) {
        if (executionCount >= 5) {
            // do not retry if over max retry count
            return false;
        }
        if (exception instanceof InterruptedIOException) {
            // timeout
            return false;
        }
        if (exception instanceof UnknownHostException) {
            // unknown host
            return false;
        }
        if (exception instanceof ConnectTimeoutException) {
            // connection refused
            return false;
        }
        if (exception instanceof SSLException) {
            // SSL handshake failure
            return false;
        }
        HttpClientContext clientContext = HttpClientContext.adapt(context);
        HttpRequest request = clientContext.getRequest();
        boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
        if (idempotent) {
            // retry if the request is considered idempotent
            return true;
        }
        return false;
    }
};
CloseableHttpClient httpclient = HttpClients.custom()
        .setRetryHandler(myRetryHandler)
        .build();

Aborting Requests

A request can be aborted at any stage of execution by calling HttpUriRequest#abort(). The call terminates the request early and unblocks the execution thread from any I/O operation. The method is thread-safe and can be invoked from any thread; when an HTTP request is aborted, the execution thread gets an InterruptedIOException.
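A minimal sketch of the abort flag (no request is actually executed here; isAborted() just shows the effect of calling abort() from another thread):

```java
import org.apache.http.client.methods.HttpGet;

public class AbortDemo {
    public static void main(String[] args) throws InterruptedException {
        final HttpGet httpget = new HttpGet("http://localhost/");

        // abort() is thread-safe, so another thread may cancel the request
        Thread canceller = new Thread(new Runnable() {
            public void run() {
                httpget.abort();
            }
        });
        canceller.start();
        canceller.join();

        System.out.println(httpget.isAborted()); // true
    }
}
```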

Redirect Handling

HttpClient handles all kinds of redirects automatically, except those that the HTTP spec requires user intervention for. See Other (status code 303) redirects of POST and PUT requests are converted to GET requests, as the HTTP specification requires. A custom redirect strategy can be used to override the behavior prescribed by the HTTP specification.

LaxRedirectStrategy redirectStrategy = new LaxRedirectStrategy();
CloseableHttpClient httpclient = HttpClients.custom()
        .setRedirectStrategy(redirectStrategy)
        .build();

HttpClient often has to rewrite the request message during execution. By default HTTP/1.0 and HTTP/1.1 use relative request URIs, and the original request may be redirected several times before reaching its final location. The final absolute HTTP location can be built from the original request and the context: URIUtils#resolve produces the interpreted absolute URI of the final request, including the last fragment identifier from the redirect locations or the original request.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext context = HttpClientContext.create();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget, context);
try {
    HttpHost target = context.getTargetHost();
    List<URI> redirectLocations = context.getRedirectLocations();
    URI location = URIUtils.resolve(httpget.getURI(), target, redirectLocations);
    System.out.println("Final HTTP location: " + location.toASCIIString());
} finally {
    response.close();
}

References

使用 httpclient 連接池及注意事項

HttpClient 4 Cookbook

Posting with HttpClient

HttpClient tutorial

HTTP context的使用

HttpClient4.x 使用cookie保持會話

Java爬蟲入門簡介(三)——HttpClient保存、使用Cookie請求

HttpClient獲取Cookie的一次踩坑實錄

Apache HttpClient 4.5 How to Get Server Certificates

February 5, 2018

Pacemaker & Corosync

Pacemaker, as a cluster resource manager, manages the life cycle of the software running on a group of server nodes. It monitors and recovers node state through cluster services, which provide messaging and membership management; common cluster services are corosync, cman, and heartbeat.

Cluster services used to be handled by heartbeat, but since v3 that project has been split into several parts: Cluster Glue, Resource Agents, the messaging layer (Heartbeat proper), the Local Resource Manager, and the Cluster Resource Manager. Pacemaker is the resource manager that was split out, while the new heartbeat only handles messaging between server nodes.

The main features of Pacemaker include:

  1. Detection and recovery of server node and service failures
  2. Storage agnostic, no requirement for shared storage
  3. Resource agnostic: any service that can be scripted can be clustered
  4. Supports fencing (STONITH) to ensure data integrity
  5. Supports multiple cluster configuration modes, large or small
  6. Supports both quorate and resource-driven clusters
  7. Supports many redundancy configurations
  8. Automatically replicated configuration that can be updated from any node
  9. Ability to specify cluster-wide service ordering, colocation, and anti-colocation
  10. Support for advanced service types: (1) clones, for services that need to be started on multiple nodes; (2) multi-state, for master/slave or primary/secondary services
  11. Unified, scriptable cluster management tools

STONITH is short for Shoot-The-Other-Node-In-The-Head: the ability to power off a misbehaving node, usually implemented with a remote power switch.

As described in High-availability cluster: Node Configurations, the most common configuration is a cluster of two server nodes.

When the architecture involves more nodes, the following configurations are possible:

  1. Active/Active

    Traffic intended for the failed node is passed on to the other active nodes; this only works when all nodes use the same software configuration.

  2. Active/Passive

    Each node has a fully redundant instance; the standby node is only brought online when the primary node fails. This configuration requires extra hardware.

  3. N+1

    A single extra node is brought online to take over the role of a failed node. Because each node may run a different software configuration, the extra node must be able to stand in for any of the others. With N equal to 1, this is equivalent to Active/Passive.

  4. N+M

    When a cluster provides many services, a single failover node may not be enough, so multiple standby nodes are needed.

  5. N-to-1

    The failover node temporarily becomes the active node until the original node has recovered and is back online, at which point the service switches back to the original node.

  6. N-to-N

    A combination of active/active and N+M. When a node fails, its traffic is redistributed among the remaining active nodes; no standby node is needed, but every active node must be able to take over the services of the others.

  7. split-site

    Clustering across multiple data centers.

note: OpenAIS is an implementation of the Service Availability Forum's AIS (Application Interface Specification). It covers node management, messaging, and monitoring, but has no cluster resource manager, so pacemaker or rgmanager is needed as the resource manager. The Corosync Cluster Engine evolved from OpenAIS.

Sample: Apache httpd Active-Passive cluster

Prepare two VMs, web1 and web2, with vagrant, then follow How To Set Up an Apache Active-Passive Cluster Using Pacemaker on CentOS 7 to configure web1 and web2 as an Apache httpd Active-Passive cluster.

Vagrant.configure("2") do |config|
  config.vm.provision "shell", inline: "echo Hello"

  config.vm.define "web1" do |web1|
    web1.vm.box = "geerlingguy/centos7"
    web1.vm.hostname = "web1"

    web1.vm.network "private_network", ip: "192.168.0.100"
    web1.vm.network "public_network", ip: "192.168.1.24", bridge: "en0: 乙太網路", auto_config: false

    web1.vm.provision "shell",
        run: "always",
        inline: "route add default gw 192.168.1.1"
  end

  config.vm.define "web2" do |web2|
    web2.vm.box = "geerlingguy/centos7"
    web2.vm.hostname = "web2"

    web2.vm.network "private_network", ip: "192.168.0.200"
    web2.vm.network "public_network", ip: "192.168.1.25", bridge: "en0: 乙太網路", auto_config: false

    web2.vm.provision "shell",
        run: "always",
        inline: "route add default gw 192.168.1.1"
  end
end

Edit /etc/hosts on both machines so each can reach the other by hostname:

$ vi /etc/hosts

192.168.0.100       web1
192.168.0.200       web2

Install apache httpd:

yum -y install httpd

Configure the status page:

$ vi /etc/httpd/conf.d/status.conf

<Location /server-status>
   SetHandler server-status
   Order Deny,Allow
   Deny from all
   Allow from 127.0.0.1
</Location>

Create a different index page on each machine:

$ cat <<-END > /var/www/html/index.html
<html>
<body>hello web1</body>
</html>

END
$ cat <<-END > /var/www/html/index.html
<html>
<body>hello web2</body>
</html>

END

Install pacemaker; the installation creates a new account named hacluster:

yum -y install pacemaker pcs

systemctl enable pcsd.service
systemctl start pcsd.service

Set the same hacluster password on both machines:

sudo passwd hacluster

Configure pacemaker

Check the firewall status; if it is not running, start firewalld:

firewall-cmd --state

systemctl start firewalld.service

Add the high-availability service to firewalld:

firewall-cmd --permanent --add-service=high-availability

# reload firewalld
firewall-cmd --reload

Enable pacemaker and corosync at boot on both machines:

systemctl enable corosync.service
systemctl enable pacemaker.service

Since both machines now have pacemaker installed and configured, authentication only needs to be set up on one of them:

$ pcs cluster auth web1 web2
Username: hacluster
Password:
web2: Authorized
web1: Authorized

Generate and synchronize the corosync configuration:

$ sudo pcs cluster setup --name webcluster web1 web2

Destroying cluster on nodes: web1, web2...
web1: Stopping Cluster (pacemaker)...
web2: Stopping Cluster (pacemaker)...
web1: Successfully destroyed cluster
web2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'web1', 'web2'
web1: successful distribution of the file 'pacemaker_remote authkey'
web2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
web1: Succeeded
web2: Succeeded

Synchronizing pcsd certificates on nodes web1, web2...
web2: Success
web1: Success
Restarting pcsd on the nodes in order to reload the certificates...
web2: Success
web1: Success

We can now see that the webcluster just created has been written into /etc/corosync/corosync.conf

# more corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: webcluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: web1
        nodeid: 1
    }

    node {
        ring0_addr: web2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

Start the cluster

pcs cluster start --all

Check the cluster status

# pcs status
Cluster name: webcluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: unknown
Current DC: NONE
Last updated: Mon Dec 18 07:39:34 2017
Last change: Mon Dec 18 07:39:20 2017 by hacluster via crmd on web2

2 nodes configured
0 resources configured

Node web1: UNCLEAN (offline)
Online: [ web2 ]

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Note: the "pacemaker node is UNCLEAN (offline)" problem occurred here; fixing it requires editing /etc/hosts

On each machine, edit /etc/hosts and delete the line that maps 127.0.0.1 to web1 (or web2), then restart corosync

#127.0.0.1  web1    web1
systemctl restart corosync.service

pcs now reports a healthy status

# pcs status
Cluster name: webcluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: web1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Mon Dec 18 07:52:36 2017
Last change: Mon Dec 18 07:45:38 2017 by hacluster via crmd on web2

2 nodes configured
0 resources configured

Online: [ web1 web2 ]

No resources


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The STONITH (Shoot-The-Other-Node-In-The-Head) warning seen in pcs status can be resolved by disabling stonith (acceptable for a test environment, but not recommended in production)

pcs property set stonith-enabled=false

A cluster has quorum when more than half of the nodes are online. By default, Pacemaker shuts down all resources when quorum is lost; since we are testing with only two machines, we disable the quorum policy here.

pcs property set no-quorum-policy=ignore
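Quorum requires floor(n/2)+1 votes, so a two-node cluster needs both nodes online, which is why the policy has to be relaxed above. A quick arithmetic check:

```shell
# quorum threshold = floor(n/2) + 1 votes
# with n=2 the threshold is 2, so losing either node loses quorum
for n in 2 3 5; do
  echo "nodes=$n quorum=$(( n / 2 + 1 ))"
done
```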

Set up the Virtual IP

pcs resource create Cluster_VIP ocf:heartbeat:IPaddr2 ip=192.168.1.26 cidr_netmask=24 op monitor interval=20s

Checking ip addr shows that web1 now has two public IPs: 192.168.1.24 and 192.168.1.26

# ip addr show

4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:c4:2b:2d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.24/24 brd 192.168.1.255 scope global enp0s9
       valid_lft forever preferred_lft forever
    inet 192.168.1.26/24 brd 192.168.1.255 scope global secondary enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec4:2b2d/64 scope link
       valid_lft forever preferred_lft forever

# pcs status

.....

Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):   Started web1

Add Apache httpd as a cluster resource, using the ocf:heartbeat:apache resource agent

pcs resource create WebServer ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://127.0.0.1/server-status" op monitor interval=20s

There are two ways to ensure the two resources run on the same machine:

  1. Put Cluster_VIP and WebServer into the same resource group, and constrain Cluster_VIP to start before WebServer (resources in a group are implicitly colocated and started in the listed order, so the extra order constraint just makes the intent explicit)

pcs resource group add WebGroup Cluster_VIP
pcs resource group add WebGroup WebServer

pcs constraint order start Cluster_VIP then start WebServer

  2. Set a colocation constraint

pcs constraint colocation add WebServer Cluster_VIP INFINITY

Test the cluster. First browse to the virtual IP home page http://192.168.1.26; the page shows hello web1

Shut down web1

vagrant halt web1

Browse to http://192.168.1.26 again; the page now shows hello web2

Start web1 again

vagrant up web1

At this point the service stays on web2; only if web2 is shut down in turn will it move back to web1


If we want web1 to be the primary and web2 the backup, so that the service runs on web1 whenever it is up, we need to add location constraints that give web1 a higher priority. Then, if web1 goes offline and web2 takes over, the web service will move back to web1 once web1 comes online again.

pcs constraint location WebServer prefers web1=50
pcs constraint location WebServer prefers web2=45
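Pacemaker places a resource on the eligible node with the highest total score, so with both nodes online the constraints above keep WebServer on web1 (50 beats 45). A minimal sketch of that comparison:

```shell
# higher location score wins: web1=50 vs web2=45 (scores from the constraints above)
web1_score=50
web2_score=45
if [ "$web1_score" -gt "$web2_score" ]; then
  echo "preferred node: web1"
else
  echo "preferred node: web2"
fi
```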

References

How To Create a High Availability Setup with Corosync, Pacemaker, and Floating IPs on Ubuntu 14.04

將 Heartbeat 換成 Pacemaker+Corosync

High Availability and Pacemaker 101!

Automating Failover with Corosync and Pacemaker

透過 PACEMAKER 來配置 REDHAT 6 HIGH AVAILABILITY ADD-ON

CentOS7 架設 RHCS (High-Availability Server)

Pacemaker + Corosync 做服務 HA

How To Set Up an Apache Active-Passive Cluster Using Pacemaker on CentOS 7

在 CentOS7/RHEL7 上,學習架設 High-Availability 服務(一)

corosync+pacemaker 高可用集群

Centos7之pacemaker高可用安裝配置詳解

Linux 高可用(HA)集群之Pacemaker詳解

高可用centos7 HA:corosync+packmaker+http\mysql

使用 Load Balancer,Corosync,Pacemaker 搭建 Linux 高可用集群

CentOS 7 で DRBD/Pacemaker/Corosync で High Availability NFS

在 CentOS 7 上使用 PaceMaker 構建 NFS HA 服務

Corosync+pacemaker+DRBD+mysql(mariadb)實現高可用(ha)的mysql集群(centos7)