Test Information Space

Journal of Tech, Testing and Trends

Archive for September, 2009

Instant Virma

Posted by cadsmith on September 27, 2009

“It prospered strangely, and did soon disperse
Through all the earth:
For they that taste it do rehearse
That virtue lies therein”

George Herbert, Peace, 1633.

In this case the widespread subject is virtualization, which makes one computer or storage system look like many to its users. Its popularity comes from the cost and power saved by not loading up on hardware, often bought just to meet a temporary peak demand, and from the agility it brings to fielding appropriate infrastructure and applications. Sometimes it is as easy as drawing up a capacity plan and having the hypervisor, or virtual machine monitor, assemble the hardware-emulation software automatically on hosted servers, tuning each virtual machine (VM) instance’s share of resources such as processor cycles, memory or bandwidth for proper load balancing.
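
As a rough illustration of the capacity-plan idea, the sketch below (hypothetical, not any vendor’s API; the VM names and host figures are made up) divides a host’s CPU, memory and bandwidth among VM instances in proportion to planned weights, the way a hypervisor might balance shares.

    from dataclasses import dataclass

    @dataclass
    class VMPlan:
        name: str
        weight: int  # relative share of host resources in the capacity plan

    # Hypothetical host capacities; real numbers would come from inventory.
    HOST = {"cpu_cores": 16, "memory_gb": 64, "bandwidth_mbps": 10000}

    def allocate(plans):
        """Split each host resource among VMs in proportion to their weights."""
        total = sum(p.weight for p in plans)
        return {p.name: {res: round(cap * p.weight / total, 2)
                         for res, cap in HOST.items()}
                for p in plans}

    if __name__ == "__main__":
        plan = [VMPlan("web", 4), VMPlan("db", 8), VMPlan("batch", 4)]
        for vm, share in allocate(plan).items():
            print(vm, share)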

The techniques sprang from time-sharing, portable operating systems, and redundant storage devices. Hardware, of course, was also often developed using simulation, with functionality implemented in firmware. Now the bare metal can host a layer that mimics popular processor, memory, I/O, and network switch architectures so that off-the-shelf applications can run anywhere, operating system optional, and migration is easier. This is offered for servers, desktops, phones and data centers, and the approach spans cloud, grid, parallel and high-performance computing (HPC) systems. Vendors include VMware, Microsoft, IBM, Intel, Oracle, Cisco and many others. There are open-source versions, e.g. Xen and KVM, which lower cost further if vendor support is not necessary. Hardware may also have virtualization built in, for instance as a capacity multiplier or for compatibility with a variety of interfaces.
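
As a small how-to in the same spirit, the Linux-only sketch below checks whether the bare metal advertises hardware virtualization support: Intel VT-x shows up as the "vmx" CPU flag and AMD-V as "svm" in /proc/cpuinfo.

    def hardware_virt_support(cpuinfo_path="/proc/cpuinfo"):
        """Return which hardware virtualization extension the CPU advertises, if any."""
        try:
            with open(cpuinfo_path) as f:
                flags = {word for line in f if line.startswith("flags")
                         for word in line.split()}
        except OSError:
            return None  # not a Linux host, or cpuinfo unavailable
        if "vmx" in flags:
            return "Intel VT-x"
        if "svm" in flags:
            return "AMD-V"
        return "none detected"

    if __name__ == "__main__":
        print("Hardware virtualization:", hardware_virt_support())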

System management is significant since integration issues are likely and software may require licensing. A virtual machine often has to reboot when a bug causes a crash, but the rest of the VMs keep running intact. Version changes introduce risk, and infrastructure patches cause side effects in virtualized apps. It is possible to mix various ratios of physical and virtual components. Performance may be adversely affected by additional virtualization layers, and VM sprawl makes end-to-end administration more difficult. The visibility and testing tools need improvement. Standard quality measures can still be taken, such as use cases, architectural review, and measures of functionality, usability, security, scalability and performance. Benchmarks run inside VMs may also suffer from clock drift.
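
One crude way to keep an eye on the clock-drift problem is to time a benchmark against both the wall clock and the monotonic clock; a noticeable gap between the two elapsed values suggests the guest’s wall clock was stepped or drifted during the run. A minimal sketch:

    import time

    def timed(benchmark, *args, **kwargs):
        """Run a benchmark; return its result, monotonic elapsed time, and wall-vs-monotonic gap."""
        wall0, mono0 = time.time(), time.monotonic()
        result = benchmark(*args, **kwargs)
        wall_elapsed = time.time() - wall0
        mono_elapsed = time.monotonic() - mono0
        return result, mono_elapsed, wall_elapsed - mono_elapsed

    if __name__ == "__main__":
        _, elapsed, gap = timed(lambda: sum(range(10_000_000)))
        print(f"monotonic elapsed: {elapsed:.3f}s, wall-clock gap: {gap:+.4f}s")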

Users, developers and administrators can expect to see this topic expand as more virtual appliances are developed. Here is an example introductory Glossary.

Also see bookmarks. Tags can be combined as subtopics, e.g. taxonomy or test. A sampling of additional literature on virtualization and grid computing is shown below.

Books:

Documents:

The Stream of Scientific Revolutions

Posted by cadsmith on September 20, 2009

Due to social and sensor networks, data volume is estimated to be doubling every 9 to 12 months. Analysis is required in real time to derive knowledge from distributed databases, and awareness improves as sources are added, e.g. the internet of things. The term data mining, reportedly coined by Robert Hecht-Nielsen a couple of decades ago, denotes automated fact-finding, knowledge discovery, rule inference and prediction activities. The field follows predecessors such as statistics, originally named for state demographics and economics, and machine learning. The knowledge discovery and data mining community, KDD, dates to 1989 and later became a dedicated ACM group.

In classic science a hypothesis is often disproved by experiment, whereas here the tests themselves yield a data-driven hypothesis. Patterns of interest are useful or novel, though most patterns found are neither. More recently the field picked up steam as analysis times for huge databases became excessive, disparate sources needed to be connected quickly, and dimensionality, the number of attributes, expanded. These pressures produced ways to assign meaning to data; meaning leads to knowledge, which is communicated as information, assuming errors are avoided or corrected. The results include better visualization and intelligence built into the database itself.

Government and security have been major proponents, e.g. for profiling. Other applications include biomedicine, insurance, physics, business intelligence, CRM, information retrieval, online analytical processing (OLAP), text mining and analysis, finding experts, sports statistics, and digital libraries. Beyond the software itself, techniques include decision trees and neural networks. Models may be verified by splitting the data and checking that results on the two parts agree.
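
A minimal sketch of that split-and-compare check, assuming scikit-learn is available (any modelling library would do; the iris dataset stands in for real data): fit the same model on each half, score it on the other, and treat wildly different scores as a sign the pattern does not generalize.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Split the data into two halves with roughly equal class balance.
    Xa, Xb, ya, yb = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)

    # Train on one half, test on the other, and vice versa.
    score_ab = DecisionTreeClassifier(random_state=0).fit(Xa, ya).score(Xb, yb)
    score_ba = DecisionTreeClassifier(random_state=0).fit(Xb, yb).score(Xa, ya)
    print(f"trained on A, tested on B: {score_ab:.2f}")
    print(f"trained on B, tested on A: {score_ba:.2f}")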

Major tasks have been outlined as follows (a small sketch of the segmentation task appears after the list):

  • classification, sequence detection, genetic algorithms, nearest neighbor, naive Bayes classifiers, logistic regression and discriminant analysis;
  • affinity analysis, market basket and association analysis, rule learning, rough sets, and sequence detection;
  • prediction, regression, and time-series forecasting;
  • segmentation, cluster analysis, and Kohonen networks.
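
Here is the promised segmentation sketch: a k-means cluster analysis, assuming scikit-learn and NumPy, with synthetic points standing in for a real customer or measurement table.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Two obvious groups of 2-D points, a stand-in for customer segments.
    data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
    print("cluster centres:", km.cluster_centers_.round(2))
    print("segment sizes:", np.bincount(km.labels_))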

Two of the popular standards are CRISP-DM (the Cross-Industry Standard Process for Data Mining) and PMML (the Predictive Model Markup Language).

There is plenty of software, such as R, SAS (whose SEMMA methodology stands for sample, explore, modify, model, assess), SPSS, Netbase, Statistica, the open-source LabKey, the Rattle GNOME GUI for R, GNU Octave, Weka 3, Apache Hadoop, Datalogic/R, and the Mozenda scraper. IBM, Oracle and Microsoft also have offerings.

Other than usability, system integration and projections from prior knowledge, issues commonly revolve around privacy and performance. Congress has discussed consumer protections, though users are tracked by a growing number of government and social-network sites, and cloud security standards are still in development. Data may be missing. Patterns may not be understandable. Noisy data can result in spurious patterns, though source correction is improving. Relationships between fields may be more complex than assumed.
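
For the missing-data and noise problems just mentioned, a tiny cleanup sketch, assuming pandas and with made-up column names: fill gaps with a column median and clip an implausible outlier before any mining step.

    import pandas as pd

    frame = pd.DataFrame({
        "age":    [34, None, 29, 41, 300],  # None is missing; 300 is an implausible outlier
        "visits": [2, 5, None, 1, 4],
    })

    frame = frame.fillna(frame.median(numeric_only=True))  # fill missing values per column
    frame["age"] = frame["age"].clip(upper=120)            # damp obvious noise in the age field
    print(frame)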

KDnuggets is a general resource site.

Also see bookmarks for media links.

Dili-ology

Posted by cadsmith on September 13, 2009

090912

To a digital library, all readers are scholars; at the least, users develop smoother knowledge-interaction skills. There is no foraging in the stacks or waiting in line at the tall desk for someone to check out books, indicate what’s relevant or say where it might be hidden. New ideas can be discovered through links between material, and excerpt streams raise interest. Like testers, readers can get to the original data from which conclusions are drawn, e.g. electronic records. Librarians or teachers may leverage a set of reader proxies to serve wider audiences. Filters deliver timely, flawless content to each customer. A miniature copy of the world’s information can also be carried on one’s person, for those who prefer an old-school approach.

As in heavy metal, paradise or perdition is subjectively layered. This entry’s title might describe devotees of a cousin to the previously discussed Hali world of immersion or augmentation for seekers of the perfect portal, lens or cave. Those who prefer fiction may find a wider range of levels, some noticing hints of a precursor to the singularity in which Q&A email or messaging is a type of Turing test for content analysis and sources.

What are some examples of a digital library? They’re more than just notes and quotes:

  • WDL has an international museum collection that features a time slider to change the date range from 8000 BC to the present.
  • Google Books has public literature.
  • Papercube visualizes the domain of academic papers.
  • DigitalGlobe offers high-resolution geospatial imagery for sale.

In addition to concepts, digital library artifacts cover books, documents, podcasts, music and video recordings, art, news, databases, software, taskflows and messages. Digital rights management (DRM) deters fraud and can limit the number of simultaneous copies, whole or partial, where necessary for payment. Private libraries can be implemented, e.g. for educational exploration. A digital library can also be embedded in the web and vice-versa. This becomes interesting when one considers that present web search engines and wikis already offer language localization and translation, Web 2.0 has bookmarks, annotations, reviews, rankings, recommendations, search wikis, Creative Commons licensing and mashups, and the semantic web has taxonomies, ontologies, data mining and linked data.
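
As a toy illustration of the simultaneous-copy limit (entirely hypothetical, not any real DRM scheme), the sketch below keeps a loan counter per title and refuses checkouts beyond the licensed number of concurrent copies.

    class LendingDesk:
        def __init__(self, licensed_copies):
            self.licensed_copies = licensed_copies  # title -> max concurrent loans allowed
            self.loans = {}                         # title -> current loan count

        def checkout(self, title):
            """Grant a loan only while licensed copies remain; return True on success."""
            if self.loans.get(title, 0) >= self.licensed_copies.get(title, 0):
                return False
            self.loans[title] = self.loans.get(title, 0) + 1
            return True

        def checkin(self, title):
            self.loans[title] = max(0, self.loans.get(title, 0) - 1)

    desk = LendingDesk({"Peace (Herbert)": 2})
    print([desk.checkout("Peace (Herbert)") for _ in range(3)])  # [True, True, False]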

Library classifications include at least collection size, purpose, users, implementation, features, interaction, media types, and errata or known issues, e.g. structural or tested. User roles encompass readers, authors, librarians, publishers, artists, and critics; faculty and students are also contained in this set. Purposes have not been exhausted, but so far comprise cultural archives, research, documentation, academic or learning management systems (LMS), business and personal entertainment. Implementations span the gamut of IT from software to internet or cloud services, and devices including mobile e-readers, laptops, phones and netbooks.

A key feature is digitization of data, metadata and processes; an example is books scanned into storage. Accessibility aids may convert these to another language, large print, audio or braille. Details and topics are indexed for multimedia browsing or search guides, details can be summarized into outlines, and calculations can be run over the data. Answers can be derived upon request, or real-time alerts can be sent to interested parties when relevant information appears.
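
A minimal sketch of that indexing step: an inverted index mapping each term to the digitized items containing it, which is the basic structure behind library search (the catalog entries here are made up).

    from collections import defaultdict

    catalog = {
        "item-1": "scanned book on speed reading and mnemonics",
        "item-2": "podcast about digital rights management",
        "item-3": "scanned book about braille and accessibility",
    }

    # Build the inverted index: term -> set of item ids containing it.
    index = defaultdict(set)
    for item_id, text in catalog.items():
        for term in text.lower().split():
            index[term].add(item_id)

    def search(term):
        return sorted(index.get(term.lower(), set()))

    print(search("scanned"))  # ['item-1', 'item-3']
    print(search("podcast"))  # ['item-2']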

Speed-reading techniques can be adapted. Rapid skimming loads preconscious representations (though it may trigger site autodownload blockers in extreme cases). Mnemonics can be filled in, e.g. by repetition, outlining, and cues. Historically this involved poetry and pictures; now there are also hyperlinks, tag clouds, and storyboards.

Issues are legion. The digital divide needs to be conquered. Copyright involves negotiation, as demonstrated by the book rights registry resulting from the Google settlement. There are tradeoffs involving cultural identities when collections merge, e.g. the library of (party) congress. Censorship is an ancient barrier in modern guise. There are need-to-know limitations for safety or security. Sponsors may have agendas. Some material may never be digitized; where media is cultural memory, unrepresented items cease to have ever existed, which affects government legislation based upon official research. Misinformation techniques exist for revisionism, tampering or spoofing. Surveillance can be excessively pervasive, e.g. reading lists used to label (literary) agents. Data-warehousing concerns apply, e.g. syncing copies to sources. Users have to distinguish between the appropriateness of specialized and general-purpose devices. We can further evaluate qualities such as preservation, usability, findability, accessibility, performance, scalability, quality of service, interoperability and sustainability.

Also see bookmarks.

Image: Buddha’s Kindle.

Achilles Heals

Posted by cadsmith on September 6, 2009

Doctors have discovered a cure for illegible handwriting: all medical data is becoming digital. Real-time patient data is included in hospitals, patients own the data, and it can be accessed anywhere, anytime, by logging into a portal.

Besides security, there are issues analogous to those of celebrity athletes encouraged to undergo strictly private tests, only to have the results published in the press. Patient commentary outlets are yet to be defined.

The US government has earmarked at least a couple of billion dollars for IT infrastructure to spread adoption across clinics, and broadband allows online access at the highest possible bandwidths. An electronic medical record (EMR) is generated by a treatment organization; an electronic health record (EHR) is the set of EMRs from all places. Advantages are compliance, efficiency, access, reporting, coding, and quality. These systems are required to support meaningful-use features including computerized order entry, drug-interaction checking, maintenance of an updated problem list, and generation of transmissible prescriptions. Interoperability through Health Information Exchange (HIE) is a major theme. The criteria are expected to become more complex in 2013.
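
As a toy illustration of the drug-interaction check listed among the meaningful-use features (not a clinical system; the drug pair and warning text are illustrative only), a new order can be screened against the patient’s current medication list using a lookup table of known interacting pairs.

    # Illustrative interaction table; a real system would use a curated clinical database.
    INTERACTIONS = {
        frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    }

    def check_order(current_meds, new_drug):
        """Return warnings for any known interaction between the new order and current meds."""
        warnings = []
        for med in current_meds:
            note = INTERACTIONS.get(frozenset({med.lower(), new_drug.lower()}))
            if note:
                warnings.append(f"{new_drug} + {med}: {note}")
        return warnings

    print(check_order(["warfarin", "lisinopril"], "aspirin"))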

The new gear changes procedures and vice-versa. Not as many filing clerks are needed. Templates and quality reports are broadly well-defined. Databases use schemas, or at least have formats that allow vendors to translate to each other. Redundant tests are less necessary since a patient’s complete history and status are known. Loss of data is an issue: records need to be kept for 7 to 21 years depending upon local regulations, so previous paperwork is still saved. Thorough, consistent quality checks are essential. Medical device testing is rigorous, e.g. FDA clinical trials, so medical data is expected to have regulatory monitoring. Open data standards are required.

Early adopters have already started in order to avoid an expected EMR backlog. The transition is gradual, since this new type of productivity effort needs acclimation; it requires a project plan to complete, plus ongoing management in addition to tech support. New patients are easier to add since previous hardcopy records do not need to be input. Insurance providers, Medicare and Medicaid offer incentives, rewarding $44k or more after successful implementation and lowering insurance premiums. Financing loans are available. Preliminary certification begins in October. About 20% of transition attempts have been unsuccessful due to a variety of causes, such as functional or technical problems, integration and compatibility issues, poor usability, or expense. Transition times are expected to shrink as procedures are debugged. Clinics that grow in size may need to change EMR providers.

The data types are familiar, e.g. accepted standards such as HL7, XML and export to PDF. Imaging devices generate more graphics, e.g. CT and MRI. Record storage is networked, locally and on the internet, while snapshots are kept on USB, smartcard, bracelet and, in some cases, implant. Real-time data is significant in the ER and OR and is streamed from patient monitors elsewhere, e.g. vitals transmitted wirelessly over 802.11. Drawing conclusions from the database may require filling in blanks or adding more resolution.
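
To show how pipe-delimited HL7 v2 data reads, here is a small parsing sketch: segments sit one per carriage-return-separated line, fields split on "|" and components on "^". The sample message and field positions follow the common v2 layout (PID-5 patient name, PID-7 birth date), though a real system would rely on a proper HL7 library rather than hand parsing.

    SAMPLE = (
        "MSH|^~\\&|EMR|CLINIC|HIE|STATE|200909060830||ADT^A01|0001|P|2.5\r"
        "PID|1||12345||DOE^JANE||19700101|F\r"
    )

    def parse(message):
        """Index each segment of an HL7 v2 message by its segment name."""
        segments = {}
        for line in filter(None, message.split("\r")):
            fields = line.split("|")
            segments[fields[0]] = fields
        return segments

    msg = parse(SAMPLE)
    family, given = msg["PID"][5].split("^")[:2]
    print("patient:", given.title(), family.title(), "dob:", msg["PID"][7])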

Private clinics are growing in number. Hospital IT is simultaneously adding more sophisticated management and research systems, e.g. Microsoft Amalga. EMR implementations are under way internationally, e.g. in Taiwan. Pervasive medical surveillance is part of larger efforts; the world’s biggest democracy, India, is requiring national ID cards.

Also see wiki topic.

Image: Structural MRI
